bootstrap_verify is used to compute verification scores with confidence
intervals. If more than one fcst_model exists in the input harp_list object,
the statistical significance of the differences in verification scores
between fcst_models is also computed. The statistical testing is done using
the bootstrap method, whereby the scores are computed repeatedly for random
samples of the input data.
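The bootstrap idea described above can be illustrated with a minimal base R sketch. It does not use harpPoint: the synthetic data, the rmse helper and the percentile interval are all assumptions made for illustration, and bootstrap_verify may compute its confidence bounds differently.

```r
# Minimal sketch of a percentile bootstrap for one score (base R only).
# All data and helper names here are made up for illustration.
set.seed(42)
n_cases <- 200
obs  <- rnorm(n_cases, mean = 10, sd = 3)
fcst <- obs + rnorm(n_cases, mean = 0.5, sd = 1.5)

rmse <- function(f, o) sqrt(mean((f - o)^2))

n_replicates <- 1000
conf <- 0.95

# Recompute the score for n_replicates samples drawn with replacement
boot_scores <- vapply(
  seq_len(n_replicates),
  function(i) {
    idx <- sample.int(n_cases, replace = TRUE)
    rmse(fcst[idx], obs[idx])
  },
  numeric(1)
)

# Percentile interval at the requested confidence level (an assumption,
# not necessarily the method used by bootstrap_verify)
alpha <- (1 - conf) / 2
ci <- quantile(boot_scores, probs = c(alpha, 1 - alpha), names = FALSE)
result <- c(score = rmse(fcst, obs), lower = ci[1], upper = ci[2])
print(result)
```

The spread of boot_scores reflects how much the score would vary under resampling of the cases, which is what the confidence bounds summarise.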
Usage
bootstrap_verify(
.fcst,
verif_func,
obs_col,
n,
groupings = "lead_time",
pool_by = NULL,
conf = 0.95,
min_cases = 4,
perfect_scores = perfect_score(),
parallel = FALSE,
num_cores = NULL,
show_progress = TRUE,
...
)
Arguments
- .fcst
A harp_list object with a column for observations.
- verif_func
The harpPoint verification function to bootstrap.
- obs_col
The observations column in the harp_list object. Can be the column name, quoted or unquoted. If a variable, it should be embraced, i.e. wrapped in {{ }}.
- n
The number of bootstrap replicates.
- groupings
The groups for which to compute the scores. See group_by for more information on how grouping works.
- pool_by
For a block bootstrap, the quoted column name to use to pool the data into blocks. For overlapping blocks this should be a data frame with a column that is common to the harp_list object input and a column named "pool" that identifies the pool each row belongs to. See Details.
- conf
The confidence level of the intervals to compute, e.g. 0.95 for 95% confidence intervals.
- min_cases
The minimum number of cases required in a group. For block bootstrapping this is the minimum number of blocks.
- perfect_scores
The value that each score must take to be a perfect score.
- parallel
Set to TRUE to use parallel processing for the bootstrapping. Requires the parallel package.
- num_cores
If parallel = TRUE, the number of cores to use in the parallel processing. If NULL, the number of cores detected by parallel::detectCores() is used.
- show_progress
Logical. Set to TRUE to show a progress bar. This feature is not available if parallel = TRUE.
- ...
Other arguments to verif_func.
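As a hedged illustration of how the arguments fit together, the sketch below wraps a call in a helper function. The helper run_bootstrap, the input object fcst and the observations column name "obs" are hypothetical; only the argument names come from the Usage section, and det_verify is used as one example of a harpPoint verification function.

```r
# Sketch of a call using only the arguments documented above.
# Assumptions: harpPoint is installed, `fcst` is a harp_list of
# deterministic forecasts, and its observations column is named "obs"
# (a hypothetical name).
run_bootstrap <- function(fcst, n_replicates = 100) {
  harpPoint::bootstrap_verify(
    .fcst      = fcst,
    verif_func = harpPoint::det_verify,
    obs_col    = "obs",          # quoted column name
    n          = n_replicates,   # number of bootstrap replicates
    groupings  = "lead_time",
    conf       = 0.95,
    parallel   = FALSE
  )
}
```

Defining the helper does not require harpPoint to be loaded; the package is only needed when run_bootstrap() is called on a real harp_list.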
Value
A harp_point_verif object with extra columns for the upper and lower
confidence bounds of the scores and, where there is more than one fcst_model
in the input harp_list object, the percent of replicates that are "better".
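The "percent of replicates that are better" can be sketched in base R as follows. This is an illustration under assumptions, not the harpPoint implementation: the two synthetic forecasts, the mae helper, and the reading of "better" as "closer to the perfect score" are all made up for the example.

```r
# Sketch: percent of bootstrap replicates in which model A beats model B.
# Synthetic data; "better" is assumed to mean closer to the perfect score.
set.seed(1)
n_cases <- 150
obs    <- rnorm(n_cases)
fcst_a <- obs + rnorm(n_cases, sd = 1.0)
fcst_b <- obs + rnorm(n_cases, sd = 1.2)

mae <- function(f, o) mean(abs(f - o))
perfect <- 0  # MAE of a perfect forecast

n_replicates <- 500
a_better <- vapply(
  seq_len(n_replicates),
  function(i) {
    # Use the same resampled cases for both models so that the
    # comparison is paired
    idx <- sample.int(n_cases, replace = TRUE)
    score_a <- mae(fcst_a[idx], obs[idx])
    score_b <- mae(fcst_b[idx], obs[idx])
    abs(score_a - perfect) < abs(score_b - perfect)
  },
  logical(1)
)

pct_a_better <- 100 * mean(a_better)
print(pct_a_better)
```

A value close to 100 or close to 0 suggests that the difference between the two models is unlikely to be an artefact of sampling; a value near 50 suggests no clear difference.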
Details
For data that are auto-correlated a block bootstrap may be used, whereby data
are pooled into groups in which the serial dependencies are maintained.
Rather than sampling individual data points randomly, pools of data points
are sampled randomly. The pools are taken from the column passed to the
pool_by argument. To use an overlapping block bootstrap, a data frame should
be passed to pool_by, with one column that is common to the harp_list object
input and a column named "pool" that labels which pool a row is in. This
ensures that the correct number of overlapping pools is used in each
bootstrap replicate. make_bootstrap_pools can be used to get a data frame of
overlapping pools.
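The block bootstrap described above can be sketched in base R. The data, the day column and the 3-day block length are hypothetical. The overlapping_pools data frame follows the layout described in the text (a common column plus a "pool" column), but whether it matches the output of make_bootstrap_pools exactly is an assumption.

```r
# Sketch of a block bootstrap: whole pools are resampled, not single rows.
set.seed(7)
n_days  <- 30
per_day <- 8

# Hypothetical data: errors share a daily component, so cases within a
# day are correlated
dat <- data.frame(day = rep(seq_len(n_days), each = per_day))
day_effect <- rnorm(n_days)
dat$error <- day_effect[dat$day] + rnorm(nrow(dat), sd = 0.5)

# Non-overlapping pools: one pool per day
rows_by_pool <- split(seq_len(nrow(dat)), dat$day)

block_replicate <- function() {
  # Sample pools with replacement and keep every row of each sampled pool
  pools <- sample(names(rows_by_pool), replace = TRUE)
  idx <- unlist(rows_by_pool[pools], use.names = FALSE)
  mean(dat$error[idx])  # bias of the resampled data
}

boot_bias <- replicate(500, block_replicate())
ci <- quantile(boot_bias, probs = c(0.025, 0.975), names = FALSE)

# Overlapping pools: 3-day windows shifted by one day, stored as a data
# frame with the common column ("day") and a "pool" column
block_length <- 3
starts <- seq_len(n_days - block_length + 1)
overlapping_pools <- data.frame(
  day = as.vector(vapply(
    starts,
    function(s) s + seq_len(block_length) - 1L,
    integer(block_length)
  )),
  pool = rep(starts, each = block_length)
)
head(overlapping_pools)
```

Because each day appears in up to three pools here, a plain column name cannot express the pooling, which is why a separate data frame is needed for the overlapping case.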
Bootstrapping can be quite slow since many replicates are computed. In order
to speed the process up, bootstrap_verify also works in parallel, whereby
replicates are computed by individual cores in parallel rather than in
serial. This can be achieved by setting parallel = TRUE. The default
behaviour is to use all cores, but the number of cores can be set by the
num_cores argument.
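The following base R sketch shows one way replicates can be distributed over cores with the parallel package. It is an illustration only: parallel::mclapply is used here as a stand-in, the data and the one_replicate helper are made up, and the source does not state which function from the parallel package bootstrap_verify uses internally.

```r
# Sketch: computing bootstrap replicates in parallel with the parallel
# package. mclapply is an assumed stand-in, not the harpPoint internals.
library(parallel)

RNGkind("L'Ecuyer-CMRG")  # RNG suited to parallel streams
set.seed(123)

n_cases <- 500
obs  <- rnorm(n_cases)
fcst <- obs + rnorm(n_cases)

one_replicate <- function(i) {
  idx <- sample.int(n_cases, replace = TRUE)
  sqrt(mean((fcst[idx] - obs[idx])^2))  # RMSE of the resampled cases
}

num_cores <- NULL
if (is.null(num_cores)) num_cores <- detectCores()  # default: all cores
if (is.na(num_cores)) num_cores <- 1L               # detectCores() may return NA
# mclapply relies on forking, which is not available on Windows
if (.Platform$OS.type == "windows") num_cores <- 1L
num_cores <- min(num_cores, 2L)  # capped at 2 only to keep this demo light

boot_scores <- unlist(
  mclapply(seq_len(200), one_replicate, mc.cores = num_cores)
)
summary(boot_scores)
```

Each replicate is independent of the others, so the work splits cleanly across cores; the speed-up is limited mainly by the overhead of starting the workers.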