bootstrap_verify computes verification scores with confidence intervals. If more than one fcst_model exists in the input harp_list object, the statistical significance of the differences in verification scores between fcst_models is also computed. The statistical testing uses the bootstrap method, whereby the scores are computed repeatedly for random samples of the input data.

Usage

bootstrap_verify(
  .fcst,
  verif_func,
  obs_col,
  n,
  groupings = "lead_time",
  pool_by = NULL,
  conf = 0.95,
  min_cases = 4,
  perfect_scores = perfect_score(),
  parallel = FALSE,
  num_cores = NULL,
  show_progress = TRUE,
  ...
)

Arguments

.fcst

A harp_list object with a column for observations.

verif_func

The harpPoint verification function to bootstrap.

obs_col

The observations column in the harp_list object. Can be the column name, quoted or unquoted. If it is passed as a variable, it should be embraced, i.e. wrapped in {{}}.

n

The number of bootstrap replicates.

groupings

The groups for which to compute the scores. See group_by for more information on how grouping works.

pool_by

For a block bootstrap, the quoted name of the column used to pool the data into blocks. For overlapping blocks, this should be a data frame with one column that is common to the harp_list object input and a column named "pool" that gives the pool each row belongs to. See Details.

conf

The confidence level of the interval to compute. The default, 0.95, gives a 95% confidence interval.

min_cases

The minimum number of cases required in a group. For block bootstrapping this is the minimum number of blocks.

perfect_scores

The value that each score must take to be a perfect score.

parallel

Set to TRUE to use parallel processing for the bootstrapping. Requires the parallel package.

num_cores

If parallel = TRUE, the number of cores to use in the parallel processing. If NULL, the number of cores detected by parallel::detectCores() is used.

show_progress

Logical. Set to TRUE to show a progress bar. This feature is not available if parallel = TRUE.

...

Other arguments passed to verif_func.

Value

A harp_point_verif object with extra columns for the upper and lower confidence bounds of the scores and, where there is more than one fcst_model in the input harp_list object, the percentage of replicates in which each fcst_model is "better".
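To illustrate what confidence bounds and a "percent better" value mean in a bootstrap setting, the following is a minimal base R sketch. It does not use harpPoint and is not the package's implementation: the data, the rmse helper, the use of percentile bounds and the definition of "better" as closer to a perfect score of 0 are all assumptions made for illustration.

```r
set.seed(42)

# Hypothetical paired forecast errors for two models at the same cases
n_cases <- 200
err_a <- rnorm(n_cases, mean = 0, sd = 1.0)
err_b <- rnorm(n_cases, mean = 0, sd = 1.5)

# Hypothetical score function (not part of harpPoint)
rmse <- function(e) sqrt(mean(e^2))

n_rep <- 500   # number of bootstrap replicates
conf  <- 0.95  # confidence level

# Each replicate resamples cases with replacement, using the same
# indices for both models so that the comparison stays paired
reps <- t(replicate(n_rep, {
  idx <- sample.int(n_cases, replace = TRUE)
  c(a = rmse(err_a[idx]), b = rmse(err_b[idx]))
}))

# Percentile confidence bounds for each model's score (an assumed method)
probs <- c((1 - conf) / 2, 1 - (1 - conf) / 2)
ci_a <- quantile(reps[, "a"], probs, names = FALSE)
ci_b <- quantile(reps[, "b"], probs, names = FALSE)

# Percentage of replicates in which model a is closer than model b to
# the perfect score, taken here to be 0 for RMSE
perfect <- 0
pct_a_better <- 100 * mean(abs(reps[, "a"] - perfect) < abs(reps[, "b"] - perfect))
```

In this sketch a value of pct_a_better close to 100 suggests that the difference between the two models is unlikely to be due to sampling alone.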

Details

For data that are auto-correlated, a block bootstrap may be used, whereby data are pooled into groups in which the serial dependencies are maintained. Rather than sampling individual data points randomly, pools of data points are sampled randomly. The pools are taken from the column passed to the pool_by argument. To use an overlapping block bootstrap, a data frame should be passed to pool_by, with one column that is common to the harp_list object input and a column named "pool" that labels which pool each row belongs to. This ensures that the correct number of overlapping pools is used in each bootstrap replicate. make_bootstrap_pools can be used to get a data frame of overlapping pools.
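The idea of sampling pools rather than individual data points can be sketched in base R as follows. This is an illustration under stated assumptions, not harpPoint code: the day column, the simulated errors and the three-day pool length are hypothetical, and the overlapping data frame only mimics the layout described above (a shared column plus a "pool" column). In practice make_bootstrap_pools would be used to build it.

```r
set.seed(1)

# Hypothetical auto-correlated errors: 30 days with 8 values per day
dat <- data.frame(
  day = rep(1:30, each = 8),
  err = as.numeric(arima.sim(list(ar = 0.8), n = 240))
)

# Non-overlapping blocks: each day is one pool
pools <- split(dat$err, dat$day)

# One replicate samples whole pools with replacement, so the serial
# dependence within each pool is left intact
block_replicate <- function(pools) {
  picked <- sample(seq_along(pools), length(pools), replace = TRUE)
  sqrt(mean(unlist(pools[picked])^2))
}
scores <- replicate(200, block_replicate(pools))
ci <- quantile(scores, c(0.025, 0.975), names = FALSE)

# Overlapping blocks: a lookup table with the shared column ("day") and
# a "pool" column. Each pool covers 3 consecutive days, so one day can
# belong to several pools.
pool_length <- 3
n_pools <- length(unique(dat$day)) - pool_length + 1
overlapping <- data.frame(
  day  = unlist(lapply(seq_len(n_pools), function(p) p:(p + pool_length - 1))),
  pool = rep(seq_len(n_pools), each = pool_length)
)
```

With 30 days and pools of 3 consecutive days, the lookup table describes 28 overlapping pools.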

Bootstrapping can be quite slow since many replicates are computed. To speed the process up, bootstrap_verify can also run in parallel, with replicates computed by individual cores simultaneously rather than in serial. This is done by setting parallel = TRUE. The default behaviour is to use all cores detected by parallel::detectCores(), but the number of cores can be set with the num_cores argument.
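The general pattern of distributing replicates over cores can be sketched with the parallel package. This shows the technique only and makes no claim about how bootstrap_verify schedules its work: one_replicate and the simulated errors are hypothetical, and mclapply is an assumed choice that relies on forking, so the sketch falls back to a single core on Windows.

```r
library(parallel)

# A parallel-safe random number generator, so that forked workers get
# independent, reproducible streams
RNGkind("L'Ecuyer-CMRG")
set.seed(123)

err <- rnorm(500)  # hypothetical forecast errors

# Hypothetical function computing the score for one bootstrap replicate
one_replicate <- function(i, err) {
  idx <- sample.int(length(err), replace = TRUE)
  sqrt(mean(err[idx]^2))
}

# detectCores() can return NA, and forking is not available on Windows.
# The sketch is capped at 2 cores to keep it light.
n_cores <- detectCores()
if (is.na(n_cores) || .Platform$OS.type == "windows") n_cores <- 1L
n_cores <- min(2L, n_cores)

# Replicates are shared out between the cores instead of run in serial
scores <- unlist(
  mclapply(seq_len(200), one_replicate, err = err, mc.cores = n_cores)
)
```

Limiting the number of cores, as num_cores allows, can be useful on shared machines where taking every detected core is not appropriate.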