Bootstrap a verification function — bootstrap

bootstrap_verify computes verification scores with confidence intervals. If more than one fcst_model exists in the input harp_list object, the statistical significance of the differences in verification scores between fcst_models is also computed. The statistical testing uses the bootstrap method, whereby the scores are computed repeatedly for random samples of the input data.
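
As a rough illustration of the idea, and not of how harpPoint implements it, the base R sketch below bootstraps the mean absolute error of two forecast models against synthetic observations. Every name and value in it (model_a, model_b, mae, the data) is invented for the example. It assumes a percentile interval for the confidence bounds and that a lower score is better.

```r
set.seed(42)

# Synthetic data: 200 cases, two hypothetical models (model_b has larger errors)
n_cases <- 200
obs     <- rnorm(n_cases, mean = 10, sd = 3)
model_a <- obs + rnorm(n_cases, sd = 1.0)
model_b <- obs + rnorm(n_cases, sd = 1.5)

# The score to bootstrap: mean absolute error (the perfect score is 0)
mae <- function(fcst, obs) mean(abs(fcst - obs))

n_replicates <- 500
conf         <- 0.95

# Each replicate resamples the cases with replacement and recomputes the
# score for both models on the same resampled cases
replicates <- t(vapply(
  seq_len(n_replicates),
  function(i) {
    idx <- sample.int(n_cases, replace = TRUE)
    c(model_a = mae(model_a[idx], obs[idx]),
      model_b = mae(model_b[idx], obs[idx]))
  },
  numeric(2)
))

# Percentile confidence bounds for each model's score (assumed interval type)
alpha  <- (1 - conf) / 2
bounds <- apply(replicates, 2, quantile, probs = c(alpha, 1 - alpha))

# Percentage of replicates in which model_a is closer to the perfect score
pc_a_better <- 100 * mean(replicates[, "model_a"] < replicates[, "model_b"])

print(bounds)
print(pc_a_better)
```

A high percentage of replicates in which one model scores better suggests that the difference is unlikely to be a sampling artefact.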

Usage

bootstrap_verify(
  .fcst,
  verif_func,
  obs_col,
  n,
  groupings = "lead_time",
  pool_by = NULL,
  conf = 0.95,
  min_cases = 4,
  perfect_scores = perfect_score(),
  parallel = FALSE,
  num_cores = NULL,
  show_progress = TRUE,
  ...
)

Arguments

.fcst: A harp_list object with a column for observations.
verif_func: The harpPoint verification function to bootstrap.
obs_col: The observations column in the harp_list object. Can be the column name, quoted or unquoted. If passed as a variable, it should be embraced, i.e. wrapped in {{}}.
n: The number of bootstrap replicates.
groupings: The groups for which to compute the scores. See group_by for more information on how grouping works.
pool_by: For a block bootstrap, the quoted name of the column used to pool the data into blocks. For overlapping blocks, this should be a data frame with one column that is common to the input harp_list object and a column named "pool" identifying the pool each row belongs to. See Details.
conf: The confidence level of the interval to compute, expressed as a fraction. The default of 0.95 gives a 95% confidence interval.
min_cases: The minimum number of cases required in a group. For block bootstrapping this is the minimum number of blocks.
perfect_scores: The value that each score must take to be a perfect score.
parallel: Set to TRUE to use parallel processing for the bootstrapping. Requires the parallel package.
num_cores: If parallel = TRUE, the number of cores to use in the parallel processing. If NULL, the number of cores detected by parallel::detectCores() is used.
show_progress: Logical. Set to TRUE to show a progress bar. This feature is not available if parallel = TRUE.
...: Other arguments to verif_func.

Value

A harp_point_verif object with extra columns for the upper and lower confidence bounds of the scores and, when there is more than one fcst_model in the input harp_list object, the percentage of replicates in which each fcst_model is "better".

Details

For data that are auto-correlated, a block bootstrap may be used, whereby data are pooled into groups in which the serial dependencies are maintained. Rather than sampling individual data points randomly, whole pools of data points are sampled randomly. The pools are taken from the column passed to the pool_by argument. To use an overlapping block bootstrap, a data frame should be passed to pool_by, with one column that is common to the input harp_list object and a column named "pool" that labels which pool a row is in. This ensures that the correct number of overlapping pools is used in each bootstrap replicate. make_bootstrap_pools can be used to get a data frame of overlapping pools.
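
The pooling described above can be sketched in base R. This is a generic moving-block bootstrap under stated assumptions, not necessarily what bootstrap_verify does internally. The column name fcst_dttm is a hypothetical stand-in for the column shared with the data, integers stand in for dates, and the block length of 3 is arbitrary. In practice, make_bootstrap_pools is the intended way to build the pools data frame.

```r
set.seed(1)

# Synthetic auto-correlated forecast errors for 30 consecutive times.
# `fcst_dttm` is a hypothetical name for the column shared with the data.
n_times <- 30
errors  <- data.frame(
  fcst_dttm = seq_len(n_times),
  error     = as.numeric(arima.sim(list(ar = 0.7), n = n_times))
)

# Overlapping pools of 3 consecutive times: pool 1 = times 1-3,
# pool 2 = times 2-4, and so on. One row per (time, pool) pair.
block_length <- 3
n_pools      <- n_times - block_length + 1
pools <- data.frame(
  fcst_dttm = unlist(lapply(
    seq_len(n_pools),
    function(p) p + seq_len(block_length) - 1
  )),
  pool = rep(seq_len(n_pools), each = block_length)
)

# One replicate: draw whole pools with replacement until the replicate is
# as long as the original series, keeping the rows within a pool together
block_replicate <- function() {
  n_draws <- ceiling(n_times / block_length)
  drawn   <- sample(n_pools, n_draws, replace = TRUE)
  times   <- unlist(lapply(drawn, function(p) pools$fcst_dttm[pools$pool == p]))
  mean(errors$error[match(times, errors$fcst_dttm)])
}

# Bootstrap the bias (mean error) and take a 95% percentile interval
bias_replicates <- replicate(500, block_replicate())
print(quantile(bias_replicates, probs = c(0.025, 0.975)))
```

Because each time appears in several pools, the pools data frame has more rows than the original data. That is why the number of pools drawn per replicate is derived from the block length rather than from the number of pools.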

Bootstrapping can be quite slow since many replicates are computed. To speed the process up, bootstrap_verify can also run in parallel, with replicates computed by individual cores simultaneously rather than in serial. This is done by setting parallel = TRUE. The default behaviour is to use all detected cores, but the number of cores can be set with the num_cores argument.
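
To show what distributing replicates over cores looks like in general, the sketch below uses the parallel package directly on synthetic data. It is not the code path bootstrap_verify follows internally. The function rmse_replicate and the cap of 2 workers are assumptions made for the example.

```r
library(parallel)

# Synthetic forecast errors; the score is the root mean squared error
set.seed(7)
errors <- rnorm(1000)

# One bootstrap replicate: resample with replacement and recompute the score
rmse_replicate <- function(i, errors) {
  idx <- sample.int(length(errors), replace = TRUE)
  sqrt(mean(errors[idx]^2))
}

# Use at most 2 workers here; detectCores() can return NA on some systems
available <- detectCores()
num_cores <- if (is.na(available)) 1L else min(2L, available)

cl <- makeCluster(num_cores)
clusterSetRNGStream(cl, iseed = 123)  # separate random number stream per worker
replicates <- unlist(parLapply(cl, seq_len(200), rmse_replicate, errors = errors))
stopCluster(cl)

print(quantile(replicates, probs = c(0.025, 0.975)))
```

Each replicate is independent of the others, so the work splits cleanly across workers. The speed-up is limited mainly by the cost of starting the cluster and copying the data to each worker.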