Make pools for block bootstrapping — make_bootstrap_pools

To perform a block bootstrap with bootstrap_verify, the blocks can be passed as a data frame with a "pool" column that tells bootstrap_verify how to pool the data into blocks. make_bootstrap_pools creates such a data frame.

Usage

make_bootstrap_pools(.fcst, pool_col, pool_length, overlap = FALSE)

Arguments

.fcst: A harp_fcst object
pool_col: The column used to define the pools. Can be the column name, quoted or unquoted. If the column name is stored in a variable, the variable should be embraced, i.e. wrapped in {{}}.
pool_length: The length of a pool. Numeric, or a character string with a unit qualifier if pool_col is in date-time format. The unit qualifier can be: "s" = seconds, "m" = minutes, "h" = hours, "d" = days.
overlap: Logical. Whether the pools should overlap.
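
As an illustration of the unit qualifiers accepted by pool_length, the following base R sketch converts a value such as "6h" or "2d" to seconds. The helper pool_length_to_seconds is hypothetical and is not part of the package; it only assumes the qualifier letters listed above and that a value without a qualifier is in hours.

```r
# Hypothetical helper (not part of harpPoint): convert a pool_length such as
# "6h" or "2d" to seconds. A value without a qualifier is treated as hours.
pool_length_to_seconds <- function(pool_length) {
  seconds_per_unit <- c(s = 1, m = 60, h = 3600, d = 86400)
  if (is.numeric(pool_length)) {
    return(pool_length * 3600)
  }
  # Split into a numeric part and an optional single unit letter
  m <- regmatches(
    pool_length,
    regexec("^([0-9.]+)([smhd]?)$", pool_length)
  )[[1]]
  if (length(m) == 0) {
    stop("pool_length must be a number optionally followed by s, m, h or d")
  }
  unit <- if (nzchar(m[3])) m[3] else "h"
  as.numeric(m[2]) * seconds_per_unit[[unit]]
}

six_hours <- pool_length_to_seconds("6h")
two_days  <- pool_length_to_seconds("2d")
```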

Value

A data frame with the pool_col column and a "pool" column.

Details

Typically, block bootstrapping is used when the data are serially auto-correlated. For example, if auto-correlations are suspected between forecasts, pools could be defined from the fcdate column to create blocks of data in which those auto-correlations are maintained.
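
To illustrate why pooling preserves auto-correlations, the sketch below resamples whole pools rather than individual rows, so the rows within a pool always stay together. It uses base R only. The toy data and the helper resample_pools are hypothetical, and the sketch is not a description of how bootstrap_verify is implemented.

```r
# Toy pool table in the shape described above: one row per value of the
# pooling column plus the pool it belongs to (hypothetical data).
pools <- data.frame(
  lead_time = 0:5,
  pool      = c(1, 1, 2, 2, 3, 3)
)

# Hypothetical helper: draw pools with replacement and return all rows of
# each drawn pool, in the order drawn. Rows inside a pool are never separated.
resample_pools <- function(pools) {
  pool_ids <- unique(pools$pool)
  drawn    <- pool_ids[sample.int(length(pool_ids), replace = TRUE)]
  do.call(rbind, lapply(drawn, function(p) pools[pools$pool == p, ]))
}

set.seed(42)
replicate_1 <- resample_pools(pools)
```

Each bootstrap replicate has the same number of pools as the original data, but some pools may appear more than once and others not at all.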

Pools may be set to overlap, whereby a new pool is created beginning at each new value in pool_col. The length of a pool should be given in the units used in pool_col. If pool_col is a date-time column, pool_length is assumed to be in hours unless a qualifier letter is appended: "s" = seconds, "m" = minutes, "h" = hours, "d" = days.
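
The difference between overlapping and non-overlapping pools can be sketched in base R as follows. The helper assign_pools is hypothetical and is not the package's implementation. It assumes a numeric pooling column with values one unit apart, and it assumes that an overlapping pool is kept only if it fits completely within the data, which is inferred from the row counts in the examples below.

```r
# Hypothetical helper (not part of harpPoint): assign pools to the unique
# values of a numeric column whose values are assumed to be one unit apart.
assign_pools <- function(x, pool_length, overlap = FALSE) {
  x <- sort(unique(x))
  if (!overlap) {
    # Consecutive, non-overlapping blocks of width pool_length
    pool <- floor((x - min(x)) / pool_length) + 1
    return(data.frame(x = x, pool = pool))
  }
  # One pool starting at each value, kept only if the whole pool fits
  starts <- x[x + pool_length - 1 <= max(x)]
  do.call(rbind, lapply(seq_along(starts), function(i) {
    in_pool <- x >= starts[i] & x < starts[i] + pool_length
    data.frame(x = x[in_pool], pool = i)
  }))
}

lead_time       <- 0:23
non_overlapping <- assign_pools(lead_time, 2)
overlapping     <- assign_pools(lead_time, 2, overlap = TRUE)
```

With 24 lead times and a pool length of 2, this gives 12 non-overlapping pools (24 rows) or 23 overlapping pools (46 rows), since every value except the first and last belongs to two overlapping pools.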

Examples

make_bootstrap_pools(ens_point_df, lead_time, 2)
#> # A tibble: 24 × 2
#>    lead_time  pool
#>        <dbl> <dbl>
#>  1         0     1
#>  2         1     1
#>  3         2     2
#>  4         3     2
#>  5         4     3
#>  6         5     3
#>  7         6     4
#>  8         7     4
#>  9         8     5
#> 10         9     5
#> # ℹ 14 more rows
make_bootstrap_pools(ens_point_df, lead_time, 2, overlap = TRUE)
#> # A tibble: 46 × 2
#>    lead_time  pool
#>        <dbl> <int>
#>  1         0     1
#>  2         1     1
#>  3         1     2
#>  4         2     2
#>  5         2     3
#>  6         3     3
#>  7         3     4
#>  8         4     4
#>  9         4     5
#> 10         5     5
#> # ℹ 36 more rows

# pool_col as a variable
my_col <- "lead_time"
make_bootstrap_pools(ens_point_df, {{my_col}}, 2)
#> # A tibble: 24 × 2
#>    lead_time  pool
#>        <dbl> <dbl>
#>  1         0     1
#>  2         1     1
#>  3         2     2
#>  4         3     2
#>  5         4     3
#>  6         5     3
#>  7         6     4
#>  8         7     4
#>  9         8     5
#> 10         9     5
#> # ℹ 14 more rows