[Questioning]

This function will likely be replaced in future harp versions since it is rather cumbersome. Some thought still needs to go into finding a more usable API.

This is a wrapper for the verification process. Forecasts and observations are read in, filtered down to common cases, checked for errors, and a full verification is done for all scores. To minimise memory usage, the verification can be done for one lead time at a time. It would also be possible to parallelise the process using, for example, mclapply or future_map.

Usage

ens_read_and_verify(
  start_date,
  end_date,
  parameter,
  fcst_model,
  fcst_path,
  obs_path,
  lead_time = seq(0, 48, 3),
  num_iterations = length(lead_time),
  verify_members = TRUE,
  thresholds = NULL,
  members = NULL,
  vertical_coordinate = c(NA_character_, "pressure", "model", "height"),
  fctable_file_template = "fctable",
  obsfile_template = "obstable",
  groupings = "lead_time",
  by = "6h",
  lags = "0s",
  merge_lags_on_read = TRUE,
  lag_fcst_models = NULL,
  parent_cycles = NULL,
  lag_direction = 1,
  fcst_shifts = NULL,
  keep_unshifted = FALSE,
  drop_neg_leadtimes = TRUE,
  climatology = "sample",
  stations = NULL,
  scale_fcst = NULL,
  scale_obs = NULL,
  spread_drop_member = NULL,
  jitter_fcst = NULL,
  common_cases_only = TRUE,
  common_cases_xtra_cols = NULL,
  check_obs_fcst = TRUE,
  gross_error_check = TRUE,
  min_allowed = NULL,
  max_allowed = NULL,
  num_sd_allowed = NULL,
  show_progress = FALSE,
  verif_path = NULL
)

Arguments

start_date

Start date for the verification. Should be numeric or character, in the form YYYYMMDD(HH)(mm).

end_date

End date for the verification. Should be numeric or character.

parameter

The parameter to verify.

fcst_model

The forecast model(s) to verify. Can be a single string or a character vector of model names.

fcst_path

The path to the forecast FCTABLE files.

obs_path

The path to the observation OBSTABLE files.

lead_time

The lead times to verify.

num_iterations

The number of iterations per verification calculation. The default is to do the same number of iterations as there are lead times. If a small number of iterations is set, it may be useful to set show_progress = TRUE. The higher the number of iterations, the smaller the amount of data that is held in memory at any one time.
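To illustrate the trade-off between num_iterations and memory use, here is a minimal sketch of how a vector of lead times could be divided into chunks that are processed one after another. The splitting rule is an assumption made for illustration; it is not taken from the harp source, and the names chunk_id and lead_chunks are hypothetical.

```r
lead_time <- seq(0, 48, 3)   # 17 lead times, as in the default
num_iterations <- 4          # fewer iterations means larger chunks in memory

# Assign each lead time to one of num_iterations roughly equal chunks
# (assumed rule, for illustration only).
chunk_id <- ceiling(seq_along(lead_time) * num_iterations / length(lead_time))
lead_chunks <- split(lead_time, chunk_id)

# Number of lead times that would be held in memory per iteration
lengths(lead_chunks)
```

With num_iterations equal to length(lead_time), each chunk in this sketch holds a single lead time, which corresponds to the default behaviour described above.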

verify_members

Whether to verify the individual members of the ensemble. Even if thresholds are supplied, only summary scores are computed. If you wish to compute categorical scores, the separate det_verify function must be used.

thresholds

The thresholds to compute categorical scores for.

members

The members to retrieve if reading an EPS forecast. To select the same members for all forecast models, this should be a numeric vector. To select specific members from specific models, use a named list in which each element has the name of a forecast model and contains a numeric vector, e.g.
members = list(eps_model1 = seq(0, 3), eps_model2 = c(2, 3)).
For multi model ensembles, each element of this named list should contain another named list with sub model name followed by the desired members, e.g.
members = list(eps_model1 = list(sub_model1 = seq(0, 3), sub_model2 = c(2, 3)))

vertical_coordinate

The vertical co-ordinate.

fctable_file_template

The template for the file names of the files to be read from. This would normally be one of the "fctable_*" templates that can be seen in show_file_templates. Can be a single string, a character vector or list of the same length as fcst_model. If not named, the order of templates is assumed to be the same as in fcst_model. If named, the names must match the entries in fcst_model.

obsfile_template

The template for OBSTABLE files - the default is "obstable", which is OBSTABLE_{YYYY}.sqlite.

groupings

The groups to verify for. The default is "lead_time". Another common grouping might be groupings = c("lead_time", "fcst_cycle").

by

The frequency of forecast cycles to verify.

lags

For lagged forecasts, these are the lags that would be passed to read_point_forecast().

merge_lags_on_read

Logical. Whether to merge lagged ensemble members into the ensemble. This is the default behaviour. If FALSE, lag_forecast will be used to do the lagging.

lag_fcst_models

If merge_lags_on_read = FALSE, the names of the fcst_models to which lags should be applied.

parent_cycles

If merge_lags_on_read = FALSE, the parent cycles of the lagged forecasts.

lag_direction

If merge_lags_on_read = FALSE, the direction of the lagging. 1 lags backwards in time from the parent cycles and -1 lags forwards in time.

fcst_shifts, keep_unshifted

See shift_forecast.

drop_neg_leadtimes

Logical. Whether to drop negative lead times that may arise after shifting.

climatology

The climatology to use for the Brier Skill Score. Can be "sample" for the sample climatology (the default), a named list with elements eps_model and member to use a member of an eps model in the harp_fcst object for the climatology, or a data frame with columns for threshold and climatology and also optionally leadtime.
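As a sketch of the data frame form of this argument, the following builds a table with the threshold and climatology columns named above. The threshold values and probabilities are invented for illustration, and treating climatology as a probability between 0 and 1 is an assumption.

```r
# Illustrative values only: one climatological probability per threshold.
clim <- data.frame(
  threshold   = c(1, 5, 10),
  climatology = c(0.40, 0.15, 0.05)
)

# Sanity check under the assumption that climatology is a probability
stopifnot(all(clim$climatology >= 0 & clim$climatology <= 1))

clim
```

A data frame like this would then be passed as climatology = clim.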

stations

The stations to verify for. The default is to use all stations from station_list that are common to all fcst_model domains.

scale_fcst

A named list of arguments to scale_point_forecast.

scale_obs

A named list of arguments to scale_point_obs.

spread_drop_member

Which members to drop for the calculation of the ensemble variance and standard deviation. For harp_fcst objects, this can be a numeric scalar - in which case it is recycled for all forecast models; a list or numeric vector of the same length as the harp_fcst object, or a named list with the names corresponding to names in the harp_fcst object.

jitter_fcst

A function to perturb the forecast values by. This is used to account for observation error in the rank histogram. For other statistics it is likely to make little difference since it is expected that the observations will have a mean error of zero.
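A minimal sketch of a function that could be supplied as jitter_fcst is shown here. It assumes that the function receives a numeric vector of forecast values and returns a perturbed vector of the same length, and that observation error is Gaussian with zero mean. The name jitter_normal and the standard deviation of 0.5 are hypothetical choices.

```r
# Assumed observation error standard deviation, in the units of the parameter
obs_error_sd <- 0.5

# Add zero-mean Gaussian noise to each forecast value
jitter_normal <- function(x) {
  x + rnorm(length(x), mean = 0, sd = obs_error_sd)
}

set.seed(42)                      # for reproducibility of this example only
fcst <- c(271.3, 272.8, 274.1)
jittered <- jitter_normal(fcst)
```

The function would then be passed as jitter_fcst = jitter_normal.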

common_cases_only

Logical. Whether to select only the common cases before computing verification scores. The default is TRUE.

common_cases_xtra_cols

Extra columns to use in the call to common_cases.

check_obs_fcst

Logical. Whether to check for errors in observations by comparing with forecast values.

gross_error_check

Logical. Whether to perform a gross error check.

min_allowed

The minimum value of observation to allow in the gross error check. If set to NULL the default value for the parameter is used.

max_allowed

The maximum value of observation to allow in the gross error check. If set to NULL the default value for the parameter is used.

num_sd_allowed

The number of standard deviations of the forecast that the observations should be within. Set to NULL for an automatic value depending on the parameter.
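The two observation checks controlled by min_allowed, max_allowed and num_sd_allowed can be pictured with a toy example. This is a sketch of the general idea only; the column names, the way the forecast standard deviation is computed, and the comparison against a single forecast value are assumptions and do not reproduce what harp does internally.

```r
# Toy data: hypothetical observations and matching forecast values
toy <- data.frame(
  obs       = c(2.1, 3.4, 55.0, -80.0, 4.0),
  fcst_mean = c(2.5, 3.0, 4.0, 3.5, 5.0)
)

min_allowed    <- -50
max_allowed    <- 50
num_sd_allowed <- 2

# Gross error check: observation must lie inside the allowed range
in_range <- toy$obs >= min_allowed & toy$obs <= max_allowed

# Forecast-based check: observation must lie within num_sd_allowed
# standard deviations of the forecast (sd taken over the toy forecasts)
fcst_sd   <- sd(toy$fcst_mean)
near_fcst <- abs(toy$obs - toy$fcst_mean) <= num_sd_allowed * fcst_sd

kept <- toy[in_range & near_fcst, ]
```

In this toy data the two extreme observations fail both checks and are dropped before any scores would be computed.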

show_progress

Logical - whether to show a progress bar. Defaults to FALSE.

verif_path

If set, verification files will be saved to this path.

Value

A list containing two data frames: ens_summary_scores and ens_threshold_scores.
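
Examples

The following is a sketch of a typical call, not a tested recipe. The model names, paths, dates, parameter and thresholds are placeholders, and the call assumes that the function is exported from the harp package. The call itself only runs when harp is installed and the forecast path exists.

```r
# Hypothetical inputs: model names, paths and values are placeholders.
args <- list(
  start_date    = 2019021700,
  end_date      = 2019021718,
  parameter     = "T2m",
  fcst_model    = c("model_a", "model_b"),
  fcst_path     = "/path/to/FCTABLE",
  obs_path      = "/path/to/OBSTABLE",
  lead_time     = seq(0, 48, 3),
  by            = "6h",
  thresholds    = c(273.15, 283.15),  # illustrative; units depend on the data
  groupings     = "lead_time",
  show_progress = TRUE
)

# Only attempt the verification when harp and the data are available.
if (requireNamespace("harp", quietly = TRUE) && dir.exists(args$fcst_path)) {
  verif <- do.call(harp::ens_read_and_verify, args)
  names(verif)  # per the Value section: the two score data frames
}
```

Building the arguments as a list first makes it straightforward to reuse the same settings for several parameters by changing only args$parameter between calls.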