Read forecast and observations and verify. — ens_read_and_verify

This function will likely be replaced in future harp versions since it is rather cumbersome. Some thought still needs to go into finding a more usable API.

This is a wrapper for the verification process. Forecasts and observations are read in, filtered down to common cases, checked for errors, and a full verification is done for all scores. To minimise memory usage, the verification can be done for one lead time at a time. It would also be possible to parallelise the process using, for example, mclapply or future_map.
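The memory-saving idea described above can be sketched in base R: split the lead times into num_iterations groups and process one group at a time. Both split_lead_times() and verify_chunk() are hypothetical helpers written for this illustration; they are not harp functions, and how harp itself partitions the lead times is not stated here.

```r
# Hypothetical helper (not part of harp): split lead times into
# `num_iterations` contiguous groups of near-equal size.
split_lead_times <- function(lead_time, num_iterations = length(lead_time)) {
  num_iterations <- min(num_iterations, length(lead_time))
  group <- ceiling(seq_along(lead_time) * num_iterations / length(lead_time))
  unname(split(lead_time, group))
}

# Placeholder for the per-chunk work (read, filter, verify).
verify_chunk <- function(lead_times) {
  data.frame(lead_time = lead_times, num_cases = NA_integer_)
}

chunks <- split_lead_times(seq(0, 48, 3), num_iterations = 4)

# Sequential version: only one chunk's data is held in memory at a time.
scores <- do.call(rbind, lapply(chunks, verify_chunk))

# Parallel variant (Unix-alikes only): swap lapply() for parallel::mclapply().
# scores <- do.call(rbind, parallel::mclapply(chunks, verify_chunk, mc.cores = 2))
```

More iterations mean smaller chunks and lower peak memory; running chunks in parallel trades some of that saving for speed, since several chunks are then in memory at once.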

Usage

ens_read_and_verify(
  start_date,
  end_date,
  parameter,
  fcst_model,
  fcst_path,
  obs_path,
  lead_time = seq(0, 48, 3),
  num_iterations = length(lead_time),
  verify_members = TRUE,
  thresholds = NULL,
  members = NULL,
  vertical_coordinate = c(NA_character_, "pressure", "model", "height"),
  fctable_file_template = "fctable",
  obsfile_template = "obstable",
  groupings = "lead_time",
  by = "6h",
  lags = "0s",
  merge_lags_on_read = TRUE,
  lag_fcst_models = NULL,
  parent_cycles = NULL,
  lag_direction = 1,
  fcst_shifts = NULL,
  keep_unshifted = FALSE,
  drop_neg_leadtimes = TRUE,
  climatology = "sample",
  stations = NULL,
  scale_fcst = NULL,
  scale_obs = NULL,
  spread_drop_member = NULL,
  jitter_fcst = NULL,
  common_cases_only = TRUE,
  common_cases_xtra_cols = NULL,
  check_obs_fcst = TRUE,
  gross_error_check = TRUE,
  min_allowed = NULL,
  max_allowed = NULL,
  num_sd_allowed = NULL,
  show_progress = FALSE,
  verif_path = NULL
)

Arguments

start_date: Start date for the verification. Should be numeric or character, in the form YYYYMMDD(HH)(mm).
end_date: End date for the verification. Should be numeric or character.
parameter: The parameter to verify.
fcst_model: The forecast model(s) to verify. Can be a single string or a character vector of model names.
fcst_path: The path to the forecast FCTABLE files.
obs_path: The path to the observation OBSTABLE files.
lead_time: The lead times to verify.
num_iterations: The number of iterations per verification calculation. The default is to do the same number of iterations as there are lead times. If a small number of iterations is set, it may be useful to set show_progress = TRUE. The higher the number of iterations, the smaller the amount of data that is held in memory at any one time.
verify_members: Whether to verify the individual members of the ensemble. Even if thresholds are supplied, only summary scores are computed. If you wish to compute categorical scores, the separate det_verify function must be used.
thresholds: The thresholds to compute categorical scores for.
members: The members to retrieve if reading an EPS forecast. To select the same members for all forecast models, this should be a numeric vector. For specific members from specific models, use a named list with each element having the name of the forecast model and containing a numeric vector, e.g.
members = list(eps_model1 = seq(0, 3), eps_model2 = c(2, 3)).
For multi model ensembles, each element of this named list should contain another named list with sub model name followed by the desired members, e.g.
members = list(eps_model1 = list(sub_model1 = seq(0, 3), sub_model2 = c(2, 3)))
vertical_coordinate: The vertical co-ordinate.
fctable_file_template: The template for the file names of the files to be read from. This would normally be one of the "fctable_*" templates that can be seen in show_file_templates. Can be a single string, a character vector or list of the same length as fcst_model. If not named, the order of templates is assumed to be the same as in fcst_model. If named, the names must match the entries in fcst_model.
obsfile_template: The template for OBSTABLE files - the default is "obstable", which is OBSTABLE_{YYYY}.sqlite.
groupings: The groups to verify for. The default is "lead_time". Another common grouping might be groupings = c("lead_time", "fcst_cycle").
by: The frequency of forecast cycles to verify.
lags: For lagged forecasts, these are the lags that would be passed to read_point_forecast().
merge_lags_on_read: Logical. Whether to merge lagged ensemble members into the ensemble. This is the default behaviour. If FALSE, lag_forecast will be used to do the lagging.
lag_fcst_models: If merge_lags_on_read = FALSE, the names of the fcst_models to which lags should be applied.
parent_cycles: If merge_lags_on_read = FALSE, the parent cycles of the lagged forecasts.
lag_direction: If merge_lags_on_read = FALSE, the direction of the lagging. 1 lags backwards in time from the parent cycles and -1 lags forwards in time.
fcst_shifts, keep_unshifted: See shift_forecast.
drop_neg_leadtimes: Logical. Whether to drop negative lead times that may arise after shifting.
climatology: The climatology to use for the Brier Skill Score. Can be "sample" for the sample climatology (the default), a named list with elements eps_model and member to use a member of an eps model in the harp_fcst object for the climatology, or a data frame with columns for threshold and climatology and also optionally leadtime.
stations: The stations to verify for. The default is to use all stations from station_list that are common to all fcst_model domains.
scale_fcst: A named list of arguments to scale_point_forecast.
scale_obs: A named list of arguments to scale_point_obs.
spread_drop_member: Which members to drop for the calculation of the ensemble variance and standard deviation. For harp_fcst objects, this can be a numeric scalar - in which case it is recycled for all forecast models; a list or numeric vector of the same length as the harp_fcst object, or a named list with the names corresponding to names in the harp_fcst object.
jitter_fcst: A function to perturb the forecast values by. This is used to account for observation error in the rank histogram. For other statistics it is likely to make little difference since it is expected that the observations will have a mean error of zero.
common_cases_only: Logical. Whether to select only the common cases before computing verification scores. The default is TRUE.
common_cases_xtra_cols: Extra columns to use in the call to common_cases.
check_obs_fcst: Logical. Whether to check for errors in observations by comparing with forecast values.
gross_error_check: Logical. Whether to perform a gross error check.
min_allowed: The minimum value of observation to allow in the gross error check. If set to NULL the default value for the parameter is used.
max_allowed: The maximum value of observation to allow in the gross error check. If set to NULL the default value for the parameter is used.
num_sd_allowed: The number of standard deviations of the forecast that the observations should be within. Set to NULL for an automatic value depending on the parameter.
show_progress: Logical - whether to show a progress bar. Defaults to FALSE.
verif_path: If set, verification files will be saved to this path.
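The two observation checks controlled by gross_error_check, min_allowed, max_allowed and num_sd_allowed can be illustrated with a simplified base R sketch. This is not harp's implementation: the toy data, the bounds, and the exact rule used to compare an observation with the ensemble are all assumptions made for illustration.

```r
# Toy data: 5 cases, 3-member ensemble (values are made up).
fcst <- matrix(
  c(271, 272, 273,
    280, 281, 279,
    290, 291, 289,
    275, 276, 274,
    285, 284, 286),
  ncol = 3, byrow = TRUE
)
obs <- c(272, 280.5, 350, 275, 310)

# Gross error check: drop observations outside [min_allowed, max_allowed].
min_allowed <- 223
max_allowed <- 333
pass_gross <- obs >= min_allowed & obs <= max_allowed

# Forecast-based check (assumed rule): an observation passes if it lies
# within num_sd_allowed forecast standard deviations of the closest member.
num_sd_allowed <- 3
tolerance <- num_sd_allowed * sd(fcst)
min_diff <- apply(abs(fcst - obs), 1, min)
pass_fcst <- min_diff <= tolerance

# Keep only the observations that pass both checks.
keep <- pass_gross & pass_fcst
obs[keep]
```

In this toy case the third observation fails both checks, while the fifth lies inside the allowed range but is rejected because it is too far from every ensemble member.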

Value

A list containing two data frames: ens_summary_scores and ens_threshold_scores.
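A call might look like the following sketch. The dates, paths, model names and parameter are placeholders, and the example assumes the function is exported by the harpPoint package; the call is only attempted when that package and the data directories are actually available.

```r
# Illustrative arguments; dates, paths, model names and parameter are placeholders.
verif_args <- list(
  start_date    = 2019021700,
  end_date      = 2019021718,
  parameter     = "T2m",
  fcst_model    = c("eps_model1", "eps_model2"),
  fcst_path     = "/path/to/FCTABLE",
  obs_path      = "/path/to/OBSTABLE",
  lead_time     = seq(0, 48, 3),
  by            = "6h",
  # Different members for each model, as a named list keyed by model name.
  members       = list(eps_model1 = seq(0, 3), eps_model2 = c(2, 3)),
  groupings     = "lead_time",
  show_progress = TRUE
)

# Only attempt the call when the package (assumed: harpPoint) and data exist.
can_run <- requireNamespace("harpPoint", quietly = TRUE) &&
  dir.exists(verif_args$fcst_path) &&
  dir.exists(verif_args$obs_path)

if (can_run) {
  verif <- do.call(harpPoint::ens_read_and_verify, verif_args)
  names(verif)                    # should include the two score data frames
  head(verif$ens_summary_scores)  # summary scores per group
}
```

Collecting the arguments in a list and using do.call() keeps the long argument set in one place, which makes it easier to rerun the same verification for another parameter by changing a single element.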