This function will likely be replaced in future harp versions since it is rather cumbersome. Some thought still needs to go into finding a more usable API.
This is a wrapper for the verification process. Forecasts and observations are read in, filtered down to common cases, checked for errors, and a full verification is done for all scores. To minimise memory usage, the verification can be done for one lead time at a time. It would also be possible to parallelise the process using, for example, mclapply or future_map.
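The idea of working through the lead times in pieces can be sketched in plain R. This is only an illustration of the chunking concept, not how the function is implemented internally: verify_chunk is a hypothetical stand-in for the read, filter, check and verify steps, and the choice of 4 iterations is arbitrary.

```r
# Lead times taken from the default shown in the Usage section
lead_time <- seq(0, 48, 3)
num_iterations <- 4  # arbitrary choice for illustration

# Split the lead times into num_iterations roughly equal chunks
chunk_id <- cut(seq_along(lead_time), breaks = num_iterations, labels = FALSE)
lead_time_chunks <- split(lead_time, chunk_id)

# Hypothetical stand-in for "read, filter, check and verify" on one chunk
verify_chunk <- function(lt) {
  data.frame(lead_time = lt, num_cases = NA_integer_)
}

# Sequential: only one chunk of data needs to be in memory at a time
results <- lapply(lead_time_chunks, verify_chunk)

# Parallel alternative using forking (not available on Windows):
# results <- parallel::mclapply(lead_time_chunks, verify_chunk, mc.cores = 2)

# Combine the per-chunk results into a single data frame
verif <- do.call(rbind, results)
```

More chunks mean less data per chunk, which is the memory trade-off described above.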
Usage
ens_read_and_verify(
start_date,
end_date,
parameter,
fcst_model,
fcst_path,
obs_path,
lead_time = seq(0, 48, 3),
num_iterations = length(lead_time),
verify_members = TRUE,
thresholds = NULL,
members = NULL,
vertical_coordinate = c(NA_character_, "pressure", "model", "height"),
fctable_file_template = "fctable",
obsfile_template = "obstable",
groupings = "lead_time",
by = "6h",
lags = "0s",
merge_lags_on_read = TRUE,
lag_fcst_models = NULL,
parent_cycles = NULL,
lag_direction = 1,
fcst_shifts = NULL,
keep_unshifted = FALSE,
drop_neg_leadtimes = TRUE,
climatology = "sample",
stations = NULL,
scale_fcst = NULL,
scale_obs = NULL,
spread_drop_member = NULL,
jitter_fcst = NULL,
common_cases_only = TRUE,
common_cases_xtra_cols = NULL,
check_obs_fcst = TRUE,
gross_error_check = TRUE,
min_allowed = NULL,
max_allowed = NULL,
num_sd_allowed = NULL,
show_progress = FALSE,
verif_path = NULL
)
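A call might look like the following sketch. The dates, parameter name, model name and paths are all hypothetical placeholders, and the sketch assumes the function is exported by the harpPoint package. The call is only attempted when that package and the data directories are actually present.

```r
# All values below are hypothetical placeholders, not taken from a real setup
verif_args <- list(
  start_date    = "2024010100",        # YYYYMMDDHH
  end_date      = "2024013118",
  parameter     = "T2m",               # assumed parameter name
  fcst_model    = "eps_model1",
  fcst_path     = "/path/to/FCTABLE",
  obs_path      = "/path/to/OBSTABLE",
  lead_time     = seq(0, 48, 3),
  by            = "6h",
  thresholds    = c(0, 5, 10),
  show_progress = TRUE
)

# Only attempt the call when harpPoint and the data directories are available
if (requireNamespace("harpPoint", quietly = TRUE) &&
    dir.exists(verif_args$fcst_path) &&
    dir.exists(verif_args$obs_path)) {
  verif <- do.call(harpPoint::ens_read_and_verify, verif_args)
}
```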
Arguments
- start_date
Start date for the verification. Should be numeric or character, in the form YYYYMMDD(HH)(mm).
- end_date
End date for the verification. Should be numeric or character.
- parameter
The parameter to verify.
- fcst_model
The forecast model(s) to verify. Can be a single string or a character vector of model names.
- fcst_path
The path to the forecast FCTABLE files.
- obs_path
The path to the observation OBSTABLE files.
- lead_time
The lead times to verify.
- num_iterations
The number of iterations per verification calculation. The default is to do the same number of iterations as there are lead times. If a small number of iterations is set, it may be useful to set show_progress = TRUE. The higher the number of iterations, the smaller the amount of data that is held in memory at any one time.
- verify_members
Whether to verify the individual members of the ensemble. Even if thresholds are supplied, only summary scores are computed. If you wish to compute categorical scores, the separate det_verify function must be used.
- thresholds
The thresholds to compute categorical scores for.
- members
The members to retrieve if reading an EPS forecast. To select the same members for all forecast models, this should be a numeric vector. For specific members from specific models, use a named list with each element having the name of the forecast model and containing a numeric vector, e.g.
members = list(eps_model1 = seq(0, 3), eps_model2 = c(2, 3))
For multi model ensembles, each element of this named list should contain another named list with the sub model name followed by the desired members, e.g.
members = list(eps_model1 = list(sub_model1 = seq(0, 3), sub_model2 = c(2, 3)))
- vertical_coordinate
The vertical co-ordinate.
- fctable_file_template
The template for the file names of the files to be read from. This would normally be one of the "fctable_*" templates that can be seen in show_file_templates. Can be a single string, a character vector or list of the same length as fcst_model. If not named, the order of templates is assumed to be the same as in fcst_model. If named, the names must match the entries in fcst_model.
- obsfile_template
The template for OBSTABLE files. The default is "obstable", which is OBSTABLE_{YYYY}.sqlite.
- groupings
The groups to verify for. The default is "lead_time". Another common grouping might be groupings = c("lead_time", "fcst_cycle").
- by
The frequency of forecast cycles to verify.
- lags
For lagged forecasts, these are the lags that would be passed to read_point_forecast().
- merge_lags_on_read
Logical. Whether to merge lagged ensemble members into the ensemble. This is the default behaviour. If FALSE, lag_forecast will be used to do the lagging.
- lag_fcst_models
If merge_lags_on_read = FALSE, the names of the fcst_models to which lags should be applied.
- parent_cycles
If merge_lags_on_read = FALSE, the parent cycles of the lagged forecasts.
- lag_direction
If merge_lags_on_read = FALSE, the direction of the lagging. 1 lags backwards in time from the parent cycles and -1 lags forwards in time.
- fcst_shifts, keep_unshifted
See shift_forecast.
- drop_neg_leadtimes
Logical. Whether to drop negative lead times that may arise after shifting.
- climatology
The climatology to use for the Brier Skill Score. Can be "sample" for the sample climatology (the default), a named list with elements eps_model and member to use a member of an eps model in the harp_fcst object for the climatology, or a data frame with columns for threshold and climatology and also optionally leadtime.
- stations
The stations to verify for. The default is to use all stations from station_list that are common to all fcst_model domains.
- scale_fcst
A named list of arguments to scale_point_forecast.
- scale_obs
A named list of arguments to scale_point_obs.
- spread_drop_member
Which members to drop for the calculation of the ensemble variance and standard deviation. For harp_fcst objects, this can be a numeric scalar - in which case it is recycled for all forecast models; a list or numeric vector of the same length as the harp_fcst object, or a named list with the names corresponding to names in the harp_fcst object.
- jitter_fcst
A function to perturb the forecast values by. This is used to account for observation error in the rank histogram. For other statistics it is likely to make little difference since it is expected that the observations will have a mean error of zero.
- common_cases_only
Logical. Whether to select only the common cases before computing verification scores. The default is TRUE.
- common_cases_xtra_cols
Extra columns to use in the call to common_cases.
- check_obs_fcst
Logical. Whether to check for errors in observations by comparing with forecast values.
- gross_error_check
Logical. Whether to perform a gross error check.
- min_allowed
The minimum value of observation to allow in the gross error check. If set to NULL the default value for the parameter is used.
- max_allowed
The maximum value of observation to allow in the gross error check. If set to NULL the default value for the parameter is used.
- num_sd_allowed
The number of standard deviations of the forecast that the observations should be within. Set to NULL for an automatic value depending on the parameter.
- show_progress
Logical - whether to show a progress bar. Defaults to FALSE.
- verif_path
If set, verification files will be saved to this path.
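Several of the arguments above take structured values. The sketch below shows how such values might be built in plain R. The model names reuse the illustrative eps_model1 and eps_model2 from the members description, the climatology numbers are made up, and the Gaussian noise with a standard deviation of 0.5 in jitter_fcst is an assumption, as is the function taking and returning a numeric vector.

```r
# Hypothetical forecast model names, as used in the members description
fcst_model <- c("eps_model1", "eps_model2")

# members: named list with one numeric vector per forecast model
members <- list(eps_model1 = seq(0, 3), eps_model2 = c(2, 3))

# fctable_file_template: when named, names must match the entries in fcst_model
fctable_file_template <- list(eps_model1 = "fctable", eps_model2 = "fctable")

# climatology: data frame with columns for threshold and climatology
climatology <- data.frame(
  threshold   = c(0, 5, 10),
  climatology = c(0.60, 0.30, 0.10)  # made-up values for illustration
)

# jitter_fcst: a function that perturbs forecast values
# (noise distribution and scale are assumptions)
jitter_fcst <- function(x) x + stats::rnorm(length(x), mean = 0, sd = 0.5)

# Quick consistency checks before passing the values on
stopifnot(
  setequal(names(members), fcst_model),
  setequal(names(fctable_file_template), fcst_model),
  all(c("threshold", "climatology") %in% names(climatology))
)
```

Checking the names up front catches mismatches with fcst_model before any files are read.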