library(harp)
library(here)
library(dplyr)
library(forcats)
Point Verification Workflow
Workflow Steps
When doing a point verification there are a number of steps to the workflow. Some of these steps you have to do every time, and others might depend on the forecast parameter being verified, the scores you want to compute, or any conditions that you want to apply to the verification. The compulsory steps can be described as:
- Read forecast
- Read observations
- Join
- Verify
- (Save / Plot)
In this tutorial we will go through each of these steps in turn and then introduce some optional steps, increasing the complexity as we go. It is assumed that SQLite files for both forecasts and observations have been prepared.
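In outline, the compulsory steps map onto harp functions as sketched below (an outline only; the full working calls follow throughout the tutorial):

# An outline of the workflow - the ... placeholders are filled in as we go
#   Read forecast     -> fcst  <- read_point_forecast(...)
#   Read observations -> obs   <- read_point_obs(...)
#   Join              -> fcst  <- join_to_fcst(fcst, obs)
#   Verify            -> verif <- det_verify(fcst, T2m)  # or ens_verify()
#   Save / Plot       -> save_point_verif(verif, ...); plot_point_verif(verif, bias)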
Basic deterministic verification
Here we will demonstrate the workflow for a simple deterministic verification of 2m temperature. The forecasts come from the AROME-Arctic model that is run operationally by MET Norway, and we will do the verification for August 2023.
Read forecast
As always, the first thing we need to do is to attach the packages that we are going to need. There may be some unfamiliar packages here, but they will be explained as we begin to use functions from them.
All of our forecasts and observations use the same root directories, so we'll set them here:
fcst_dir <- here("data", "FCTABLE")
obs_dir <- here("data", "OBSTABLE")
Forecasts are read in with read_point_forecast(). We will read in the 00:00 UTC forecasts for lead times 0 - 24 hours every 3 hours.
fcst <- read_point_forecast(
  dttm = seq_dttm(2023080100, 2023083100, "24h"),
  fcst_model = "arome_arctic",
  fcst_type = "det",
  parameter = "T2m",
  file_path = fcst_dir
)
Read observations
Observations are read in with read_point_obs(). Observation files often contain more times and locations than we have for the forecasts. Therefore, we can tell the function which times and locations to read with the help of unique_valid_dttm() and unique_stations().
obs <- read_point_obs(
  dttm = unique_valid_dttm(fcst),
  parameter = "T2m",
  stations = unique_stations(fcst),
  obs_path = obs_dir
)
Join
Now that we have the forecasts and observations, we need to match them to the same date-times and locations. We do this by joining the observations to the forecast using join_to_fcst(). Essentially this does an inner join between the forecast and observations data frames, with an extra check to make sure the forecast and observations data have the same units.
fcst <- join_to_fcst(fcst, obs)
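Conceptually, this is close to a dplyr inner join. The sketch below is illustrative only - harp determines the join columns itself and additionally checks the units, and the columns named here are an assumption:

# Rough sketch of what join_to_fcst() approximates (not what harp literally runs)
inner_join(fcst, obs, by = c("valid_dttm", "SID"))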
Verify
We are now ready to verify. For deterministic forecasts this is done with det_verify(). By default the verification is stratified by lead time. All we need to tell the function is which column contains the observations - in this case, "T2m".
det_verify(fcst, T2m)
::det_summary_scores:: # A tibble: 9 × 9
fcst_model lead_time num_cases num_stations bias rmse mae stde hexbin
<chr> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <list>
1 arome_arc… 0 5611 191 0.501 1.50 1.06 1.42 <tibble>
2 arome_arc… 3 5669 192 0.427 1.50 1.05 1.44 <tibble>
3 arome_arc… 6 5703 194 -0.0388 1.16 0.848 1.16 <tibble>
4 arome_arc… 9 5723 193 -0.222 1.38 1.05 1.36 <tibble>
5 arome_arc… 12 5715 194 -0.268 1.47 1.11 1.44 <tibble>
6 arome_arc… 15 5745 193 -0.265 1.42 1.06 1.40 <tibble>
7 arome_arc… 18 5664 194 0.135 1.28 0.934 1.27 <tibble>
8 arome_arc… 21 5692 192 0.572 1.75 1.24 1.66 <tibble>
9 arome_arc… 24 5615 191 0.695 2.00 1.41 1.88 <tibble>
--harp verification for T2m--
# for forecasts from 00:00 UTC 01 aug. 2023 to 00:00 UTC 31 aug. 2023
# using 194 observation stations
# for verification groups:
-> lead_time
We can also compute categorical scores by adding some thresholds. This will compute scores for >= threshold categories.
det_verify(fcst, T2m, thresholds = seq(280, 290, 2.5))
::det_summary_scores:: # A tibble: 9 × 9
fcst_model lead_time num_cases num_stations bias rmse mae stde hexbin
<chr> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <list>
1 arome_arc… 0 5611 191 0.501 1.50 1.06 1.42 <tibble>
2 arome_arc… 3 5669 192 0.427 1.50 1.05 1.44 <tibble>
3 arome_arc… 6 5703 194 -0.0388 1.16 0.848 1.16 <tibble>
4 arome_arc… 9 5723 193 -0.222 1.38 1.05 1.36 <tibble>
5 arome_arc… 12 5715 194 -0.268 1.47 1.11 1.44 <tibble>
6 arome_arc… 15 5745 193 -0.265 1.42 1.06 1.40 <tibble>
7 arome_arc… 18 5664 194 0.135 1.28 0.934 1.27 <tibble>
8 arome_arc… 21 5692 192 0.572 1.75 1.24 1.66 <tibble>
9 arome_arc… 24 5615 191 0.695 2.00 1.41 1.88 <tibble>
::det_threshold_scores:: # A tibble: 45 × 41
fcst_model lead_time threshold num_stations num_cases_for_threshold_total
<chr> <int> <dbl> <int> <int>
1 arome_arctic 0 280 191 5335
2 arome_arctic 0 282. 191 4646
3 arome_arctic 0 285 191 3216
4 arome_arctic 0 288. 191 1667
5 arome_arctic 0 290 191 561
6 arome_arctic 3 280 192 5314
7 arome_arctic 3 282. 192 4549
8 arome_arctic 3 285 192 3125
9 arome_arctic 3 288. 192 1565
10 arome_arctic 3 290 192 494
# ℹ 35 more rows
# ℹ 36 more variables: num_cases_for_threshold_observed <dbl>,
# num_cases_for_threshold_forecast <dbl>, cont_tab <list>,
# threat_score <dbl>, hit_rate <dbl>, miss_rate <dbl>,
# false_alarm_rate <dbl>, false_alarm_ratio <dbl>, heidke_skill_score <dbl>,
# pierce_skill_score <dbl>, kuiper_skill_score <dbl>, percent_correct <dbl>,
# frequency_bias <dbl>, equitable_threat_score <dbl>, odds_ratio <dbl>, …
--harp verification for T2m--
# for forecasts from 00:00 UTC 01 aug. 2023 to 00:00 UTC 31 aug. 2023
# using 194 observation stations
# for verification groups:
-> lead_time & threshold
Save / Plot
Once the verification is done we can save the data using save_point_verif() and plot the data using plot_point_verif(). For plotting, we give the function the verification data and the score we want to plot. The scores are basically column names in any of the verification data's data frames.
First we need to assign the verification to a variable, and then save.
verif <- det_verify(fcst, T2m, thresholds = seq(280, 290, 2.5))
save_point_verif(verif, verif_path = here("data", "verification", "det"))
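If you are unsure which scores can be plotted, the column names of the verification data frames list them:

# The plottable summary scores are simply columns of this data frame
names(verif$det_summary_scores)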
We can now plot some scores
plot_point_verif(verif, bias)
plot_point_verif(verif, rmse)
plot_point_verif(verif, frequency_bias)
The last plot looks strange. That’s because the scores exist for each threshold and they are not being separated. We can separate them by mapping the colour to each threshold, by faceting by threshold (faceting means to separate into panels), or by filtering to a single threshold.
plot_point_verif(verif, frequency_bias, colour_by = threshold)
plot_point_verif(verif, frequency_bias, facet_by = vars(threshold))
plot_point_verif(verif, frequency_bias, filter_by = vars(threshold == 285))
facet_by and filter_by values must be wrapped in vars(). This is because faceting and filtering can be done by more than one variable, and the vars() function facilitates that.
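For example, filtering on two variables at once might look like this (the filter values are chosen purely for illustration):

# Hypothetical example: restrict the plot to one threshold and short lead times
plot_point_verif(
  verif,
  frequency_bias,
  filter_by = vars(threshold == 285, lead_time <= 12)
)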
For hexbin plots, the plots are automatically faceted by the grouping variable of the verification - in this case lead time.
plot_point_verif(verif, hexbin)
Basic ensemble verification
The workflow for verifying ensemble forecasts is much the same as that for deterministic verification. The only real difference is using the ens_verify() function to compute the scores. In this example we will use data from the MEPS model for the same time period as before.
fcst <- read_point_forecast(
  dttm = seq_dttm(2023080100, 2023083100, "1d"),
  fcst_model = "meps",
  fcst_type = "eps",
  parameter = "T2m",
  file_path = fcst_dir
)
obs <- read_point_obs(
  dttm = unique_valid_dttm(fcst),
  parameter = "T2m",
  stations = unique_stations(fcst),
  obs_path = obs_dir
)

fcst <- join_to_fcst(fcst, obs)
verif <- ens_verify(fcst, T2m, thresholds = seq(280, 290, 2.5))

verif
::ens_summary_scores:: # A tibble: 9 × 16
fcst_model lead_time num_cases num_stations mean_bias stde rmse spread
<chr> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
1 meps 0 5830 192 0.219 1.40 1.41 0.952
2 meps 3 5884 193 0.227 1.49 1.51 0.942
3 meps 6 5920 195 0.127 1.20 1.21 0.770
4 meps 9 5940 194 0.0789 1.27 1.27 0.942
5 meps 12 5933 195 0.00602 1.32 1.32 1.01
6 meps 15 5966 194 -0.0223 1.27 1.27 0.989
7 meps 18 5884 195 0.292 1.22 1.26 0.782
8 meps 21 5909 193 0.511 1.61 1.69 0.812
9 meps 24 5832 192 0.593 1.82 1.92 0.878
# ℹ 8 more variables: dropped_members_spread <dbl>, spread_skill_ratio <dbl>,
# dropped_members_spread_skill_ratio <dbl>, hexbin <list>,
# rank_histogram <list>, crps <dbl>, crps_potential <dbl>,
# crps_reliability <dbl>
::ens_threshold_scores:: # A tibble: 45 × 19
fcst_model lead_time threshold fair_brier_score sample_climatology
<chr> <int> <dbl> <dbl> <dbl>
1 meps 0 280 0.0457 0.892
2 meps 3 280 0.0578 0.878
3 meps 6 280 0.0132 0.975
4 meps 9 280 0.00352 0.991
5 meps 12 280 0.00304 0.994
6 meps 15 280 0.00362 0.993
7 meps 18 280 0.00339 0.990
8 meps 21 280 0.0451 0.941
9 meps 24 280 0.0719 0.891
10 meps 0 282. 0.0753 0.732
# ℹ 35 more rows
# ℹ 14 more variables: bss_ref_climatology <dbl>, num_stations <int>,
# num_cases_total <int>, num_cases_observed <int>, num_cases_forecast <int>,
# brier_score <dbl>, brier_skill_score <dbl>, brier_score_reliability <dbl>,
# brier_score_resolution <dbl>, brier_score_uncertainty <dbl>,
# reliability <list>, roc <list>, roc_area <dbl>, economic_value <list>
::det_summary_scores:: # A tibble: 54 × 11
fcst_model sub_model member lead_time num_cases num_stations bias rmse
<chr> <chr> <chr> <int> <int> <int> <dbl> <dbl>
1 meps meps mbr000 0 5830 192 0.244 1.29
2 meps meps mbr000 3 5884 193 0.231 1.44
3 meps meps mbr000 6 5920 195 0.147 1.21
4 meps meps mbr000 9 5940 194 0.108 1.32
5 meps meps mbr000 12 5933 195 0.0298 1.39
6 meps meps mbr000 15 5966 194 -0.0109 1.35
7 meps meps mbr000 18 5884 195 0.331 1.33
8 meps meps mbr000 21 5909 193 0.557 1.76
9 meps meps mbr000 24 5832 192 0.634 1.99
10 meps meps mbr001 0 5830 192 0.0680 1.60
# ℹ 44 more rows
# ℹ 3 more variables: mae <dbl>, stde <dbl>, hexbin <list>
--harp verification for T2m--
# for forecasts from 00:00 UTC 01 aug. 2023 to 00:00 UTC 31 aug. 2023
# using 195 observation stations
# for verification groups:
-> lead_time
save_point_verif(verif, here("data", "verification", "ens"))
Since this is ensemble data, ensemble-specific scores have been computed, as well as deterministic summary scores for each member. There are a couple of scores that plot_point_verif() can derive that are not columns in the data frames - these are spread_skill and brier_score_decomposition.
plot_point_verif(verif, spread_skill)
plot_point_verif(verif, crps)
plot_point_verif(verif, brier_score, facet_by = vars(threshold))
plot_point_verif(verif, brier_score_decomposition, facet_by = vars(threshold))
plot_point_verif(verif, reliability, facet_by = vars(threshold))
Again the last one has issues with overplotting. This is because there should be one plot for each threshold and each grouping variable (in this case lead_time), so we need to filter.
plot_point_verif(
  verif,
  reliability,
  facet_by = vars(threshold),
  filter_by = vars(lead_time == 12)
)
Comparing forecast models
Verification scores are often useful for comparing the performance of different models, or of model developments. With harp it is straightforward to compare different models - it is simply a case of reading multiple forecasts in one go. Here we are going to do a deterministic comparison of the AROME-Arctic model and member 0 of the MEPS and IFSENS ensembles. We need to read the ensemble and deterministic forecasts in separately due to the different fcst_type argument. When there are multiple forecast models, we get a harp_list, which works in the same way as a standard list in R. This means that when we read for the second time we can put the output in a named element of the harp_list.
fcst <- read_point_forecast(
  dttm = seq_dttm(2023080100, 2023083100, "1d"),
  fcst_model = c("meps", "ifsens"),
  fcst_type = "eps",
  parameter = "T2m",
  members = 0,
  file_path = fcst_dir
) |>
  as_det()
fcst$arome_arctic <- read_point_forecast(
  dttm = seq_dttm(2023080100, 2023083100, "1d"),
  fcst_model = "arome_arctic",
  fcst_type = "det",
  parameter = "T2m",
  file_path = fcst_dir
)
fcst
• meps
::deterministic point forecast:: # A tibble: 67,239 × 10
fcst_model fcst_dttm valid_dttm lead_time SID parameter
* <chr> <dttm> <dttm> <int> <int> <chr>
1 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1001 T2m
2 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1010 T2m
3 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1014 T2m
4 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1015 T2m
5 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1018 T2m
6 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1023 T2m
7 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1025 T2m
8 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1026 T2m
9 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1027 T2m
10 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1028 T2m
# ℹ 67,229 more rows
# ℹ 4 more variables: fcst_cycle <chr>, model_elevation <dbl>, units <chr>,
# fcst <dbl>
• ifsens
::deterministic point forecast:: # A tibble: 37,355 × 10
fcst_model fcst_dttm valid_dttm lead_time SID parameter
* <chr> <dttm> <dttm> <int> <int> <chr>
1 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1001 T2m
2 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1010 T2m
3 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1014 T2m
4 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1015 T2m
5 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1018 T2m
6 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1023 T2m
7 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1025 T2m
8 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1026 T2m
9 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1027 T2m
10 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1028 T2m
# ℹ 37,345 more rows
# ℹ 4 more variables: fcst_cycle <chr>, model_elevation <dbl>, units <chr>,
# fcst <dbl>
• arome_arctic
::deterministic point forecast:: # A tibble: 64,800 × 11
fcst_model fcst_dttm valid_dttm lead_time SID parameter
<chr> <dttm> <dttm> <int> <int> <chr>
1 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1001 T2m
2 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1010 T2m
3 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1014 T2m
4 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1015 T2m
5 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1018 T2m
6 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1023 T2m
7 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1025 T2m
8 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1026 T2m
9 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1027 T2m
10 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1028 T2m
# ℹ 64,790 more rows
# ℹ 5 more variables: z <dbl>, fcst <dbl>, fcst_cycle <chr>,
# model_elevation <dbl>, units <chr>
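Since fcst is a harp_list, the individual forecast models can be extracted just like elements of a regular R list:

# Standard list extraction works on a harp_list
fcst$meps
fcst[["arome_arctic"]]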
When we have multiple forecast models we should ensure that we are comparing like with like. We therefore only want to verify the cases that are common to all of the forecast models. We can select these cases using common_cases(). Note the number of rows in each of the data frames before selecting the common cases above, and after selecting the common cases below.
fcst <- common_cases(fcst)

fcst
• meps
::deterministic point forecast:: # A tibble: 36,000 × 11
fcst_model fcst_dttm valid_dttm lead_time SID parameter
<chr> <dttm> <dttm> <int> <int> <chr>
1 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1001 T2m
2 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1010 T2m
3 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1014 T2m
4 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1015 T2m
5 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1018 T2m
6 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1023 T2m
7 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1025 T2m
8 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1026 T2m
9 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1027 T2m
10 meps 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1028 T2m
# ℹ 35,990 more rows
# ℹ 5 more variables: fcst_cycle <chr>, model_elevation <dbl>, units <chr>,
# fcst <dbl>, z <dbl>
• ifsens
::deterministic point forecast:: # A tibble: 36,000 × 11
fcst_model fcst_dttm valid_dttm lead_time SID parameter
<chr> <dttm> <dttm> <int> <int> <chr>
1 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1001 T2m
2 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1010 T2m
3 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1014 T2m
4 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1015 T2m
5 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1018 T2m
6 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1023 T2m
7 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1025 T2m
8 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1026 T2m
9 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1027 T2m
10 ifsens 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1028 T2m
# ℹ 35,990 more rows
# ℹ 5 more variables: fcst_cycle <chr>, model_elevation <dbl>, units <chr>,
# fcst <dbl>, z <dbl>
• arome_arctic
::deterministic point forecast:: # A tibble: 36,000 × 11
fcst_model fcst_dttm valid_dttm lead_time SID parameter
<chr> <dttm> <dttm> <int> <int> <chr>
1 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1001 T2m
2 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1010 T2m
3 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1014 T2m
4 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1015 T2m
5 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1018 T2m
6 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1023 T2m
7 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1025 T2m
8 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1026 T2m
9 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1027 T2m
10 arome_arct… 2023-08-01 00:00:00 2023-08-01 00:00:00 0 1028 T2m
# ℹ 35,990 more rows
# ℹ 5 more variables: z <dbl>, fcst <dbl>, fcst_cycle <chr>,
# model_elevation <dbl>, units <chr>
The rest of the verification workflow is exactly the same.
obs <- read_point_obs(
  dttm = unique_valid_dttm(fcst),
  parameter = "T2m",
  stations = unique_stations(fcst),
  obs_path = obs_dir
)

fcst <- join_to_fcst(fcst, obs)
verif <- det_verify(fcst, T2m, thresholds = seq(280, 290, 2.5))
save_point_verif(verif, here("data", "verification", "det"))
When plotting the scores, each of the forecast models is automatically assigned a different colour.
plot_point_verif(verif, stde)
We could, however, plot each forecast model in a separate panel and use, for example, the threshold to control the colour.
plot_point_verif(
  verif,
  frequency_bias,
  colour_by = threshold,
  facet_by = vars(fcst_model)
)
Observation errors
Gross error check
When reading observations, read_point_obs() will do a gross error check on the observations to make sure that they have realistic values. You can set the bounds with the min_obs_allowed and max_obs_allowed arguments. For some parameters, there are default values for the minimum and maximum allowed. These can be seen in the parameter definitions with get_param_def(). For example, for 2m temperature we see that the minimum and maximum values are 223 and 333 Kelvin.
get_param_def("t2m")
$description
[1] "Air temperature at 2m above the ground"
$min
[1] 223
$max
[1] 333
$grib
$grib$name
[1] "2t" "t"
$grib$level_type
[1] "heightAboveGround" "surface"
$grib$level
[1] 2
$netcdf
$netcdf$name
[1] "air_temperature_2m"
$v
$v$harp_param
[1] "T2m"
$v$name
[1] "TT"
$v$param_units
[1] "K"
$v$type
[1] "SYNOP"
$wrf
$wrf$name
[1] "T2"
$fa
$fa$name
[1] "CLSTEMPERATURE "
$fa$units
[1] "K"
$obsoul
$obsoul$name
[1] 39
$obsoul$units
[1] "K"
$obsoul$harp_name
[1] "T2m"
Checking observations against forecasts
Another check that can be done is to compare the values of the observations with the forecasts. This is done with the check_obs_against_fcst() function. By default the data are grouped by station ID and time of day, where the day is split into four parts - [00:00, 06:00), [06:00, 12:00), [12:00, 18:00) and [18:00, 00:00) - and the standard deviation of the forecast is computed for each part. The observations are then compared with the forecasts, and if the difference between them is larger than a certain number of standard deviations, that row is removed from the data.
Using the default value of 6 standard deviations we see that no observations are removed.
fcst <- check_obs_against_fcst(fcst, T2m)
ℹ Using default `num_sd_allowed = 6` for `parameter = T2m`
! 0 cases with more than 6 std devs error.
If we make the check more strict and remove all cases where the observations are more than 1 standard deviation away from the forecast, we see that 723 cases are now removed. We can see the removed cases in the "removed_cases" attribute of the result.
fcst <- check_obs_against_fcst(fcst, T2m, num_sd_allowed = 1)
! 723 cases with more than 1 std dev error.
ℹ Removed cases can be seen in the "removed_cases" attribute.
attr(fcst, "removed_cases")
# A tibble: 723 × 5
SID valid_dttm T2m dist_to_fcst tolerance
<int> <dttm> <dbl> <dbl> <dbl>
1 1001 2023-08-09 12:00:00 283. 2.94 1.21
2 1001 2023-08-10 00:00:00 282. 1.54 1.09
3 1001 2023-08-10 06:00:00 282. 1.20 1.20
4 1001 2023-08-13 00:00:00 279. 2.89 1.09
5 1001 2023-08-19 00:00:00 280. 1.17 1.09
6 1001 2023-08-21 00:00:00 282. 1.55 1.09
7 1001 2023-08-23 00:00:00 276. 2.33 1.09
8 1001 2023-08-24 00:00:00 276. 1.30 1.09
9 1001 2023-08-24 06:00:00 276. 1.66 1.20
10 1001 2023-08-30 18:00:00 280. 1.39 1.15
# ℹ 713 more rows
Ensemble rescaling
When verifying ensembles, we can take the observation error into account by attempting to rescale the distribution of the ensemble forecast to that of the observations. This is done by adding an assumed error distribution to the ensemble forecast, sampling from the error distribution and adding each random draw to an ensemble member. If the error distribution has a mean of 0, the impact on most scores will be negligible. However, for the rank histogram and ensemble spread, this will in effect take the observation errors into account. In harp we call this jittering the forecast, and it is done using the jitter_fcst() function.
Take for example the rank histograms for our MEPS and IFSENS ensembles for 2m temperature.
fcst <- read_point_forecast(
  dttm = seq_dttm(2023080100, 2023083112, "12h"),
  fcst_model = c("meps", "ifsens"),
  fcst_type = "eps",
  parameter = "T2m",
  file_path = fcst_dir
) |>
  scale_param(-273.15, "degC") |>
  common_cases()
obs <- read_point_obs(
  dttm = unique_valid_dttm(fcst),
  parameter = "T2m",
  stations = unique_stations(fcst),
  obs_path = obs_dir
) |>
  scale_param(-273.15, "degC", col = T2m)

fcst <- join_to_fcst(fcst, obs)
verif <- ens_verify(fcst, T2m)
plot_point_verif(
  verif,
  normalized_rank_histogram,
  rank_is_relative = TRUE
)

plot_point_verif(
  verif,
  spread_skill
)
We see that both models are underdispersed, with a U-shaped rank histogram. If we say that the observations have a normal error distribution with a mean of 0 and a standard deviation of 1, we can jitter the ensemble forecast by adding random draws from that distribution.
fcst <- jitter_fcst(
  fcst,
  function(x) x + rnorm(length(x), mean = 0, sd = 1)
)
verif <- ens_verify(fcst, T2m)
plot_point_verif(
  verif,
  normalized_rank_histogram,
  rank_is_relative = TRUE
)

plot_point_verif(
  verif,
  spread_skill
)
Now we see that the forecast is much less underdispersed, with many of the ranks having a normalized frequency close to 1 and the ensemble spread much closer to the ensemble RMSE.
harp does not include any estimates of error distributions - it is the user's responsibility to provide them. In this example, a normal distribution with a mean of 0 and a standard deviation of 1 is used for illustrative purposes only.
Grouped Verification and Scaling
Basic Grouping
So far we have just done verification stratified by lead_time. We can use the groupings argument to tell the verification function how to group the data together to compute scores. The most common grouping after lead time would be the forecast cycle. Rather than read in only the forecasts initialized at 00:00 UTC, we will read in the 12:00 UTC forecasts as well.
fcst <- read_point_forecast(
  dttm = seq_dttm(2023080100, 2023083112, "12h"),
  fcst_model = c("meps", "ifsens"),
  fcst_type = "eps",
  parameter = "T2m",
  members = 0,
  file_path = fcst_dir
) |>
  as_det()
fcst$arome_arctic <- read_point_forecast(
  dttm = seq_dttm(2023080100, 2023083112, "12h"),
  fcst_model = "arome_arctic",
  fcst_type = "det",
  parameter = "T2m",
  file_path = fcst_dir
)

fcst <- common_cases(fcst)
The forecasts are in Kelvin, but it may be more useful to have them in °C. We can scale the data using scale_param(). At a minimum we need to give the function the scaling to apply and a new name for the units. By default the scaling is additive, but it can be made multiplicative by setting mult = TRUE.
fcst <- scale_param(fcst, -273.15, new_units = "degC")
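As a hypothetical example of a multiplicative scaling, a surface pressure forecast in Pascals could be converted to hectopascals (fcst_ps is not created in this tutorial):

# Illustrative only - fcst_ps is a hypothetical pressure forecast
fcst_ps <- scale_param(fcst_ps, 0.01, new_units = "hPa", mult = TRUE)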
Now we can read the observations and join to the forecast.
obs <- read_point_obs(
  dttm = unique_valid_dttm(fcst),
  parameter = "T2m",
  stations = unique_stations(fcst),
  obs_path = obs_dir
)

fcst <- join_to_fcst(fcst, obs)
Warning: .fcst has units: degC and .join has units: K
Error: Join will not be done due to units incompatibility. You can force the join by setting force = TRUE
OR, units imcompatibility can be fixed with the set_units(), or scale_param() functions.
Here the join fails because the forecasts and observations do not have the same units. We therefore also need to scale the observations with scale_param(). When scaling observations, the name of the observations column also needs to be provided.
obs <- scale_param(obs, -273.15, new_units = "degC", col = T2m)

fcst <- join_to_fcst(fcst, obs)
Now we can verify. This time we will tell det_verify() that we want scores for each lead time and forecast cycle.
verif <- det_verify(
  fcst,
  T2m,
  thresholds = seq(10, 20, 2.5),
  groupings = c("lead_time", "fcst_cycle")
)
plot_point_verif(verif, rmse, facet_by = vars(fcst_cycle))
We now have plots for each forecast cycle, but what if we want the combined forecast cycles as well? There are two ways we could tackle that - firstly, we could compute the verification with groupings = "lead_time" (i.e. the default) and bind the output to what we already have with bind_point_verif().
verif <- bind_point_verif(
  verif,
  det_verify(fcst, T2m, thresholds = seq(10, 20, 2.5))
)
plot_point_verif(verif, rmse, facet_by = vars(fcst_cycle))
Grouping Lists
An easier way would be to pass a list to groupings. Each element in the list is treated as a separate verification, and they are bound together at the end.
verif <- det_verify(
  fcst,
  T2m,
  thresholds = seq(10, 20, 2.5),
  groupings = list(
    "lead_time",
    c("lead_time", "fcst_cycle")
  )
)
save_point_verif(verif, here("data", "verification", "det", "fcst-cycle"))
plot_point_verif(verif, rmse, facet_by = vars(fcst_cycle))
Facets are plotted in alphabetical order if the faceting variable is a string, or coerced into a string. This means that the facets can end up in an unexpected order. The fct_*() functions from the forcats package can be used to help reorder the facets. Quite often the order in which the values of the variable appear in the data is the order you want the facets to be in, so fct_inorder() can be used to reorder the facets.
plot_point_verif(verif, rmse, facet_by = vars(fct_inorder(fcst_cycle)))
Complex Grouping Lists
Adding a station characteristic
It is often desirable to group stations together by common characteristics, whether that be location, whether a station is on the coast or in the mountains, or any other characteristic. harp includes a built-in data frame of station groups that can be joined to the forecast and used to group the verification. We can join the station groups using join_to_fcst() with force = TRUE, since we don't want a check on common units between the data frames being joined.
fcst <- join_to_fcst(fcst, station_groups, force = TRUE)
This has added a "station_group" column to the forecast, which we can use in the groupings argument. Now we want verification for each lead time for all forecast cycles and station groups; for each lead time and forecast cycle for all station groups; for each lead time and station group for all forecast cycles; and for each lead time, station group and forecast cycle. Therefore we need a list of 4 different groupings.
verif <- det_verify(
  fcst,
  T2m,
  thresholds = seq(10, 20, 2.5),
  groupings = list(
    "lead_time",
    c("lead_time", "fcst_cycle"),
    c("lead_time", "station_group"),
    c("lead_time", "fcst_cycle", "station_group")
  )
)
save_point_verif(verif, here("data", "verification", "det", "stations"))
Plotting now becomes quite complicated as you have to do a lot of filtering and faceting. For example:
plot_point_verif(
  verif,
  equitable_threat_score,
  facet_by = vars(station_group),
  filter_by = vars(grepl(";", fcst_cycle), threshold == 15)
)
You may have noticed the saving of each verification result into specific directories. This is so that they can be plotted using an interactive app that runs in a web browser. The app is made using R Shiny, so we refer to it as a Shiny app. save_point_verif() uses very specific file names that describe some of the information about the verification object, but the grouping strategy is not one of those pieces of information, hence the separate directories. The Shiny app will create dropdown menus allowing you to choose the group for which you want to see the scores.
You can start the Shiny app with:
shiny_plot_point_verif(
start_dir = here("data", "verification"),
full_dir_navigation = FALSE,
theme = "light"
)
When you start the Shiny app it is best to give it a directory to start from, to aid navigation. Setting full_dir_navigation = FALSE disables the use of modal windows to navigate your directory system; instead, all directories below the start directory are searched for harp point verification files and the Select Verification Directory dropdown is populated. This is experimental and may not give the smoothest ride, but it is often less cumbersome than navigating via modal windows. Finally, you can choose between "light", "dark" and "white" for the colour theme of the app (the default is "dark").
Changing the Time Axis
You can also use the groupings argument to specify different time axes for the verification. So far we have used the lead time as the time axis, but we could also use the time of day to get the diurnal cycle, or simply get the scores for each date-time in the data set. In this example we will still group by the forecast cycle as well. To get the time of day, we first need to run expand_date() on the data to get a column for "valid_hour".
fcst <- expand_date(fcst, valid_dttm)
verif <- det_verify(
  fcst,
  T2m,
  thresholds = seq(10, 20, 2.5),
  groupings = list(
    "lead_time",
    c("lead_time", "fcst_cycle"),
    "valid_hour",
    c("valid_hour", "fcst_cycle"),
    "valid_dttm",
    c("valid_dttm", "fcst_cycle")
  )
)
Since the data are 6-hourly, there are only 4 valid hours in the data set. When there are fewer than five different values in a group, the verification functions separate them with a ";" in the group value when they are all collected together; otherwise the group value is labelled "All". plot_point_verif() only searches for "All" when figuring out which data to get for each different x-axis.
We need to use mutate_list() in conjunction with case_when() from the dplyr package to modify the "valid_hour" column where all valid hours are collected together. mutate_list() uses mutate() from the dplyr package to modify columns of the data frames in a list whilst retaining the attributes of the list.
verif <- mutate_list(
  verif,
  valid_hour = case_when(
    grepl(";", valid_hour) ~ "All",
    .default = valid_hour
  )
)
save_point_verif(verif, here("data", "verification", "det", "fcst-cycle"))
We can now use the x_axis argument to plot_point_verif() to decide which times to use on the x-axis.
plot_point_verif(verif, mae, facet_by = vars(fct_inorder(fcst_cycle)))
plot_point_verif(
  verif,
  mae,
  x_axis = valid_hour,
  facet_by = vars(fct_inorder(fcst_cycle))
)

plot_point_verif(
  verif,
  mae,
  x_axis = valid_dttm,
  facet_by = vars(fct_inorder(fcst_cycle)),
  point_size = 0
)
Vertical Profiles
Another application where grouping is important is the verification of vertical profiles. In general the workflow is once again the same, but there are some aspects where you need to take into account that the data are on vertical levels.
The first difference is that when reading in the data, you need to tell the read function which vertical coordinate the data are on via the vertical_coordinate argument. In most cases the data will be on pressure levels, but they could also be on height levels or model levels.
The data we are using here come from the AROME-Arctic model and the control member of MEPS, which at MET Norway is archived as MEPS_det (i.e. MEPS deterministic). There are forecasts available every 6 hours, at 00, 06, 12 and 18 UTC.
fcst <- read_point_forecast(
  dttm = seq_dttm(2023080100, 2023083118, "6h"),
  fcst_model = c("meps", "arome_arctic"),
  fcst_type = "det",
  parameter = "T",
  file_path = fcst_dir,
  vertical_coordinate = "pressure"
) |>
  scale_param(-273.15, "degC")
When finding the common cases, the default behaviour is to compare the "fcst_dttm", "lead_time" and "SID" columns. When finding common cases for vertical profiles, we also need to make sure that only the vertical levels that are common to all forecast models are included in the verification. We do this by adding the pressure column (p) to common_cases().
fcst <- common_cases(fcst, p)
Similar to reading the forecasts, we also need to tell read_point_obs() that the vertical coordinate is pressure.
obs <- read_point_obs(
  dttm = unique_valid_dttm(fcst),
  parameter = "T",
  stations = unique_stations(fcst),
  obs_path = obs_dir,
  vertical_coordinate = "pressure"
) |>
  scale_param(-273.15, "degC", col = T)
Joining works exactly the same as for single level variables.
fcst <- join_to_fcst(fcst, obs)
Now we can verify, making sure that we have "p" as one of the grouping variables.
verif <- det_verify(
  fcst,
  T,
  groupings = list(
    c("lead_time", "p"),
    c("lead_time", "p", "fcst_cycle")
  )
)

save_point_verif(
  verif,
  here("data", "verification", "det", "fcst-cycle")
)
We can now plot the profile verification using plot_profile_verif(), making sure to filter and facet appropriately (for example, there is one profile for each lead time).
plot_profile_verif(
  verif,
  mae,
  filter_by = vars(grepl(";", fcst_cycle)),
  facet_by = vars(lead_time)
)
We could also make a plot for a single vertical level in the normal way. We may also want to remove times when there are very few cases.
plot_point_verif(
  verif,
  bias,
  filter_by = vars(p == 925, num_cases > 5),
  facet_by = vars(fct_inorder(fcst_cycle))
)
Conditional Verification
Classification by observed value
On occasion, it may be instructive to verify for particular conditions - for example, to verify temperature for different temperature ranges. This can be done by creating a grouping column for each range of observed temperature. Here the workflow is to use mutate() from the dplyr package in association with the base R function cut() to create a grouping column on the observations before joining them to the forecast.
fcst <- read_point_forecast(
  dttm = seq_dttm(2023080100, 2023083100, "24h"),
  fcst_model = c("meps", "ifsens"),
  fcst_type = "eps",
  parameter = "T2m",
  file_path = fcst_dir
) |>
  scale_param(-273.15, "degC") |>
  common_cases()
obs <- read_point_obs(
  dttm = unique_valid_dttm(fcst),
  parameter = "T2m",
  stations = unique_stations(fcst),
  obs_path = obs_dir
) |>
  scale_param(-273.15, "degC", col = T2m)
We are going to classify the temperature by having the left side of each range closed and the right side open. This means that each range goes from >= the lower value to < the upper value. We do this by setting right = FALSE in the call to cut().
obs <- mutate(
  obs,
  temp_range = cut(T2m, seq(5, 25, 2.5), right = FALSE)
)

obs
# A tibble: 23,756 × 8
valid_dttm SID lon lat elev units T2m temp_range
<dttm> <int> <dbl> <dbl> <dbl> <chr> <dbl> <fct>
1 2023-08-01 00:00:00 1001 -8.67 70.9 10 degC 6.75 [5,7.5)
2 2023-08-01 00:00:00 1010 16.1 69.3 13 degC 14.4 [12.5,15)
3 2023-08-01 00:00:00 1015 17.8 69.6 33 degC 18.6 [17.5,20)
4 2023-08-01 00:00:00 1023 18.5 69.1 77 degC 13.1 [12.5,15)
5 2023-08-01 00:00:00 1025 18.9 69.7 9 degC 13.0 [12.5,15)
6 2023-08-01 00:00:00 1026 18.9 69.7 115 degC 14.2 [12.5,15)
7 2023-08-01 00:00:00 1027 18.9 69.7 20 degC 13.0 [12.5,15)
8 2023-08-01 00:00:00 1028 19.0 74.5 16 degC 4.55 <NA>
9 2023-08-01 00:00:00 1033 19.5 70.2 24 degC 14.5 [12.5,15)
10 2023-08-01 00:00:00 1035 20.1 69.6 710 degC 12.9 [12.5,15)
# ℹ 23,746 more rows
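To see the open/closed convention concretely:

# With right = FALSE each interval includes its lower bound and excludes
# its upper bound, so 7.5 falls in [7.5,10) rather than [5,7.5)
cut(c(5, 7.49, 7.5), seq(5, 25, 2.5), right = FALSE)
# [1] [5,7.5)  [5,7.5)  [7.5,10)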
You will see that cut() labels values outside of the breaks as NA. This isn't necessarily meaningful, so we will use the dplyr function case_when() to give more meaningful labels. We will also take the opportunity to format the other ranges a bit better.
obs <- mutate(
  obs,
  temp_range = case_when(
    T2m < 5 ~ "< 5",
    T2m >= 25 ~ ">= 25",
    .default = gsub(",", ", ", temp_range)
  ),
  temp_range = fct_relevel(
    temp_range,
    c(
      "< 5",
      gsub(
        ",",
        ", ",
        levels(cut(seq(5, 25), seq(5, 25, 2.5), right = FALSE))
      ),
      ">= 25"
    )
  )
)
In the above we also set the factor levels to an order that makes sense for when we come to plot the scores. We can now continue as normal, adding "temp_range" as a grouping variable.
fcst <- join_to_fcst(fcst, obs)

verif <- ens_verify(
  fcst,
  T2m,
  groupings = c("lead_time", "temp_range")
)
plot_point_verif(
  verif,
  spread_skill,
  facet_by = vars(temp_range)
)
Classification by a different parameter
One important aspect of forecast model performance is how it performs in different weather regimes. For example, it may be useful to know how the model performs for a particular parameter when the wind is coming from a certain direction. There are a couple of approaches to this, but it must be noted that it is important to include the possibility for false alarms in the verification.
Here we will compute the verification scores for an ensemble forecast for 2m temperature when the wind is coming from the east, so let’s say a wind direction between 45° and 135°. We will first read in the wind direction observations for the cases we already have for 2m temperature.
wind_dir <- read_point_obs(
  dttm = unique_valid_dttm(fcst),
  parameter = "D10m",
  stations = unique_stations(fcst),
  obs_path = obs_dir
)
We will now create a column with the groups "easterly" and "other" based on the values in the "D10m" column.
wind_dir <- mutate(
  wind_dir,
  wind_direction = case_when(
    between(D10m, 45, 135) ~ "easterly",
    .default = "other"
  )
)
We can now join the wind direction data to the 2m temperature data that we want to verify. Here we have to set force = TRUE in join_to_fcst(), since the join would otherwise fail the check for the forecasts and observations having the same units.
fcst <- join_to_fcst(fcst, wind_dir, force = TRUE)
We can now verify 2m temperature and group by the wind direction.
verif <- ens_verify(
  fcst,
  T2m,
  groupings = c("lead_time", "wind_direction")
)
plot_point_verif(
  verif,
  mean_bias,
  facet_by = vars(wind_direction)
)
A more thorough method might be to include whether easterly winds were forecast or not in the analysis, such that we have cases of easterly being observed only, forecast only, both observed and forecast, and neither observed nor forecast.
So now we need to read in the wind direction forecasts as well. In this case we need to set the units to "degrees", as they are incorrect in the file (due to a bug :/ ).
fcst_wind_dir <- read_point_forecast(
  dttm = unique_fcst_dttm(fcst),
  fcst_model = c("meps", "ifsens"),
  fcst_type = "eps",
  parameter = "wd10m",
  lead_time = unique_col(fcst, lead_time),
  stations = unique_stations(fcst),
  file_path = fcst_dir
) |>
  set_units("degrees")
Since the forecasts are from an ensemble, we can identify forecasts that have easterly winds as those where at least one member has a wind direction between 45° and 135°. To identify these cases we can find the binary probability for each member and then compute the ensemble mean with ens_stats(). We can apply a function to multiple columns using across() from the dplyr package together with mutate(). Since we are generating logical values, we need to tell ens_stats() not to compute the standard deviation.
fcst_wind_dir <- mutate(
  fcst_wind_dir,
  across(contains("_mbr"), ~between(.x, 45, 135))
) |>
  ens_stats(sd = FALSE)
Now we can join the wind direction observations, generate our groups and select only those columns we need.
fcst_wind_dir <- join_to_fcst(fcst_wind_dir, wind_dir)

fcst_wind_dir <- mutate(
  fcst_wind_dir,
  easterly = case_when(
    ens_mean > 0 & wind_direction == "easterly" ~ "both",
    ens_mean > 0 & wind_direction == "other" ~ "forecast only",
    ens_mean == 0 & wind_direction == "easterly" ~ "observed only",
    ens_mean == 0 & wind_direction == "other" ~ "neither",
    .default = NA
  )
) |>
  select(fcst_dttm, lead_time, SID, easterly)
We can now join our data frame with the verification groups to the 2m temperature forecasts to be verified. Since we have two models, we can merge them into a single data frame using bind().
fcst <- join_to_fcst(
  fcst,
  bind(fcst_wind_dir),
  force = TRUE
)
Now we can verify and plot for our new groups. fct_relevel() is used to get the plot facets in the desired order.
verif <- ens_verify(
  fcst,
  T2m,
  groupings = c("lead_time", "easterly")
)
plot_point_verif(
  verif,
  mean_bias,
  facet_by = vars(fct_relevel(easterly, ~.x[c(2, 4, 1, 3)])),
  num_facet_cols = 2
)
In the next tutorial we will put everything you have learned here into practice by building a script that could be used for repeated verification tasks.