join_to_fcst is a special case of the join family of functions. Its
primary purpose is to join a data frame of observations to a data frame or
harp_list of forecasts such that forecast-observation pairs are on the
same row in the joined data frame. An extra check is made to make sure that
the forecast data and observations data are in the same units.
Usage
join_to_fcst(
  .fcst,
  .join,
  join_type = c("inner", "left", "right", "full", "semi", "anti"),
  by = NULL,
  latlon = FALSE,
  elev = FALSE,
  force = FALSE,
  keep_x = TRUE,
  keep_y = FALSE,
  ...
)
Arguments
- .fcst
A harp_df data frame or a harp_list.
- .join
A data frame to join to the forecast.
- join_type
How to join the data frame. Acceptable values are: "inner", "left", "right", "full", "semi", "anti". See join for more details.
- by
Which columns to join by. If set to NULL (the default), a natural join is done using all columns with common names across .fcst and .join, excluding lat, lon and elev. These columns are excluded because they may be stored to different levels of precision in the two data frames, in which case the join would fail.
- latlon
Logical. Whether to include latitude and longitude columns in the default for by. The default is FALSE.
- elev
Logical. Whether to include the station elevation column in the default for by. The default is FALSE.
- force
Set to TRUE to force the join to happen even if the units in .fcst and .join are not compatible.
- keep_x, keep_y
Where duplicate column names are found, but not used in the join, these arguments indicate whether the duplicate columns from .fcst (keep_x) or .join (keep_y) should be kept. The default is keep_x = TRUE, keep_y = FALSE.
- ...
Other arguments for join.
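The reason lat, lon and elev are left out of the default for by can be illustrated with a plain dplyr join. This is only a minimal sketch, not how join_to_fcst is implemented: the two tibbles and their values are made up for illustration, and the obs column name is hypothetical.

```r
library(tibble)
library(dplyr)

# Hypothetical forecast and observation tables for the same station.
# The coordinates are stored to different levels of precision.
fcst <- tibble(SID = 1001, lat = 59.94, lon = 10.72, fcst = 2.5)
obs  <- tibble(SID = 1001, lat = 59.9423, lon = 10.7201, obs = 2.1)

# Joining on every common column compares lat and lon exactly,
# so the station does not match itself and no rows are returned.
all_common <- inner_join(fcst, obs, by = c("SID", "lat", "lon"))
nrow(all_common)
#> [1] 0

# Leaving lat and lon out of the join keys pairs the rows as intended.
by_sid <- inner_join(fcst, obs, by = "SID")
nrow(by_sid)
#> [1] 1
```

In the plain dplyr result the unjoined lat and lon columns from both inputs are kept with .x and .y suffixes. In join_to_fcst, the keep_x and keep_y arguments described above decide which of such duplicate columns are kept.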
Examples
# Make some fake observations
library(tibble)
obs <- tibble(
  valid_dttm = det_point_df$valid_dttm,
  SID = det_point_df$SID,
  units = "degC",
  T2m = runif(nrow(det_point_df))
)
# Make sure the forecast has units
fcst <- set_units(det_point_df, "degC")
join_to_fcst(fcst, obs)
#> Joining, by = c("valid_dttm", "SID", "units")
#> ::deterministic point forecast:: # A tibble: 48 × 8
#> fcst_model fcst_dttm lead_time valid_dttm SID fcst
#> <chr> <dttm> <dbl> <dttm> <dbl> <dbl>
#> 1 point 2021-01-01 00:00:00 0 2021-01-01 00:00:00 1001 0.300
#> 2 point 2021-01-01 00:00:00 1 2021-01-01 01:00:00 1001 0.611
#> 3 point 2021-01-01 00:00:00 2 2021-01-01 02:00:00 1001 0.802
#> 4 point 2021-01-01 00:00:00 3 2021-01-01 03:00:00 1001 0.361
#> 5 point 2021-01-01 00:00:00 4 2021-01-01 04:00:00 1001 0.213
#> 6 point 2021-01-01 00:00:00 5 2021-01-01 05:00:00 1001 0.736
#> 7 point 2021-01-01 00:00:00 6 2021-01-01 06:00:00 1001 0.177
#> 8 point 2021-01-01 00:00:00 7 2021-01-01 07:00:00 1001 0.866
#> 9 point 2021-01-01 00:00:00 8 2021-01-01 08:00:00 1001 0.109
#> 10 point 2021-01-01 00:00:00 9 2021-01-01 09:00:00 1001 0.436
#> # ℹ 38 more rows
#> # ℹ 2 more variables: units <chr>, T2m <dbl>
# Also works for harp_list objects
join_to_fcst(set_units(det_point_list, "degC"), obs)
#> Joining, by = c("valid_dttm", "SID", "units")
#> Joining, by = c("valid_dttm", "SID", "units")
#> • a
#> ::deterministic point forecast:: # A tibble: 48 × 8
#> fcst_model fcst_dttm lead_time valid_dttm SID fcst
#> <chr> <dttm> <dbl> <dttm> <dbl> <dbl>
#> 1 a 2021-01-01 00:00:00 0 2021-01-01 00:00:00 1001 0.254
#> 2 a 2021-01-01 00:00:00 1 2021-01-01 01:00:00 1001 0.0506
#> 3 a 2021-01-01 00:00:00 2 2021-01-01 02:00:00 1001 0.236
#> 4 a 2021-01-01 00:00:00 3 2021-01-01 03:00:00 1001 0.298
#> 5 a 2021-01-01 00:00:00 4 2021-01-01 04:00:00 1001 0.467
#> 6 a 2021-01-01 00:00:00 5 2021-01-01 05:00:00 1001 0.376
#> 7 a 2021-01-01 00:00:00 6 2021-01-01 06:00:00 1001 0.217
#> 8 a 2021-01-01 00:00:00 7 2021-01-01 07:00:00 1001 0.696
#> 9 a 2021-01-01 00:00:00 8 2021-01-01 08:00:00 1001 0.227
#> 10 a 2021-01-01 00:00:00 9 2021-01-01 09:00:00 1001 0.359
#> # ℹ 38 more rows
#> # ℹ 2 more variables: units <chr>, T2m <dbl>
#>
#> • b
#> ::deterministic point forecast:: # A tibble: 48 × 8
#> fcst_model fcst_dttm lead_time valid_dttm SID fcst
#> <chr> <dttm> <dbl> <dttm> <dbl> <dbl>
#> 1 b 2021-01-01 00:00:00 0 2021-01-01 00:00:00 1001 0.746
#> 2 b 2021-01-01 00:00:00 1 2021-01-01 01:00:00 1001 0.409
#> 3 b 2021-01-01 00:00:00 2 2021-01-01 02:00:00 1001 0.484
#> 4 b 2021-01-01 00:00:00 3 2021-01-01 03:00:00 1001 0.677
#> 5 b 2021-01-01 00:00:00 4 2021-01-01 04:00:00 1001 0.730
#> 6 b 2021-01-01 00:00:00 5 2021-01-01 05:00:00 1001 0.413
#> 7 b 2021-01-01 00:00:00 6 2021-01-01 06:00:00 1001 0.689
#> 8 b 2021-01-01 00:00:00 7 2021-01-01 07:00:00 1001 0.430
#> 9 b 2021-01-01 00:00:00 8 2021-01-01 08:00:00 1001 0.720
#> 10 b 2021-01-01 00:00:00 9 2021-01-01 09:00:00 1001 0.194
#> # ℹ 38 more rows
#> # ℹ 2 more variables: units <chr>, T2m <dbl>
#>
# And works with gridded data
join_to_fcst(set_units(ens_grid_df, "degC"), set_units(anl_grid_df, "degC"))
#> Joining, by = c("valid_dttm", "units")
#> ::ensemble gridded forecast:: # A tibble: 24 × 8
#> fcst_dttm lead_time valid_dttm units grid_mbr000
#> <dttm> <dbl> <dttm> <chr> <geolist>
#> 1 2021-01-01 00:00:00 0 2021-01-01 00:00:00 degC [5 × 5]
#> 2 2021-01-01 00:00:00 1 2021-01-01 01:00:00 degC [5 × 5]
#> 3 2021-01-01 00:00:00 2 2021-01-01 02:00:00 degC [5 × 5]
#> 4 2021-01-01 00:00:00 3 2021-01-01 03:00:00 degC [5 × 5]
#> 5 2021-01-01 00:00:00 4 2021-01-01 04:00:00 degC [5 × 5]
#> 6 2021-01-01 00:00:00 5 2021-01-01 05:00:00 degC [5 × 5]
#> 7 2021-01-01 00:00:00 6 2021-01-01 06:00:00 degC [5 × 5]
#> 8 2021-01-01 00:00:00 7 2021-01-01 07:00:00 degC [5 × 5]
#> 9 2021-01-01 00:00:00 8 2021-01-01 08:00:00 degC [5 × 5]
#> 10 2021-01-01 00:00:00 9 2021-01-01 09:00:00 degC [5 × 5]
#> # ℹ 14 more rows
#> # ℹ 3 more variables: grid_mbr001 <geolist>, anl_model <chr>, anl <geolist>