Join data to a forecast — join_to_fcst

join_to_fcst is a special case of the join family of functions. Its primary purpose is to join a data frame of observations to a data frame or harp_list of forecasts such that forecast-observation pairs are on the same row in the joined data frame. An extra check is made to ensure that the forecast data and the observations data are in the same units.
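The pairing idea can be illustrated without harp at all. The following is a minimal base R sketch, not the function's actual implementation: toy_fcst, toy_obs and check_units are hypothetical names invented for this illustration, and the units check is only an assumption about what such a check might look like.

```r
# Hypothetical stand-ins for a point forecast and observations (base R only)
toy_fcst <- data.frame(
  valid_dttm = as.POSIXct("2021-01-01 00:00:00", tz = "UTC") + 3600 * 0:2,
  SID        = 1001,
  units      = "degC",
  fcst       = c(0.3, 0.6, 0.8)
)
toy_obs <- data.frame(
  valid_dttm = as.POSIXct("2021-01-01 00:00:00", tz = "UTC") + 3600 * 1:3,
  SID        = 1001,
  units      = "degC",
  T2m        = c(0.5, 0.7, 0.9)
)

# Illustrative units check (an assumption, not harp's own code):
# refuse to pair values that are recorded in different units
check_units <- function(x, y) {
  if (!identical(unique(x$units), unique(y$units))) {
    stop("Units do not match", call. = FALSE)
  }
  invisible(TRUE)
}

check_units(toy_fcst, toy_obs)

# Inner join on the shared columns: only the valid times present in both
# data frames survive, and each row holds one forecast-observation pair
paired <- merge(toy_fcst, toy_obs, by = c("valid_dttm", "SID", "units"))
paired
```

Only the two valid times that appear in both toy data frames are kept, which is the behaviour of an "inner" join.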

Usage

join_to_fcst(
  .fcst,
  .join,
  join_type = c("inner", "left", "right", "full", "semi", "anti"),
  by = NULL,
  latlon = FALSE,
  elev = FALSE,
  force = FALSE,
  keep_x = TRUE,
  keep_y = FALSE,
  ...
)

Arguments

.fcst: A harp_df data frame or a harp_list.
.join: A data frame to join to the forecast.
join_type: How to join the data frame. Acceptable values are: "inner", "left", "right", "full", "semi", "anti". See join for more details.
by: Which columns to join by. If NULL (the default), the join uses all columns with common names in .fcst and .join, excluding lat, lon and elev. These columns are excluded because they may be stored to different levels of precision in the two data frames, in which case rows would fail to match.
latlon: Logical. Whether to include latitude and longitude columns in the default for by. The default is FALSE.
elev: Logical. Whether to include the station elevation column in the default for by. The default is FALSE.
force: Set to TRUE to force the join to happen even if the units in .fcst and .join are not compatible.
keep_x, keep_y: Where .fcst and .join share column names that are not used in the join, these arguments indicate whether the duplicate columns from .fcst (keep_x) or from .join (keep_y) should be kept. The default is keep_x = TRUE, keep_y = FALSE.
...: Other arguments for join.
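
To see why lat, lon and elev are left out of the default join columns, consider the following base R sketch. It is not harp code: the data frames and their values are made up, and dropping the duplicate lat column from the observations side only mimics what the description of keep_x = TRUE, keep_y = FALSE suggests.

```r
# Made-up forecast and observations with latitude stored to different precision
toy_fcst <- data.frame(
  SID  = c(1001, 1002),
  lat  = c(60.123456, 61.654321),
  fcst = c(1.2, 3.4)
)
toy_obs <- data.frame(
  SID = c(1001, 1002),
  lat = c(60.12, 61.65),
  obs = c(1.0, 3.0)
)

# Joining on every common column finds no matches, because lat differs
all_common <- merge(toy_fcst, toy_obs, by = c("SID", "lat"))
nrow(all_common)  # 0

# Excluding the location columns from the join keys avoids the problem
common <- intersect(names(toy_fcst), names(toy_obs))
keys   <- setdiff(common, c("lat", "lon", "elev"))

# Duplicate columns not used in the join are kept from the forecast only,
# analogous to keep_x = TRUE, keep_y = FALSE
dups   <- setdiff(common, keys)
joined <- merge(toy_fcst, toy_obs[setdiff(names(toy_obs), dups)], by = keys)
joined
```

Here the second join matches both stations on SID alone, and the lat column in the result is the one from the forecast data frame.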

Value

The input forecast data frame with column(s) added from .join.

Examples

# Make some fake observations
library(tibble)
obs <- tibble(
  valid_dttm = det_point_df$valid_dttm,
  SID        = det_point_df$SID,
  units      = "degC",
  T2m        = runif(nrow(det_point_df))
)

# Make sure the forecast has units
fcst <- set_units(det_point_df, "degC")

join_to_fcst(fcst, obs)
#> Joining, by = c("valid_dttm", "SID", "units")
#> ::deterministic point forecast:: # A tibble: 48 × 8
#>    fcst_model fcst_dttm           lead_time valid_dttm            SID  fcst
#>    <chr>      <dttm>                  <dbl> <dttm>              <dbl> <dbl>
#>  1 point      2021-01-01 00:00:00         0 2021-01-01 00:00:00  1001 0.300
#>  2 point      2021-01-01 00:00:00         1 2021-01-01 01:00:00  1001 0.611
#>  3 point      2021-01-01 00:00:00         2 2021-01-01 02:00:00  1001 0.802
#>  4 point      2021-01-01 00:00:00         3 2021-01-01 03:00:00  1001 0.361
#>  5 point      2021-01-01 00:00:00         4 2021-01-01 04:00:00  1001 0.213
#>  6 point      2021-01-01 00:00:00         5 2021-01-01 05:00:00  1001 0.736
#>  7 point      2021-01-01 00:00:00         6 2021-01-01 06:00:00  1001 0.177
#>  8 point      2021-01-01 00:00:00         7 2021-01-01 07:00:00  1001 0.866
#>  9 point      2021-01-01 00:00:00         8 2021-01-01 08:00:00  1001 0.109
#> 10 point      2021-01-01 00:00:00         9 2021-01-01 09:00:00  1001 0.436
#> # ℹ 38 more rows
#> # ℹ 2 more variables: units <chr>, T2m <dbl>

# Also works for harp_list objects
join_to_fcst(set_units(det_point_list, "degC"), obs)
#> Joining, by = c("valid_dttm", "SID", "units")
#> Joining, by = c("valid_dttm", "SID", "units")
#> • a
#> ::deterministic point forecast:: # A tibble: 48 × 8
#>    fcst_model fcst_dttm           lead_time valid_dttm            SID   fcst
#>    <chr>      <dttm>                  <dbl> <dttm>              <dbl>  <dbl>
#>  1 a          2021-01-01 00:00:00         0 2021-01-01 00:00:00  1001 0.254 
#>  2 a          2021-01-01 00:00:00         1 2021-01-01 01:00:00  1001 0.0506
#>  3 a          2021-01-01 00:00:00         2 2021-01-01 02:00:00  1001 0.236 
#>  4 a          2021-01-01 00:00:00         3 2021-01-01 03:00:00  1001 0.298 
#>  5 a          2021-01-01 00:00:00         4 2021-01-01 04:00:00  1001 0.467 
#>  6 a          2021-01-01 00:00:00         5 2021-01-01 05:00:00  1001 0.376 
#>  7 a          2021-01-01 00:00:00         6 2021-01-01 06:00:00  1001 0.217 
#>  8 a          2021-01-01 00:00:00         7 2021-01-01 07:00:00  1001 0.696 
#>  9 a          2021-01-01 00:00:00         8 2021-01-01 08:00:00  1001 0.227 
#> 10 a          2021-01-01 00:00:00         9 2021-01-01 09:00:00  1001 0.359 
#> # ℹ 38 more rows
#> # ℹ 2 more variables: units <chr>, T2m <dbl>
#> 
#> • b
#> ::deterministic point forecast:: # A tibble: 48 × 8
#>    fcst_model fcst_dttm           lead_time valid_dttm            SID  fcst
#>    <chr>      <dttm>                  <dbl> <dttm>              <dbl> <dbl>
#>  1 b          2021-01-01 00:00:00         0 2021-01-01 00:00:00  1001 0.746
#>  2 b          2021-01-01 00:00:00         1 2021-01-01 01:00:00  1001 0.409
#>  3 b          2021-01-01 00:00:00         2 2021-01-01 02:00:00  1001 0.484
#>  4 b          2021-01-01 00:00:00         3 2021-01-01 03:00:00  1001 0.677
#>  5 b          2021-01-01 00:00:00         4 2021-01-01 04:00:00  1001 0.730
#>  6 b          2021-01-01 00:00:00         5 2021-01-01 05:00:00  1001 0.413
#>  7 b          2021-01-01 00:00:00         6 2021-01-01 06:00:00  1001 0.689
#>  8 b          2021-01-01 00:00:00         7 2021-01-01 07:00:00  1001 0.430
#>  9 b          2021-01-01 00:00:00         8 2021-01-01 08:00:00  1001 0.720
#> 10 b          2021-01-01 00:00:00         9 2021-01-01 09:00:00  1001 0.194
#> # ℹ 38 more rows
#> # ℹ 2 more variables: units <chr>, T2m <dbl>
#> 

# And works with gridded data
join_to_fcst(set_units(ens_grid_df, "degC"), set_units(anl_grid_df, "degC"))
#> Joining, by = c("valid_dttm", "units")
#> ::ensemble gridded forecast:: # A tibble: 24 × 8
#>    fcst_dttm           lead_time valid_dttm          units grid_mbr000
#>    <dttm>                  <dbl> <dttm>              <chr>   <geolist>
#>  1 2021-01-01 00:00:00         0 2021-01-01 00:00:00 degC      [5 × 5]
#>  2 2021-01-01 00:00:00         1 2021-01-01 01:00:00 degC      [5 × 5]
#>  3 2021-01-01 00:00:00         2 2021-01-01 02:00:00 degC      [5 × 5]
#>  4 2021-01-01 00:00:00         3 2021-01-01 03:00:00 degC      [5 × 5]
#>  5 2021-01-01 00:00:00         4 2021-01-01 04:00:00 degC      [5 × 5]
#>  6 2021-01-01 00:00:00         5 2021-01-01 05:00:00 degC      [5 × 5]
#>  7 2021-01-01 00:00:00         6 2021-01-01 06:00:00 degC      [5 × 5]
#>  8 2021-01-01 00:00:00         7 2021-01-01 07:00:00 degC      [5 × 5]
#>  9 2021-01-01 00:00:00         8 2021-01-01 08:00:00 degC      [5 × 5]
#> 10 2021-01-01 00:00:00         9 2021-01-01 09:00:00 degC      [5 × 5]
#> # ℹ 14 more rows
#> # ℹ 3 more variables: grid_mbr001 <geolist>, anl_model <chr>, anl <geolist>