Reading forecast model data

One of the most important capabilities of harpIO is reading data from NWP and climate models. The original solution to this task was to have separate functions for different, but related tasks. read_det_interpolate() and read_eps_interpolate() to read data from multiple files for deterministic and ensemble forecasts respectively and to interpolate to point locations, and read_grid() to read gridded data from a single file. This solutions means the user having to learn several functions, each with their own arguments and output structures. There is also a lack of flexibility in, for example, reading from both deterministic and ensemble forecast files at the same time.

We have attempted to streamline this process and to make it as generic as possible with the read_forecast() function, which we hope results in a more strightforward user interface. The function is designed to both read the forecast data from multiple files and, where required, transform the data by interpolation or regridding. This means that the output is always a class harp_fcst list of data frames whether the data are at points, a 2d grid or a vertical cross section.

Furthermore, many arguments as passed as lists generated by *_opts() functions. This means that you can use tab autocompletion to get the correct argument names, and can store various options combinations as variables.

Example data

For all of these examples we will use data provided in the harpData package, which, if you haven’t done so already, you can install with:

remotes::install_github("harphub/harpData")

Deterministic forecasts

The simplest application of read_forecast() is to read deterministic forecasts. At a minimum, the start date, end date, a name for the forecast model (fcst_model) and the parameter to be read in are needed as arguments. The data we will use are grib files from the AROME-Arctic model that come with harpData. By default the function looks in the current working directory for the files that have the “vfld” template (see show_file_templates()). Therefore, we need to also pass the file_path and file_template arguments. Finally, the default behaviour of read_forecast() is to not return any data - this is because one of its main roles is to write the data to output files in another format and the volumes of data involved could easily become too large to hold in memory - so we should also set return_data = TRUE.

library(harpIO)

#> Loading required package: harpCore

#> 
#> Attaching package: 'harpCore'

#> The following object is masked from 'package:stats':
#> 
#>     filter

read_forecast(
  2018071000,
  "AROME_Arctic",
  "T2m",
  file_path     = system.file("grib/AROME_Arctic", package = "harpData"),
  file_template = "harmonie_grib_fp",
  return_data   = TRUE
)

#> ::deterministic gridded forecast:: # A tibble: 17 × 11
#>    fcst_model   fcst_dttm           lead_time parameter valid_dttm         
#>    <chr>        <dttm>                  <dbl> <chr>     <dttm>             
#>  1 AROME_Arctic 2018-07-10 00:00:00         0 T2m       2018-07-10 00:00:00
#>  2 AROME_Arctic 2018-07-10 00:00:00         3 T2m       2018-07-10 03:00:00
#>  3 AROME_Arctic 2018-07-10 00:00:00         6 T2m       2018-07-10 06:00:00
#>  4 AROME_Arctic 2018-07-10 00:00:00         9 T2m       2018-07-10 09:00:00
#>  5 AROME_Arctic 2018-07-10 00:00:00        12 T2m       2018-07-10 12:00:00
#>  6 AROME_Arctic 2018-07-10 00:00:00        15 T2m       2018-07-10 15:00:00
#>  7 AROME_Arctic 2018-07-10 00:00:00        18 T2m       2018-07-10 18:00:00
#>  8 AROME_Arctic 2018-07-10 00:00:00        21 T2m       2018-07-10 21:00:00
#>  9 AROME_Arctic 2018-07-10 00:00:00        24 T2m       2018-07-11 00:00:00
#> 10 AROME_Arctic 2018-07-10 00:00:00        27 T2m       2018-07-11 03:00:00
#> # ℹ 7 more rows
#> # ℹ 6 more variables: step_range <chr>, level_type <chr>, level <int>,
#> #   units <chr>, fcst_cycle <chr>, fcst <geolist>

The last column in the data frame, named AROME_Arctic_det is the forecast data, which are stored as geofield objects - these are simply 2d arrays with attributes that describe the projection, time and parameter information of the data. For all deterministic models the name of the column is the forecast model name with a suffix “_det”.

Ensemble forecasts

Reading ensemble forecasts works in exactly the same way. harpData includes output from the AROME-Arctic ensemble model, AAEPS, for the same dates as before, but this time stored in a single netcdf file. Since the data are in a netcdf file, and netcdf files can use many different dimension names and variable names, read_forecast() needs some information about the structure of the netcdf file. This structure information is generated with the function netcdf_opts(), and since the data come from an EPS model produced by MET Norway we can use one of the built it options sets: netcdf_opts(options_set = "met_norway_eps"). All lead times and ensemble members are in the same file, so if you want to read all members from the file you don’t have to pass the members as an argument.

read_forecast(
  2018071000,
  "AAEPS",
  "T2m",
  file_path        = system.file("netcdf/AAEPS", package = "harpData"),
  file_template    = "fc{YYYY}{MM}{DD}{HH}.nc",
  file_format_opts = netcdf_opts(options_set = "met_norway_eps"),
  return_data      = TRUE
)

#> ::ensemble gridded forecast:: # A tibble: 5 × 14
#>   fcst_model fcst_dttm           lead_time parameter valid_dttm         
#>   <chr>      <dttm>                  <dbl> <chr>     <dttm>             
#> 1 AAEPS      2018-07-10 00:00:00         0 T2m       2018-07-10 00:00:00
#> 2 AAEPS      2018-07-10 00:00:00         3 T2m       2018-07-10 03:00:00
#> 3 AAEPS      2018-07-10 00:00:00         6 T2m       2018-07-10 06:00:00
#> 4 AAEPS      2018-07-10 00:00:00         9 T2m       2018-07-10 09:00:00
#> 5 AAEPS      2018-07-10 00:00:00        12 T2m       2018-07-10 12:00:00
#> # ℹ 9 more variables: level_type <chr>, level <dbl>, units <chr>,
#> #   fcst_cycle <chr>, AAEPS_mbr000 <geolist>, AAEPS_mbr001 <geolist>,
#> #   AAEPS_mbr002 <geolist>, AAEPS_mbr003 <geolist>, AAEPS_mbr004 <geolist>

You will see that there is a warning about missing lead times. This is because the default behaviour is to ask for lead times 0 - 48 hours, every three hours. The forecast data are in columns with the name of the forecast model with a suffix “_mbrXXX” where XXX is the member number. We could also ask for a limited set of lead times and ensemble members:

read_forecast(
  2018071000,
  "AAEPS",
  "T2m",
  lead_time        = seq(0, 6, 3),
  members          = c(0, 2, 4),
  file_path        = system.file("netcdf/AAEPS", package = "harpData"),
  file_template    = "fc{YYYY}{MM}{DD}{HH}.nc",
  file_format_opts = netcdf_opts(options_set = "met_norway_eps"),
  return_data      = TRUE
)

#> ::ensemble gridded forecast:: # A tibble: 3 × 12
#>   fcst_model fcst_dttm           lead_time parameter valid_dttm         
#>   <chr>      <dttm>                  <dbl> <chr>     <dttm>             
#> 1 AAEPS      2018-07-10 00:00:00         0 T2m       2018-07-10 00:00:00
#> 2 AAEPS      2018-07-10 00:00:00         3 T2m       2018-07-10 03:00:00
#> 3 AAEPS      2018-07-10 00:00:00         6 T2m       2018-07-10 06:00:00
#> # ℹ 7 more variables: level_type <chr>, level <dbl>, units <chr>,
#> #   fcst_cycle <chr>, AAEPS_mbr000 <geolist>, AAEPS_mbr002 <geolist>,
#> #   AAEPS_mbr004 <geolist>

Lagged ensembles

For lagged ensembles, data are available in harpData for the CMEPS model in vfld format. vfld files are text files with gridded data already interpolated to points. The CMEPS ensemble is quite complex with a number of members available each hour, but tied to a control member every three hours. harpData has members 0, 1, 3, 4, 5 and 6 where members 0 and 1 are run at hours 0, 3, 6, …, 21; members 3 and 4 at hours 1, 4, 7, …, 22; and members 5 and 6 at hours 2, 5, 8, …, 23. This means that members 3 and 4 are two hours behind members 0 and 1, and members 5 and 6 are one hour behind members 0 and 1. The lags argument is used to pass the lagging information to read_forecast and the lags aer assumed to be in the same order as the members argument. Note also that the dates for CMEPS data are different to the ones that have been used before.

read_forecast(
  seq_dttm(2019021700, 2019021718, "6h"),
  "CMEPS_prod",
  "T2m",
  lead_time     = seq(0, 12, 3),
  members       = c(0, 1, 3, 4, 5, 6),
  lags          = c(0, 0, 2, 2, 1, 1),
  file_path     = system.file("vfld", package = "harpData"),
  file_template = "vfld_eps",
  return_data   = TRUE 
)

#> ::ensemble point forecast:: # A tibble: 22,980 × 16
#>    fcst_model fcst_dttm           lead_time parameter   SID   lat   lon units
#>    <chr>      <dttm>                  <dbl> <chr>     <int> <dbl> <dbl> <chr>
#>  1 CMEPS_prod 2019-02-17 00:00:00         0 T2m        1001  70.9 -8.67 K    
#>  2 CMEPS_prod 2019-02-17 00:00:00         0 T2m        1010  69.3 16.1  K    
#>  3 CMEPS_prod 2019-02-17 00:00:00         0 T2m        1014  69.2 17.9  K    
#>  4 CMEPS_prod 2019-02-17 00:00:00         0 T2m        1015  69.6 17.8  K    
#>  5 CMEPS_prod 2019-02-17 00:00:00         0 T2m        1018  69.2 16.0  K    
#>  6 CMEPS_prod 2019-02-17 00:00:00         0 T2m        1023  69.1 18.5  K    
#>  7 CMEPS_prod 2019-02-17 00:00:00         0 T2m        1025  69.7 18.9  K    
#>  8 CMEPS_prod 2019-02-17 00:00:00         0 T2m        1026  69.7 18.9  K    
#>  9 CMEPS_prod 2019-02-17 00:00:00         0 T2m        1027  69.7 18.9  K    
#> 10 CMEPS_prod 2019-02-17 00:00:00         0 T2m        1033  70.2 19.5  K    
#> # ℹ 22,970 more rows
#> # ℹ 8 more variables: valid_dttm <dttm>, fcst_cycle <chr>,
#> #   CMEPS_prod_mbr000 <dbl>, CMEPS_prod_mbr001 <dbl>,
#> #   CMEPS_prod_mbr003_lag2h <dbl>, CMEPS_prod_mbr004_lag2h <dbl>,
#> #   CMEPS_prod_mbr005_lag1h <dbl>, CMEPS_prod_mbr006_lag1h <dbl>

You will see that a number of members have NA - this is because certain combinations of forecast time and lead time do not exist for those members. When data are in read in from SQLite files using read_point_forecast() this information is adjusted to take care of the lagged members. When working with point data, the most efficient workflow is to use read_forecast() to write the data to files and then read from the files.

Example data

Deterministic forecasts

Ensemble forecasts

Lagged ensembles

See also