Options for file formats • harpIO

For this vignette, we will use the {harpIO} and {dplyr} packages. Note that {dplyr} is only used for the dplyr::select() function to remove some columns from data frames so that they are not too wide for the vignette!

library(harpIO)
library(dplyr)

{harpIO} is able to read data from several different file formats. This is includes grib (both editions 1 and 2), netcdf, FA and vfld / vobs. In most cases, reading forecast data with the read_forecast() function is straightforward, but there are some issues that need to be taken into account for some formats. This vignette describes the extra options for grib and netcdf file formats.

Grib files

Grib files have long been used to store meteorological data. They are a binary data format and store “messages” as 2d fields. Each message is accompanied by metadata that describe the geographic projection of the data and the grid that they are on. This metadata also includes key-value pairs that describe what the data are - i.e. the parameter, what vertical level they are on, etc. {harpIO} harnesses the {Rgrib2} package to read data from grib files, making use of the key-value pairs to extract the correct 2d fields from the files. Behind the scenes, {Rgirb2} uses the ecCodes system library, which includes lookup tables for many different operational centres.

For built in harp parameters (see show_harp_parameters()), the extraction of grib messages is done based on the grib shortName, indicatorOfTypeOfLevel, level, dataDate, dataTime and stepRange keys. Additionally, for ensemble forecasts the perturbationNumber key is used when different ensemble members are in the same file. In some cases, the values used for these keys may not be compatible with the key-value pairs used in some grib files. For example, different forecast centres may use values for the shortName key that are not listed in the lookup tables that are included with ecCodes, or store fields on vertical levels that are not consistent with what {harpIO} expects. In order to address the grib_opts() function can be used to set the file_format_opts argument in calls to read_forecast() or read_grid() so that specific key-value pairs can be used to override those built in to {harpIO}.

In the following examples, we will use read_grid() to read gridded data from a grib file in the {harpData} package, so first we create the path to the file. We export this to a system variable so that

grib_file <- system.file(
  "grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp", 
  package = "harpData"
)

We can then use the system command, grib_ls from ecCodes to get a summary of the content of the grib file. The -p tag is used to just select the keys we might be interested in.

system(
  paste(
    "grib_ls",
    "-p shortName,indicatorOfParameter,typeOfLevel,level", 
    grib_file
  )
)

#> shortName             indicatorOfParameter  typeOfLevel           level                 
#> t                     11                    heightAboveGround     0                    
#> z                     6                     heightAboveGround     0                    
#> t                     11                    heightAboveGround     2                    
#> q                     51                    heightAboveGround     2                    
#> u                     33                    heightAboveGround     10                   
#> v                     34                    heightAboveGround     10                   
#> tcc                   71                    heightAboveGround     0                    
#> u                     33                    isobaricInhPa         50                   
#> u                     33                    isobaricInhPa         100                  
#> u                     33                    isobaricInhPa         150                  
#> u                     33                    isobaricInhPa         200                  
#> u                     33                    isobaricInhPa         250                  
#> u                     33                    isobaricInhPa         300                  
#> u                     33                    isobaricInhPa         400                  
#> u                     33                    isobaricInhPa         500                  
#> u                     33                    isobaricInhPa         600                  
#> u                     33                    isobaricInhPa         700                  
#> u                     33                    isobaricInhPa         800                  
#> u                     33                    isobaricInhPa         850                  
#> u                     33                    isobaricInhPa         900                  
#> u                     33                    isobaricInhPa         925                  
#> u                     33                    isobaricInhPa         950                  
#> u                     33                    isobaricInhPa         1000                 
#> v                     34                    isobaricInhPa         50                   
#> v                     34                    isobaricInhPa         100                  
#> v                     34                    isobaricInhPa         150                  
#> v                     34                    isobaricInhPa         200                  
#> v                     34                    isobaricInhPa         250                  
#> v                     34                    isobaricInhPa         300                  
#> v                     34                    isobaricInhPa         400                  
#> v                     34                    isobaricInhPa         500                  
#> v                     34                    isobaricInhPa         600                  
#> v                     34                    isobaricInhPa         700                  
#> v                     34                    isobaricInhPa         800                  
#> v                     34                    isobaricInhPa         850                  
#> v                     34                    isobaricInhPa         900                  
#> v                     34                    isobaricInhPa         925                  
#> v                     34                    isobaricInhPa         950                  
#> v                     34                    isobaricInhPa         1000                 
#> t                     11                    isobaricInhPa         50                   
#> t                     11                    isobaricInhPa         100                  
#> t                     11                    isobaricInhPa         150                  
#> t                     11                    isobaricInhPa         200                  
#> t                     11                    isobaricInhPa         250                  
#> t                     11                    isobaricInhPa         300                  
#> t                     11                    isobaricInhPa         400                  
#> t                     11                    isobaricInhPa         500                  
#> t                     11                    isobaricInhPa         600                  
#> t                     11                    isobaricInhPa         700                  
#> t                     11                    isobaricInhPa         800                  
#> t                     11                    isobaricInhPa         850                  
#> t                     11                    isobaricInhPa         900                  
#> t                     11                    isobaricInhPa         925                  
#> t                     11                    isobaricInhPa         950                  
#> t                     11                    isobaricInhPa         1000                 
#> q                     51                    isobaricInhPa         50                   
#> q                     51                    isobaricInhPa         100                  
#> q                     51                    isobaricInhPa         150                  
#> q                     51                    isobaricInhPa         200                  
#> q                     51                    isobaricInhPa         250                  
#> q                     51                    isobaricInhPa         300                  
#> q                     51                    isobaricInhPa         400                  
#> q                     51                    isobaricInhPa         500                  
#> q                     51                    isobaricInhPa         600                  
#> q                     51                    isobaricInhPa         700                  
#> q                     51                    isobaricInhPa         800                  
#> q                     51                    isobaricInhPa         850                  
#> q                     51                    isobaricInhPa         900                  
#> q                     51                    isobaricInhPa         925                  
#> q                     51                    isobaricInhPa         950                  
#> q                     51                    isobaricInhPa         1000                 
#> z                     6                     isobaricInhPa         50                   
#> z                     6                     isobaricInhPa         100                  
#> z                     6                     isobaricInhPa         150                  
#> z                     6                     isobaricInhPa         200                  
#> z                     6                     isobaricInhPa         250                  
#> z                     6                     isobaricInhPa         300                  
#> z                     6                     isobaricInhPa         400                  
#> z                     6                     isobaricInhPa         500                  
#> z                     6                     isobaricInhPa         600                  
#> z                     6                     isobaricInhPa         700                  
#> z                     6                     isobaricInhPa         800                  
#> z                     6                     isobaricInhPa         850                  
#> z                     6                     isobaricInhPa         900                  
#> z                     6                     isobaricInhPa         925                  
#> z                     6                     isobaricInhPa         950                  
#> z                     6                     isobaricInhPa         1000                 
#> pres                  1                     heightAboveSea        0                    
#> u                     33                    heightAboveGround     20                   
#> u                     33                    heightAboveGround     50                   
#> u                     33                    heightAboveGround     100                  
#> u                     33                    heightAboveGround     250                  
#> u                     33                    heightAboveGround     500                  
#> u                     33                    heightAboveGround     750                  
#> u                     33                    heightAboveGround     1000                 
#> u                     33                    heightAboveGround     1250                 
#> u                     33                    heightAboveGround     1500                 
#> u                     33                    heightAboveGround     2000                 
#> u                     33                    heightAboveGround     2500                 
#> u                     33                    heightAboveGround     3000                 
#> v                     34                    heightAboveGround     20                   
#> v                     34                    heightAboveGround     50                   
#> v                     34                    heightAboveGround     100                  
#> v                     34                    heightAboveGround     250                  
#> v                     34                    heightAboveGround     500                  
#> v                     34                    heightAboveGround     750                  
#> v                     34                    heightAboveGround     1000                 
#> v                     34                    heightAboveGround     1250                 
#> v                     34                    heightAboveGround     1500                 
#> v                     34                    heightAboveGround     2000                 
#> v                     34                    heightAboveGround     2500                 
#> v                     34                    heightAboveGround     3000                 
#> t                     11                    heightAboveGround     20                   
#> t                     11                    heightAboveGround     50                   
#> t                     11                    heightAboveGround     100                  
#> t                     11                    heightAboveGround     250                  
#> t                     11                    heightAboveGround     500                  
#> t                     11                    heightAboveGround     750                  
#> t                     11                    heightAboveGround     1000                 
#> t                     11                    heightAboveGround     1250                 
#> t                     11                    heightAboveGround     1500                 
#> t                     11                    heightAboveGround     2000                 
#> t                     11                    heightAboveGround     2500                 
#> t                     11                    heightAboveGround     3000                 
#> pres                  1                     heightAboveGround     20                   
#> pres                  1                     heightAboveGround     50                   
#> pres                  1                     heightAboveGround     100                  
#> pres                  1                     heightAboveGround     250                  
#> pres                  1                     heightAboveGround     500                  
#> pres                  1                     heightAboveGround     750                  
#> pres                  1                     heightAboveGround     1000                 
#> pres                  1                     heightAboveGround     1250                 
#> pres                  1                     heightAboveGround     1500                 
#> pres                  1                     heightAboveGround     2000                 
#> pres                  1                     heightAboveGround     2500                 
#> pres                  1                     heightAboveGround     3000                 
#> pres                  1                     heightAboveGround     0                    
#> u                     33                    potentialVorticity    1500                 
#> u                     33                    potentialVorticity    2000                 
#> v                     34                    potentialVorticity    1500                 
#> v                     34                    potentialVorticity    2000                 
#> z                     6                     potentialVorticity    1500                 
#> z                     6                     potentialVorticity    2000                 
#> t                     11                    hybrid                65                   
#> ws                    32                    heightAboveGround     10                   
#> tp                    61                    heightAboveGround     0

Setting the shortName

If we take for example the grib message with shortName “tp”, we can read this without problem using the harp parameter “Pcp”:

read_grid(grib_file, "Pcp")

#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp

#> eidb : TP Total precipitation kg m**-2 
#> Time:
#>  2018/07/10 00:00
#> Domain summary:
#> 100 x 100 domain
#> Projection summary:
#> proj= lcc 
#> NE = ( 23.31109 , 78.2607 )
#> SW = ( 8.042 , 77.968 )
#> Data summary:
#> 0 0.2287102 0.5398808 1.363777 1.41673 32.08449

However, what if for some reason we wanted to use “Precip” instead of “Pcp” for the parameter name. Then read_grid() will not know which shortName to search for, so will attempt to find a message with shortName = “precip”, and won’t be able to find anything. Note it also uses levelType = 255 and level = -999. This means that if it does find messages with with shortName = “precip”, it will ignore the type of level and read messages for all levels.

read_grid(grib_file, "Precip")

#> Warning: Parameter "Precip" (shortName: "Precip", typeOfLevel: "unknown" for
#> level(s) -999) not found in grib file.

#> Error: None of the requested data could be read from grib file: /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp

We can then use grib_opts() with the argument param_find to force it to look for the shortName “tp” instead. param_find must be a named list with the name being the same as the parameter that is requested, and the elements can be set using a use_grib_*() function, which are listed in the help for grib_opts(). To set the grib shortName to look for, we use the use_grib_shortName() function.

read_grid(
  grib_file,
  "Precip",
  file_format_opts = grib_opts(
    param_find = list(Precip = use_grib_shortName("tp"))
  )
)

#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp

#> eidb : TP Total precipitation kg m**-2 
#> Time:
#>  2018/07/10 00:00
#> Domain summary:
#> 100 x 100 domain
#> Projection summary:
#> proj= lcc 
#> NE = ( 23.31109 , 78.2607 )
#> SW = ( 8.042 , 77.968 )
#> Data summary:
#> 0 0.2287102 0.5398808 1.363777 1.41673 32.08449

We can also read more than one variable in this way - for example, let’s say we want to read 2m temperature, precipitation and total cloud cover - we will stick with the harp name, “T2m”, for 2m temperature but use different names for precipitation (Precip, shortName = “tp”) and total cloud cover (Cloud, shortName = “tcc”).

read_grid(
  grib_file,
  c("Cloud", "T2m", "Precip"),
  file_format_opts = grib_opts(
    param_find = list(
      Precip = use_grib_shortName("tp"),
      Cloud  = use_grib_shortName("tcc")
    )
  )
)

#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp

#> <harp_geolist[3]>
#> [[1]] <numeric geofield [100 x 100] Min = 0.000 Max = 1.000 Mean = 0.968>
#> [[2]] <numeric geofield [100 x 100] Min = 0.000 Max = 32.084 Mean = 1.364>
#> [[3]] <numeric geofield [100 x 100] Min = 270.902 Max = 279.981 Mean = 276.333>

And if we set the output of read_grid() to be a data frame, we can see that the modified parameter names have been used.

read_grid(
  grib_file,
  c("Cloud", "T2m", "Precip"),
  file_format_opts = grib_opts(
    param_find = list(
      Precip = use_grib_shortName("tp"),
      Cloud  = use_grib_shortName("tcc")
    )
  ), 
  data_frame = TRUE
) %>% 
  select(-ends_with("dttm"), -members)

#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp

#> # A tibble: 3 × 7
#>   lead_time step_range parameter level_type level units    gridded_data
#>       <dbl> <chr>      <chr>     <chr>      <int> <chr>       <geolist>
#> 1         6 6          Cloud     height         0 fraction  [100 × 100]
#> 2         6 0-6        Precip    height         0 kg/m^2    [100 × 100]
#> 3         6 6          T2m       height         2 K         [100 × 100]

Dealing with an unknown shortName

In some cases the grib lookup table that is used may be a local version that is not included with ecCodes. In this case you would need to use another grib key to locate the correct message. Typically this would be indicatorOfParameter for grib edition 1, or paramId for grib edition 2. Looking back to the output of grib_ls above, we can see that the message for total precipitation (shortName = “tp”) has indicatorOfParameter = 61, and that for 10m wind speed (shortName = “ws”) has indicatorOfParameter = 32. We can therefore use these to extract the data from the file by using use_grib_indicatorOfParameter.

read_grid(
  grib_file,
  c("total_precipitation", "wind_speed"),
  file_format_opts = grib_opts(
    param_find = list(
      total_precipitation = use_grib_indicatorOfParameter(61),
      wind_speed          = use_grib_indicatorOfParameter(32)
    )
  )
)

#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp

#> <harp_geolist[2]>
#> [[1]] <numeric geofield [100 x 100] Min = 0.000 Max = 32.084 Mean = 1.364>
#> [[2]] <numeric geofield [100 x 100] Min = 0.801 Max = 21.458 Mean = 9.962>

In practical situations, where the grib shortName is not known, it is the user’s responsibility to know which indicatorOfParameter or paramId to use.

Reading data from specific level types

In our example data file, some data exist on potential vorticity levels. {harpIO} does not know how to read these data, but we can use the level_find argument of grib_opts() to select the correct type of level. level_find works in the same way as param_find in that it requires a named list and a use_grib_*() function to set the correct grib keys. Here, the typeOfLevel key has a value of “potentialVorticity”, so we can read the geopotential (Z), wind speed in the x direction of the model grid (U), and wind speed in the y direction of the model grid (V) by setting the typeOfLevel key to “potentialVorticity”. Z, U, V are parameters that {harpIO} knows, so we don’t need to worry about param_find.

read_grid(
  grib_file,
  c("U", "V", "Z"),
  file_format_opts = grib_opts(
    level_find = list(
      U = use_grib_typeOfLevel("potentialVorticity"),
      V = use_grib_typeOfLevel("potentialVorticity"),
      Z = use_grib_typeOfLevel("potentialVorticity")
    )
  ), 
  data_frame = TRUE
) %>% 
  select(-ends_with("dttm"), -members)

#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp

#> # A tibble: 6 × 7
#>   lead_time step_range parameter level_type level units   gridded_data
#>       <dbl> <chr>      <chr>     <chr>      <int> <chr>      <geolist>
#> 1         6 6          U         unknown     1500 m/s      [100 × 100]
#> 2         6 6          U         unknown     2000 m/s      [100 × 100]
#> 3         6 6          V         unknown     1500 m/s      [100 × 100]
#> 4         6 6          V         unknown     2000 m/s      [100 × 100]
#> 5         6 6          Z         unknown     1500 m^2/s^2  [100 × 100]
#> 6         6 6          Z         unknown     2000 m^2/s^2  [100 × 100]

By default, all levels are read, but we can use the second argument of use_grib_typeOfLevel to set the levels to read:

read_grid(
  grib_file,
  c("U", "V", "Z"),
  file_format_opts = grib_opts(
    level_find = list(
      U = use_grib_typeOfLevel("potentialVorticity", 1500),
      V = use_grib_typeOfLevel("potentialVorticity", 3000),
      Z = use_grib_typeOfLevel("potentialVorticity", c(2000, 3000))
    )
  ), 
  data_frame = TRUE
) %>% 
  select(-ends_with("dttm"), -members)

#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp

#> Warning: Parameter "V" (shortName: "v", typeOfLevel: "potentialVorticity" for
#> level(s) 3000) not found in grib file.

#> # A tibble: 2 × 7
#>   lead_time step_range parameter level_type level units   gridded_data
#>       <dbl> <chr>      <chr>     <chr>      <int> <chr>      <geolist>
#> 1         6 6          U         unknown     1500 m/s      [100 × 100]
#> 2         6 6          Z         unknown     2000 m^2/s^2  [100 × 100]

Note that currently a warning is only given if no levels are found. If more than one level is asked for, but at least one is found the function is silent about the missing levels.

Reading specific levels

Currently read_grid() and read_forecast() can either read one or all vertical levels from files. To read a small selection of levels from a grib file, the level_find argument to grib_opts() can be used conjunction with use_grib_pressure(), use_grib_model() and use_grib_heightAboveGround() to read from specific pressure, model (hybrid), or height levels. Our example data contain data on a range of pressure levels, so if we want, for example, temperature at 850-, 500- and 200 hPa, we would do the following:

read_grid(
  grib_file, 
  "T",
  file_format_opts = grib_opts(
    level_find = list(
      T = use_grib_pressure(c(850, 500, 200))
    )
  ),
  data_frame = TRUE
) %>% 
  select(-ends_with("date"), -members)

#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp

#> # A tibble: 3 × 9
#>   fcst_dttm           valid_dttm          lead_time step_range parameter
#>   <dttm>              <dttm>                  <dbl> <chr>      <chr>    
#> 1 2018-07-10 00:00:00 2018-07-10 06:00:00         6 6          T        
#> 2 2018-07-10 00:00:00 2018-07-10 06:00:00         6 6          T        
#> 3 2018-07-10 00:00:00 2018-07-10 06:00:00         6 6          T        
#> # ℹ 4 more variables: level_type <chr>, level <int>, units <chr>,
#> #   gridded_data <geolist>

Similarly, to read temperature at 500-, 1000- and 1500m above the ground we would do the following:

read_grid(
  grib_file, 
  "T",
  file_format_opts = grib_opts(
    level_find = list(
      T = use_grib_heightAboveGround(c(500, 1000, 1500))
    )
  ),
  data_frame = TRUE
) %>% 
  select(-ends_with("date"), -members)

#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp

#> # A tibble: 3 × 9
#>   fcst_dttm           valid_dttm          lead_time step_range parameter
#>   <dttm>              <dttm>                  <dbl> <chr>      <chr>    
#> 1 2018-07-10 00:00:00 2018-07-10 06:00:00         6 6          T        
#> 2 2018-07-10 00:00:00 2018-07-10 06:00:00         6 6          T        
#> 3 2018-07-10 00:00:00 2018-07-10 06:00:00         6 6          T        
#> # ℹ 4 more variables: level_type <chr>, level <int>, units <chr>,
#> #   gridded_data <geolist>

It should be emphasised that no interpolation is done. i.e. you can only use level_find to select vertical levels that exist in the file.

Netcdf files

Netcdf files are becoming increasingly used in meteorology and climatology. Each variable in a netcdf file is stored as an array that can be subsetted. Each array in the file has attributes and dimensions that describe the variable. In order to read data from netcdf files using {harpIO} it is important that you know something about the contents of the files so that the correct variables can be read and the correct subsets taken. This is done by passing a list generated by the netcdf_opts() function to read_forecast() or read_grid() via the file_format_opts argument. netcdf_opts() generates a list that has names for the dimensions of an array and either where to find the geographic projection information in the file, or the projection information as a proj4 string, and whether any of the dimensions are reversed.

We can use a netcdf file from the {harpData} package to show how netcdf_opts() works.

nc_file <- system.file(
  "netcdf/AAEPS/fc2018071000.nc", 
  package = "harpData"
)

Then we can open a connection to the file using the {ncdf4} package to get a summary of the contents of the file.

library(ncdf4)
nc <- nc_open(nc_file)
nc

#> File /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/netcdf/AAEPS/fc2018071000.nc (NC_FORMAT_NETCDF4_CLASSIC):
#> 
#>      13 variables (excluding dimension variables):
#>         double forecast_reference_time[]   (Contiguous storage)  
#>             units: seconds since 1970-01-01 00:00:00 +00:00
#>             standard_name: forecast_reference_time
#>         double p0[]   (Contiguous storage)  
#>             units: Pa
#>         double ap[hybrid]   (Chunking: [65])  (Compression: shuffle,level 3)
#>             units: Pa
#>         double b[hybrid]   (Chunking: [65])  (Compression: shuffle,level 3)
#>             units: 1
#>         int projection_lambert[]   (Contiguous storage)  
#>             grid_mapping_name: lambert_conformal_conic
#>             standard_parallel: 77.5
#>              standard_parallel: 77.5
#>             longitude_of_central_meridian: -25
#>             latitude_of_projection_origin: 77.5
#>             earth_radius: 6371000
#>             proj4: +proj=lcc +lat_0=77.5 +lon_0=-25 +lat_1=77.5 +lat_2=77.5 +no_defs +R=6.371e+06
#>         double longitude[x,y]   (Chunking: [100,100])  (Compression: shuffle,level 3)
#>             units: degree_east
#>             long_name: longitude
#>             standard_name: longitude
#>         double latitude[x,y]   (Chunking: [100,100])  (Compression: shuffle,level 3)
#>             units: degree_north
#>             long_name: latitude
#>             standard_name: latitude
#>         int specific_humidity_ml[x,y,ensemble_member,hybrid,time]   (Chunking: [100,100,5,24,1])  (Compression: shuffle,level 3)
#>             _FillValue: -2147483647
#>             long_name: Specific humidity model levels
#>             standard_name: specific_humidity
#>             units: kg/kg
#>             grid_mapping: projection_lambert
#>             coordinates: longitude latitude
#>             scale_factor: 1e-07
#>         float air_temperature_2m[x,y,ensemble_member,height1,time]   (Chunking: [100,100,5,1,1])  (Compression: shuffle,level 3)
#>             _FillValue: 9.96920996838687e+36
#>             long_name: Screen level temperature (T2M)
#>             standard_name: air_temperature
#>             units: K
#>             grid_mapping: projection_lambert
#>             coordinates: longitude latitude
#>         float specific_humidity_2m[x,y,ensemble_member,height1,time]   (Chunking: [100,100,5,1,1])  (Compression: shuffle,level 3)
#>             _FillValue: 9.96920996838687e+36
#>             long_name: Screen level specific humidity
#>             standard_name: specific_humidity
#>             units: kg/kg
#>             grid_mapping: projection_lambert
#>             coordinates: longitude latitude
#>         int air_temperature_ml[x,y,ensemble_member,hybrid,time]   (Chunking: [100,100,5,24,1])  (Compression: shuffle,level 3)
#>             _FillValue: -2147483647
#>             long_name: Air temperature model levels
#>             standard_name: air_temperature
#>             units: K
#>             grid_mapping: projection_lambert
#>             coordinates: longitude latitude
#>             scale_factor: 0.001
#>         float surface_air_pressure[x,y,ensemble_member,height0,time]   (Chunking: [100,100,5,1,1])  (Compression: shuffle,level 3)
#>             _FillValue: 9.96920996838687e+36
#>             long_name: Surface air pressure
#>             standard_name: surface_air_pressure
#>             units: Pa
#>             grid_mapping: projection_lambert
#>             coordinates: longitude latitude
#>         float surface_geopotential[x,y,ensemble_member,height0,time]   (Chunking: [100,100,5,1,1])  (Compression: shuffle,level 3)
#>             _FillValue: 9.96920996838687e+36
#>             long_name: Surface geopotential (fis)
#>             standard_name: surface_geopotential
#>             units: m^2/s^2
#>             grid_mapping: projection_lambert
#>             coordinates: longitude latitude
#> 
#>      7 dimensions:
#>         time  Size:5   *** is unlimited *** 
#>             long_name: time
#>             standard_name: time
#>             units: seconds since 1970-01-01 00:00:00 +00:00
#>         ensemble_member  Size:5 
#>             long_name: ensemble run number
#>             standard_name: realization
#>             flag_values: 0
#>              flag_values: 1
#>              flag_values: 2
#>              flag_values: 3
#>              flag_values: 4
#>              flag_values: 5
#>              flag_values: 6
#>              flag_values: 7
#>              flag_values: 8
#>              flag_values: 9
#>              flag_values: 10
#>             flag_meanings: mbr000 mbr001 mbr002 mbr003 mbr004 mbr005 mbr006 mbr007 mbr008 mbr009 mbr010
#>         height0  Size:1 
#>             description: height above ground
#>             long_name: height
#>             positive: up
#>             units: m
#>         height1  Size:1 
#>             description: height above ground
#>             long_name: height
#>             positive: up
#>             units: m
#>         hybrid  Size:65 
#>             standard_name: atmosphere_hybrid_sigma_pressure_coordinate
#>             formula: p(n,k,j,i) = ap(k) + b(k)*ps(n,j,i)
#>             formula_terms: ap: ap b: b ps: surface_air_pressure p0: p0
#>             long_name: atmosphere_hybrid_sigma_pressure_coordinate
#>             positive: down
#>         x  Size:100 
#>             long_name: x-coordinate in Cartesian system
#>             standard_name: projection_x_coordinate
#>             units: m
#>         y  Size:100 
#>             long_name: y-coordinate in Cartesian system
#>             standard_name: projection_y_coordinate
#>             units: m
#> 
#>     16 global attributes:
#>         Conventions: CF-1.6
#>         institution: Norwegian Meteorological Institute, MET Norway
#>         creator_url: met.no
#>         summary: MEPS (MetCoOp-Ensemble Prediction System) is a convection-permitting atmosphere ensemble model covering Scandinavia and the Nordic Seas. It has horizontal resolution of 2.5 km, has 65 vertical levels and 10 members. MEPS is ran four times daily (00,06,12,18) with three-hourly cycling for data assimilation. Member 0 and 1 are ran up to 66 hours, the rest up to 48 hours. Boundary data is from ECMWF, and initial perturbations are based on the SLAF method. Model code based on HARMONIE cy40h1.1
#>         source: MEPS 2.5km
#>         title: MEPS 2.5km
#>         min_time: 2018-07-10 00:00:00Z
#>         max_time: 2018-07-12
#>         geospatial_lat_min: 51.0
#>         geospatial_lat_max: 88.0
#>         geospatial_lon_min: -20.0
#>         geospatial_lon_max: 80.0
#>         references: unknown
#>         license: https://www.met.no/en/free-meteorological-data/Licensing-and-crediting
#>         comment: none
#>         history: 2019-07-17 creation by fimex

From the above you see the variables, their attributes, the dimensions of the variables and some global attributes. We can use this information to set the correct information using netcdf_opts() when we want to read data using read_forecast(), or read_grid(). For these examples we will use read_grid(), but the same principles apply for read_forecast().

It is important to close the connection to a netcdf file when we are finished with it (this is taken care of for us when we use read_forecast() and read_grid()).

nc_close(nc)

Let’s take a look at some of the default values for netcdf_opts()

netcdf_opts()

#> $options_set
#> [1] "none"
#> 
#> $param_find
#> list()
#> 
#> $proj4_var
#> [1] "projection_lambert"
#> 
#> $proj4_att
#> [1] "proj4"
#> 
#> $proj4
#> NULL
#> 
#> $x_dim
#> [1] "x"
#> 
#> $y_dim
#> [1] "y"
#> 
#> $lon_var
#> [1] "longitude"
#> 
#> $lat_var
#> [1] "latitude"
#> 
#> $x_rev
#> [1] FALSE
#> 
#> $y_rev
#> [1] FALSE
#> 
#> $dx
#> NULL
#> 
#> $dy
#> NULL
#> 
#> $z_var
#> [1] NA
#> 
#> $member_var
#> [1] NA
#> 
#> $time_var
#> [1] "time"
#> 
#> $ref_time_var
#> [1] NA
#> 
#> $force_param_name
#> [1] FALSE

Projection information

There are 3 elements in the list that begin with proj4_. These are to control where the projection information is obtained.

proj4_var is the name of the variable in the netcdf file that holds the proj4 string.
proj4_att is the name of the attribute of proj4_var that has the proj4 string. If set to NULL, an attempt is made to read the proj4 string directly from proj4_var
proj4 is where you can set the proj4 string directly rather than attempt to read it from the file. If proj4 is not NULL, proj4_var and proj4_att are ignored.

In the summary of our netcdf file above, there is a variable projection_lambert that has an attribute proj4, which contains the proj4 string “+proj=lcc +lat_0=77.5 +lon_0=-25 +lat_1=77.5 +lat_2=77.5 +no_defs +R=6.371e+06”, so the default values we get from netcdf_opts() are correct for this file.

Dimension names

The dimension names of variables are set using x_dim, y_dim, z_var, member_var and time_var. The reason we need to know the names are that arrays in netcdf file can be stored in any dimension order, and we may wish to read only a subset of the array so we need to know which parts of the dimensions to take. By default, netcdf_opts() gives us:

x_dim      = "x"
y_dim      = "y"
time_var   = "time"
z_var      = NA
member_var = NA

Since z_var and member_var are NA, the default means that it is expecting to find variables in the netcdf file with 3 dimensions: “x”, “y” and “time”. If we go back to our example above, and let’s say we want to read 2m temperature (“air_temperature_2m”, or the harp parameter “T2m”), we see that air_temperature_2m has 5 dimensions: x, y, ensemble_member, height1 and time. We therefore need to modify the values of z_var and member_var using netcdf_opts():

my_nc_opts <- netcdf_opts(z_var = "height1", member_var = "ensemble_member")

Latitude, longitude and time

This leaves lon_var, lat_var and ref_time_var. If the file contains arrays of longitude and latitude, these are used to geographically orientate the data - if they are set to NULL an attempt is made to do this using x_dim and y_dim and the projection information. ref_time_var is used in forecasts to calculate the lead time. If set to NA, the lead time is calculated by subtracting the first element of the time dimension from each subsequent element.

We can now attempt to read some data - firstly without using netcdf_opts():

read_grid(nc_file, "T2m", data_frame = TRUE)

#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/netcdf/AAEPS/fc2018071000.nc

#> # A tibble: 25 × 9
#>    fcst_dttm           valid_dttm          lead_time parameter level_type level
#>    <dttm>              <dttm>                  <dbl> <chr>     <chr>      <dbl>
#>  1 2018-07-10 00:00:00 2018-07-10 00:00:00         0 T2m       height         2
#>  2 2018-07-10 00:00:00 2018-07-10 00:00:00         0 T2m       height         2
#>  3 2018-07-10 00:00:00 2018-07-10 00:00:00         0 T2m       height         2
#>  4 2018-07-10 00:00:00 2018-07-10 00:00:00         0 T2m       height         2
#>  5 2018-07-10 00:00:00 2018-07-10 00:00:00         0 T2m       height         2
#>  6 2018-07-10 00:00:00 2018-07-10 03:00:00         3 T2m       height         2
#>  7 2018-07-10 00:00:00 2018-07-10 03:00:00         3 T2m       height         2
#>  8 2018-07-10 00:00:00 2018-07-10 03:00:00         3 T2m       height         2
#>  9 2018-07-10 00:00:00 2018-07-10 03:00:00         3 T2m       height         2
#> 10 2018-07-10 00:00:00 2018-07-10 03:00:00         3 T2m       height         2
#> # ℹ 15 more rows
#> # ℹ 3 more variables: units <chr>, gridded_data <geolist>, members <int[1d]>

The error tells you what dimension names you have given - in this case (time, x, y) - the defaults from netcdf_opts(), and what dimensions it found in the file: (ensemble_member, height1, time, x, y). Note that in the warning the dimensions are listed in alphabetical order, it us up to you to decide which dimension they should go with.

So, now if we use our modified options to read the file, we should get a result:

read_grid(nc_file, "T2m", file_format_opts = my_nc_opts, data_frame = TRUE) %>% 
  select(-ends_with("dttm"))

#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/netcdf/AAEPS/fc2018071000.nc

#> # A tibble: 25 × 7
#>    lead_time parameter level_type level units gridded_data   members
#>        <dbl> <chr>     <chr>      <dbl> <chr>    <geolist> <int[1d]>
#>  1         0 T2m       height         2 K      [100 × 100]         0
#>  2         0 T2m       height         2 K      [100 × 100]         1
#>  3         0 T2m       height         2 K      [100 × 100]         2
#>  4         0 T2m       height         2 K      [100 × 100]         3
#>  5         0 T2m       height         2 K      [100 × 100]         4
#>  6         3 T2m       height         2 K      [100 × 100]         0
#>  7         3 T2m       height         2 K      [100 × 100]         1
#>  8         3 T2m       height         2 K      [100 × 100]         2
#>  9         3 T2m       height         2 K      [100 × 100]         3
#> 10         3 T2m       height         2 K      [100 × 100]         4
#> # ℹ 15 more rows

We can also select specific members and lead times:

read_grid(
  nc_file, 
  "T2m",
  lead_time        = c(0,3),
  members          = c(0, 2, 4), 
  file_format_opts = my_nc_opts, 
  data_frame       = TRUE
) %>% 
  select(-ends_with("dttm"))

#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/netcdf/AAEPS/fc2018071000.nc

#> # A tibble: 6 × 7
#>   lead_time parameter level_type level units gridded_data   members
#>       <dbl> <chr>     <chr>      <dbl> <chr>    <geolist> <int[1d]>
#> 1         0 T2m       height         2 K      [100 × 100]         0
#> 2         0 T2m       height         2 K      [100 × 100]         2
#> 3         0 T2m       height         2 K      [100 × 100]         4
#> 4         3 T2m       height         2 K      [100 × 100]         0
#> 5         3 T2m       height         2 K      [100 × 100]         2
#> 6         3 T2m       height         2 K      [100 × 100]         4

Options sets

Finally, netcdf_opts() has a number of default sets of options that can be set via the options_set argument. These set the options to known values for some data sources - currently operational NWP forecasts from MET Norway and WRF (though this isn’t fully tested). These options are:

“met_norway_eps” - EPS forecasts produced by MET Norway,
“met_norway_det” - deterministic forecasts produced by MET Norway
“met_norway_ifsens” - IFSENS forecasts from ECMWF archived at MET Norway
“met_norway_ifshires” - IFSHIRES forecasts from ECMWF archived at MET Norway
“wrf” - WRF forecasts on the WRF “mass” grid
“wrf_u_stagger” - WRF forecasts on the U-staggered grid
“wrf_v_stagger” - WRF forecasts on the V-staggered grid

You can, for example, use the “met_norway_det” options set when reading data from the MET Norway thredds server:

read_grid(
  "https://thredds.met.no/thredds/dodsC/aromearcticarchive/2021/03/01/arome_arctic_extracted_2_5km_20210301T21Z.nc", 
  "T2m",
  lead_time        = 6,
  file_format      = "netcdf",
  file_format_opts = netcdf_opts("met_norway_det")
)

#> Reading https://thredds.met.no/thredds/dodsC/aromearcticarchive/2021/03/01/arome_arctic_extracted_2_5km_20210301T21Z.nc

#>  : T2m K 
#> Time:
#>  2021/03/01 21:00
#> Domain summary:
#> 739 x 949 domain
#> Projection summary:
#> proj= lcc 
#> NE = ( 68.83323 , 71.10861 )
#> SW = ( -17.957 , 69.299 )
#> Data summary:
#> 241.2239 257.6566 267.764 264.93 272.1761 279.136

Note that to download data from an external source we need to explicitly tell read_grid() (or read_forecast()) that the file format is “netcdf”. This is because when the file is on disk an initial scan of the file is done to try and guess the format, but this is not possible with external files. For demonstration purposes we could do the same using read_forecast()

url      <- "https://thredds.met.no/thredds/dodsC/aromearcticarchive"
template <- paste0(
  "{YYYY}/{MM}/{DD}/",
  "{fcst_model}_extracted_2_5km_{YYYY}{MM}{DD}T{HH}Z.nc"
)
read_forecast(
  seq_dttm(2021030100, 2021030118, "6h"),
  fcst_model       = "arome_arctic",
  parameter        = "T2m",
  by               = "6h",
  lead_time        = seq(0, 3),
  file_path        = url,
  file_template    = template,
  file_format      = "netcdf",
  file_format_opts = netcdf_opts("met_norway_det"),
  return_data      = TRUE
) %>% 
  select(-starts_with("level"), -fcst_cycle ,-valid_dttm)

#> Reading https://thredds.met.no/thredds/dodsC/aromearcticarchive/2021/03/01/arome_arctic_extracted_2_5km_20210301T00Z.nc

#> Reading https://thredds.met.no/thredds/dodsC/aromearcticarchive/2021/03/01/arome_arctic_extracted_2_5km_20210301T06Z.nc

#> Reading https://thredds.met.no/thredds/dodsC/aromearcticarchive/2021/03/01/arome_arctic_extracted_2_5km_20210301T12Z.nc

#> Reading https://thredds.met.no/thredds/dodsC/aromearcticarchive/2021/03/01/arome_arctic_extracted_2_5km_20210301T18Z.nc

#> ::deterministic gridded forecast:: # A tibble: 16 × 6
#>    fcst_model   fcst_dttm           lead_time parameter units        fcst
#>    <chr>        <dttm>                  <dbl> <chr>     <chr>   <geolist>
#>  1 arome_arctic 2021-03-01 00:00:00         0 T2m       K     [739 × 949]
#>  2 arome_arctic 2021-03-01 00:00:00         1 T2m       K     [739 × 949]
#>  3 arome_arctic 2021-03-01 00:00:00         2 T2m       K     [739 × 949]
#>  4 arome_arctic 2021-03-01 00:00:00         3 T2m       K     [739 × 949]
#>  5 arome_arctic 2021-03-01 06:00:00         0 T2m       K     [739 × 949]
#>  6 arome_arctic 2021-03-01 06:00:00         1 T2m       K     [739 × 949]
#>  7 arome_arctic 2021-03-01 06:00:00         2 T2m       K     [739 × 949]
#>  8 arome_arctic 2021-03-01 06:00:00         3 T2m       K     [739 × 949]
#>  9 arome_arctic 2021-03-01 12:00:00         0 T2m       K     [739 × 949]
#> 10 arome_arctic 2021-03-01 12:00:00         1 T2m       K     [739 × 949]
#> 11 arome_arctic 2021-03-01 12:00:00         2 T2m       K     [739 × 949]
#> 12 arome_arctic 2021-03-01 12:00:00         3 T2m       K     [739 × 949]
#> 13 arome_arctic 2021-03-01 18:00:00         0 T2m       K     [739 × 949]
#> 14 arome_arctic 2021-03-01 18:00:00         1 T2m       K     [739 × 949]
#> 15 arome_arctic 2021-03-01 18:00:00         2 T2m       K     [739 × 949]
#> 16 arome_arctic 2021-03-01 18:00:00         3 T2m       K     [739 × 949]

The z_var problem

Netcdf files typically contain many variables, and for meteorological data they can have a range of vertical coordinates - e.g. height, pressure, model levels etc. So although we have set our options with netcdf_opts() we don’t want to set everything for every variable we wish to read from the file. To address this, the function modify_opts() can be used to change one or more options. If we try to read “specific_humidity_ml” from our example file with the netcdf options we currently have, we will get an error as z_var = “height1”, but for this variable it needs to be “hybrid”:

read_grid(
  nc_file, 
  "specific_humidity_ml", 
  lead_time        = 3,
  file_format_opts = my_nc_opts, 
  data_frame       = TRUE
)

#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/netcdf/AAEPS/fc2018071000.nc

#> # A tibble: 325 × 9
#>    fcst_dttm           valid_dttm          lead_time parameter        level_type
#>    <dttm>              <dttm>                  <dbl> <chr>            <chr>     
#>  1 2018-07-10 00:00:00 2018-07-10 03:00:00         3 specific_humidi… pressure  
#>  2 2018-07-10 00:00:00 2018-07-10 03:00:00         3 specific_humidi… pressure  
#>  3 2018-07-10 00:00:00 2018-07-10 03:00:00         3 specific_humidi… pressure  
#>  4 2018-07-10 00:00:00 2018-07-10 03:00:00         3 specific_humidi… pressure  
#>  5 2018-07-10 00:00:00 2018-07-10 03:00:00         3 specific_humidi… pressure  
#>  6 2018-07-10 00:00:00 2018-07-10 03:00:00         3 specific_humidi… pressure  
#>  7 2018-07-10 00:00:00 2018-07-10 03:00:00         3 specific_humidi… pressure  
#>  8 2018-07-10 00:00:00 2018-07-10 03:00:00         3 specific_humidi… pressure  
#>  9 2018-07-10 00:00:00 2018-07-10 03:00:00         3 specific_humidi… pressure  
#> 10 2018-07-10 00:00:00 2018-07-10 03:00:00         3 specific_humidi… pressure  
#> # ℹ 315 more rows
#> # ℹ 4 more variables: level <dbl>, units <chr>, gridded_data <geolist>,
#> #   members <int[1d]>

So, to change z_var to “hybrid” we use modify_opts():

read_grid(
  nc_file, 
  "specific_humidity_ml",
  lead_time            = 3,
  file_format_opts     = modify_opts(my_nc_opts, z_var = "hybrid"),
  data_frame           = TRUE,
  vertical_coordinate = "model"
) %>% 
  select(-ends_with("dttm"))

#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/netcdf/AAEPS/fc2018071000.nc

#> # A tibble: 325 × 7
#>    lead_time parameter            level_type   level units gridded_data  members
#>        <dbl> <chr>                <chr>        <dbl> <chr>    <geolist> <int[1d>
#>  1         3 specific_humidity_ml model      0.00987 kg/kg  [100 × 100]        0
#>  2         3 specific_humidity_ml model      0.00987 kg/kg  [100 × 100]        1
#>  3         3 specific_humidity_ml model      0.00987 kg/kg  [100 × 100]        2
#>  4         3 specific_humidity_ml model      0.00987 kg/kg  [100 × 100]        3
#>  5         3 specific_humidity_ml model      0.00987 kg/kg  [100 × 100]        4
#>  6         3 specific_humidity_ml model      0.0296  kg/kg  [100 × 100]        0
#>  7         3 specific_humidity_ml model      0.0296  kg/kg  [100 × 100]        1
#>  8         3 specific_humidity_ml model      0.0296  kg/kg  [100 × 100]        2
#>  9         3 specific_humidity_ml model      0.0296  kg/kg  [100 × 100]        3
#> 10         3 specific_humidity_ml model      0.0296  kg/kg  [100 × 100]        4
#> # ℹ 315 more rows

We also have to pass the vertical_coordinate = "model" to get the correct name in the level_type column in the data frame of the output.

Note that for variables with a singleton z dimension, any name can be given for z_var as long as it exists as a dimension in the file. Furthermore, if there are many different dimensions with names such as “height0”, “height1”, “height2” etc. the number doesn’t have to match. This means that you do not have to modify z_var so much.