For this vignette, we will use the {harpIO} and {dplyr} packages.
Note that {dplyr} is only used for the dplyr::select()
function to remove some columns from data frames so that they are not
too wide for the vignette!
{harpIO} is able to read data from several different file formats.
This is includes grib (both editions 1 and 2), netcdf, FA and vfld /
vobs. In most cases, reading forecast data with the
read_forecast()
function is straightforward, but there are
some issues that need to be taken into account for some formats. This
vignette describes the extra options for grib and netcdf file
formats.
Grib files
Grib files have long been used to store meteorological data. They are a binary data format and store “messages” as 2d fields. Each message is accompanied by metadata that describe the geographic projection of the data and the grid that they are on. This metadata also includes key-value pairs that describe what the data are - i.e. the parameter, what vertical level they are on, etc. {harpIO} harnesses the {Rgrib2} package to read data from grib files, making use of the key-value pairs to extract the correct 2d fields from the files. Behind the scenes, {Rgirb2} uses the ecCodes system library, which includes lookup tables for many different operational centres.
For built in harp parameters (see
show_harp_parameters()
), the extraction of grib messages is
done based on the grib shortName,
indicatorOfTypeOfLevel, level, dataDate,
dataTime and stepRange keys. Additionally, for
ensemble forecasts the perturbationNumber key is used when
different ensemble members are in the same file. In some cases, the
values used for these keys may not be compatible with the key-value
pairs used in some grib files. For example, different forecast centres
may use values for the shortName key that are not listed in the lookup
tables that are included with ecCodes, or store fields on vertical
levels that are not consistent with what {harpIO} expects. In order to
address the grib_opts()
function can be used to set the
file_format_opts
argument in calls to
read_forecast()
or read_grid()
so that
specific key-value pairs can be used to override those built in to
{harpIO}.
In the following examples, we will use read_grid()
to
read gridded data from a grib file in the {harpData} package, so first
we create the path to the file. We export this to a system variable so
that
grib_file <- system.file(
"grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp",
package = "harpData"
)
We can then use the system command, grib_ls
from ecCodes
to get a summary of the content of the grib file. The -p tag is used to
just select the keys we might be interested in.
#> shortName indicatorOfParameter typeOfLevel level
#> t 11 heightAboveGround 0
#> z 6 heightAboveGround 0
#> t 11 heightAboveGround 2
#> q 51 heightAboveGround 2
#> u 33 heightAboveGround 10
#> v 34 heightAboveGround 10
#> tcc 71 heightAboveGround 0
#> u 33 isobaricInhPa 50
#> u 33 isobaricInhPa 100
#> u 33 isobaricInhPa 150
#> u 33 isobaricInhPa 200
#> u 33 isobaricInhPa 250
#> u 33 isobaricInhPa 300
#> u 33 isobaricInhPa 400
#> u 33 isobaricInhPa 500
#> u 33 isobaricInhPa 600
#> u 33 isobaricInhPa 700
#> u 33 isobaricInhPa 800
#> u 33 isobaricInhPa 850
#> u 33 isobaricInhPa 900
#> u 33 isobaricInhPa 925
#> u 33 isobaricInhPa 950
#> u 33 isobaricInhPa 1000
#> v 34 isobaricInhPa 50
#> v 34 isobaricInhPa 100
#> v 34 isobaricInhPa 150
#> v 34 isobaricInhPa 200
#> v 34 isobaricInhPa 250
#> v 34 isobaricInhPa 300
#> v 34 isobaricInhPa 400
#> v 34 isobaricInhPa 500
#> v 34 isobaricInhPa 600
#> v 34 isobaricInhPa 700
#> v 34 isobaricInhPa 800
#> v 34 isobaricInhPa 850
#> v 34 isobaricInhPa 900
#> v 34 isobaricInhPa 925
#> v 34 isobaricInhPa 950
#> v 34 isobaricInhPa 1000
#> t 11 isobaricInhPa 50
#> t 11 isobaricInhPa 100
#> t 11 isobaricInhPa 150
#> t 11 isobaricInhPa 200
#> t 11 isobaricInhPa 250
#> t 11 isobaricInhPa 300
#> t 11 isobaricInhPa 400
#> t 11 isobaricInhPa 500
#> t 11 isobaricInhPa 600
#> t 11 isobaricInhPa 700
#> t 11 isobaricInhPa 800
#> t 11 isobaricInhPa 850
#> t 11 isobaricInhPa 900
#> t 11 isobaricInhPa 925
#> t 11 isobaricInhPa 950
#> t 11 isobaricInhPa 1000
#> q 51 isobaricInhPa 50
#> q 51 isobaricInhPa 100
#> q 51 isobaricInhPa 150
#> q 51 isobaricInhPa 200
#> q 51 isobaricInhPa 250
#> q 51 isobaricInhPa 300
#> q 51 isobaricInhPa 400
#> q 51 isobaricInhPa 500
#> q 51 isobaricInhPa 600
#> q 51 isobaricInhPa 700
#> q 51 isobaricInhPa 800
#> q 51 isobaricInhPa 850
#> q 51 isobaricInhPa 900
#> q 51 isobaricInhPa 925
#> q 51 isobaricInhPa 950
#> q 51 isobaricInhPa 1000
#> z 6 isobaricInhPa 50
#> z 6 isobaricInhPa 100
#> z 6 isobaricInhPa 150
#> z 6 isobaricInhPa 200
#> z 6 isobaricInhPa 250
#> z 6 isobaricInhPa 300
#> z 6 isobaricInhPa 400
#> z 6 isobaricInhPa 500
#> z 6 isobaricInhPa 600
#> z 6 isobaricInhPa 700
#> z 6 isobaricInhPa 800
#> z 6 isobaricInhPa 850
#> z 6 isobaricInhPa 900
#> z 6 isobaricInhPa 925
#> z 6 isobaricInhPa 950
#> z 6 isobaricInhPa 1000
#> pres 1 heightAboveSea 0
#> u 33 heightAboveGround 20
#> u 33 heightAboveGround 50
#> u 33 heightAboveGround 100
#> u 33 heightAboveGround 250
#> u 33 heightAboveGround 500
#> u 33 heightAboveGround 750
#> u 33 heightAboveGround 1000
#> u 33 heightAboveGround 1250
#> u 33 heightAboveGround 1500
#> u 33 heightAboveGround 2000
#> u 33 heightAboveGround 2500
#> u 33 heightAboveGround 3000
#> v 34 heightAboveGround 20
#> v 34 heightAboveGround 50
#> v 34 heightAboveGround 100
#> v 34 heightAboveGround 250
#> v 34 heightAboveGround 500
#> v 34 heightAboveGround 750
#> v 34 heightAboveGround 1000
#> v 34 heightAboveGround 1250
#> v 34 heightAboveGround 1500
#> v 34 heightAboveGround 2000
#> v 34 heightAboveGround 2500
#> v 34 heightAboveGround 3000
#> t 11 heightAboveGround 20
#> t 11 heightAboveGround 50
#> t 11 heightAboveGround 100
#> t 11 heightAboveGround 250
#> t 11 heightAboveGround 500
#> t 11 heightAboveGround 750
#> t 11 heightAboveGround 1000
#> t 11 heightAboveGround 1250
#> t 11 heightAboveGround 1500
#> t 11 heightAboveGround 2000
#> t 11 heightAboveGround 2500
#> t 11 heightAboveGround 3000
#> pres 1 heightAboveGround 20
#> pres 1 heightAboveGround 50
#> pres 1 heightAboveGround 100
#> pres 1 heightAboveGround 250
#> pres 1 heightAboveGround 500
#> pres 1 heightAboveGround 750
#> pres 1 heightAboveGround 1000
#> pres 1 heightAboveGround 1250
#> pres 1 heightAboveGround 1500
#> pres 1 heightAboveGround 2000
#> pres 1 heightAboveGround 2500
#> pres 1 heightAboveGround 3000
#> pres 1 heightAboveGround 0
#> u 33 potentialVorticity 1500
#> u 33 potentialVorticity 2000
#> v 34 potentialVorticity 1500
#> v 34 potentialVorticity 2000
#> z 6 potentialVorticity 1500
#> z 6 potentialVorticity 2000
#> t 11 hybrid 65
#> ws 32 heightAboveGround 10
#> tp 61 heightAboveGround 0
Setting the shortName
If we take for example the grib message with shortName “tp”, we can read this without problem using the harp parameter “Pcp”:
read_grid(grib_file, "Pcp")
#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp
#> eidb : TP Total precipitation kg m**-2
#> Time:
#> 2018/07/10 00:00
#> Domain summary:
#> 100 x 100 domain
#> Projection summary:
#> proj= lcc
#> NE = ( 23.31109 , 78.2607 )
#> SW = ( 8.042 , 77.968 )
#> Data summary:
#> 0 0.2287102 0.5398808 1.363777 1.41673 32.08449
However, what if for some reason we wanted to use “Precip” instead of
“Pcp” for the parameter name. Then read_grid()
will not
know which shortName to search for, so will attempt to find a
message with shortName = “precip”, and won’t be able to find
anything. Note it also uses levelType = 255 and level
= -999. This means that if it does find messages with with
shortName = “precip”, it will ignore the type of level and read
messages for all levels.
read_grid(grib_file, "Precip")
#> Warning: Parameter "Precip" (shortName: "Precip", typeOfLevel: "unknown" for
#> level(s) -999) not found in grib file.
#> Error: None of the requested data could be read from grib file: /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp
We can then use grib_opts()
with the argument
param_find
to force it to look for the shortName
“tp” instead. param_find
must be a named list with the name
being the same as the parameter that is requested, and the elements can
be set using a use_grib_*()
function, which are listed in
the help for grib_opts()
. To set the grib
shortName to look for, we use the
use_grib_shortName()
function.
read_grid(
grib_file,
"Precip",
file_format_opts = grib_opts(
param_find = list(Precip = use_grib_shortName("tp"))
)
)
#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp
#> eidb : TP Total precipitation kg m**-2
#> Time:
#> 2018/07/10 00:00
#> Domain summary:
#> 100 x 100 domain
#> Projection summary:
#> proj= lcc
#> NE = ( 23.31109 , 78.2607 )
#> SW = ( 8.042 , 77.968 )
#> Data summary:
#> 0 0.2287102 0.5398808 1.363777 1.41673 32.08449
We can also read more than one variable in this way - for example, let’s say we want to read 2m temperature, precipitation and total cloud cover - we will stick with the harp name, “T2m”, for 2m temperature but use different names for precipitation (Precip, shortName = “tp”) and total cloud cover (Cloud, shortName = “tcc”).
read_grid(
grib_file,
c("Cloud", "T2m", "Precip"),
file_format_opts = grib_opts(
param_find = list(
Precip = use_grib_shortName("tp"),
Cloud = use_grib_shortName("tcc")
)
)
)
#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp
#> <harp_geolist[3]>
#> [[1]] <numeric geofield [100 x 100] Min = 0.000 Max = 1.000 Mean = 0.968>
#> [[2]] <numeric geofield [100 x 100] Min = 0.000 Max = 32.084 Mean = 1.364>
#> [[3]] <numeric geofield [100 x 100] Min = 270.902 Max = 279.981 Mean = 276.333>
And if we set the output of read_grid()
to be a data
frame, we can see that the modified parameter names have been used.
read_grid(
grib_file,
c("Cloud", "T2m", "Precip"),
file_format_opts = grib_opts(
param_find = list(
Precip = use_grib_shortName("tp"),
Cloud = use_grib_shortName("tcc")
)
),
data_frame = TRUE
) %>%
select(-ends_with("dttm"), -members)
#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp
#> # A tibble: 3 × 7
#> lead_time step_range parameter level_type level units gridded_data
#> <dbl> <chr> <chr> <chr> <int> <chr> <geolist>
#> 1 6 6 Cloud height 0 fraction [100 × 100]
#> 2 6 0-6 Precip height 0 kg/m^2 [100 × 100]
#> 3 6 6 T2m height 2 K [100 × 100]
Dealing with an unknown shortName
In some cases the grib lookup table that is used may be a local
version that is not included with ecCodes. In this case you would need
to use another grib key to locate the correct message. Typically this
would be indicatorOfParameter for grib edition 1, or
paramId for grib edition 2. Looking back to the output of
grib_ls
above, we can see that the message for total
precipitation (shortName = “tp”) has
indicatorOfParameter = 61, and that for 10m wind speed
(shortName = “ws”) has indicatorOfParameter = 32. We
can therefore use these to extract the data from the file by using
use_grib_indicatorOfParameter
.
read_grid(
grib_file,
c("total_precipitation", "wind_speed"),
file_format_opts = grib_opts(
param_find = list(
total_precipitation = use_grib_indicatorOfParameter(61),
wind_speed = use_grib_indicatorOfParameter(32)
)
)
)
#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp
#> <harp_geolist[2]>
#> [[1]] <numeric geofield [100 x 100] Min = 0.000 Max = 32.084 Mean = 1.364>
#> [[2]] <numeric geofield [100 x 100] Min = 0.801 Max = 21.458 Mean = 9.962>
In practical situations, where the grib shortName is not known, it is the user’s responsibility to know which indicatorOfParameter or paramId to use.
Reading data from specific level types
In our example data file, some data exist on potential vorticity
levels. {harpIO} does not know how to read these data, but we can use
the level_find
argument of grib_opts()
to
select the correct type of level. level_find
works in the
same way as param_find
in that it requires a named list and
a use_grib_*()
function to set the correct grib keys. Here,
the typeOfLevel key has a value of “potentialVorticity”, so we
can read the geopotential (Z), wind speed in the x direction of the
model grid (U), and wind speed in the y direction of the model grid (V)
by setting the typeOfLevel key to “potentialVorticity”. Z, U, V
are parameters that {harpIO} knows, so we don’t need to worry about
param_find
.
read_grid(
grib_file,
c("U", "V", "Z"),
file_format_opts = grib_opts(
level_find = list(
U = use_grib_typeOfLevel("potentialVorticity"),
V = use_grib_typeOfLevel("potentialVorticity"),
Z = use_grib_typeOfLevel("potentialVorticity")
)
),
data_frame = TRUE
) %>%
select(-ends_with("dttm"), -members)
#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp
#> # A tibble: 6 × 7
#> lead_time step_range parameter level_type level units gridded_data
#> <dbl> <chr> <chr> <chr> <int> <chr> <geolist>
#> 1 6 6 U unknown 1500 m/s [100 × 100]
#> 2 6 6 U unknown 2000 m/s [100 × 100]
#> 3 6 6 V unknown 1500 m/s [100 × 100]
#> 4 6 6 V unknown 2000 m/s [100 × 100]
#> 5 6 6 Z unknown 1500 m^2/s^2 [100 × 100]
#> 6 6 6 Z unknown 2000 m^2/s^2 [100 × 100]
By default, all levels are read, but we can use the second argument
of use_grib_typeOfLevel
to set the levels to read:
read_grid(
grib_file,
c("U", "V", "Z"),
file_format_opts = grib_opts(
level_find = list(
U = use_grib_typeOfLevel("potentialVorticity", 1500),
V = use_grib_typeOfLevel("potentialVorticity", 3000),
Z = use_grib_typeOfLevel("potentialVorticity", c(2000, 3000))
)
),
data_frame = TRUE
) %>%
select(-ends_with("dttm"), -members)
#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp
#> Warning: Parameter "V" (shortName: "v", typeOfLevel: "potentialVorticity" for
#> level(s) 3000) not found in grib file.
#> # A tibble: 2 × 7
#> lead_time step_range parameter level_type level units gridded_data
#> <dbl> <chr> <chr> <chr> <int> <chr> <geolist>
#> 1 6 6 U unknown 1500 m/s [100 × 100]
#> 2 6 6 Z unknown 2000 m^2/s^2 [100 × 100]
Note that currently a warning is only given if no levels are found. If more than one level is asked for, but at least one is found the function is silent about the missing levels.
Reading specific levels
Currently read_grid()
and read_forecast()
can either read one or all vertical levels from files. To read a small
selection of levels from a grib file, the level_find
argument to grib_opts()
can be used conjunction with
use_grib_pressure()
, use_grib_model()
and
use_grib_heightAboveGround()
to read from specific
pressure, model (hybrid), or height levels. Our example data contain
data on a range of pressure levels, so if we want, for example,
temperature at 850-, 500- and 200 hPa, we would do the following:
read_grid(
grib_file,
"T",
file_format_opts = grib_opts(
level_find = list(
T = use_grib_pressure(c(850, 500, 200))
)
),
data_frame = TRUE
) %>%
select(-ends_with("date"), -members)
#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp
#> # A tibble: 3 × 9
#> fcst_dttm valid_dttm lead_time step_range parameter
#> <dttm> <dttm> <dbl> <chr> <chr>
#> 1 2018-07-10 00:00:00 2018-07-10 06:00:00 6 6 T
#> 2 2018-07-10 00:00:00 2018-07-10 06:00:00 6 6 T
#> 3 2018-07-10 00:00:00 2018-07-10 06:00:00 6 6 T
#> # ℹ 4 more variables: level_type <chr>, level <int>, units <chr>,
#> # gridded_data <geolist>
Similarly, to read temperature at 500-, 1000- and 1500m above the ground we would do the following:
read_grid(
grib_file,
"T",
file_format_opts = grib_opts(
level_find = list(
T = use_grib_heightAboveGround(c(500, 1000, 1500))
)
),
data_frame = TRUE
) %>%
select(-ends_with("date"), -members)
#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/grib/AROME_Arctic/2018/07/10/00/fc2018071000+006grib_fp
#> # A tibble: 3 × 9
#> fcst_dttm valid_dttm lead_time step_range parameter
#> <dttm> <dttm> <dbl> <chr> <chr>
#> 1 2018-07-10 00:00:00 2018-07-10 06:00:00 6 6 T
#> 2 2018-07-10 00:00:00 2018-07-10 06:00:00 6 6 T
#> 3 2018-07-10 00:00:00 2018-07-10 06:00:00 6 6 T
#> # ℹ 4 more variables: level_type <chr>, level <int>, units <chr>,
#> # gridded_data <geolist>
It should be emphasised that no interpolation is
done. i.e. you can only use level_find
to select
vertical levels that exist in the file.
Netcdf files
Netcdf files are becoming increasingly used in meteorology and
climatology. Each variable in a netcdf file is stored as an array that
can be subsetted. Each array in the file has attributes and dimensions
that describe the variable. In order to read data from netcdf files
using {harpIO} it is important that you know something about the
contents of the files so that the correct variables can be read and the
correct subsets taken. This is done by passing a list generated by the
netcdf_opts()
function to read_forecast()
or
read_grid()
via the file_format_opts
argument.
netcdf_opts()
generates a list that has names for the
dimensions of an array and either where to find the geographic
projection information in the file, or the projection information as a
proj4
string, and whether any of the dimensions are reversed.
We can use a netcdf file from the {harpData} package to show how
netcdf_opts()
works.
nc_file <- system.file(
"netcdf/AAEPS/fc2018071000.nc",
package = "harpData"
)
Then we can open a connection to the file using the {ncdf4} package to get a summary of the contents of the file.
#> File /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/netcdf/AAEPS/fc2018071000.nc (NC_FORMAT_NETCDF4_CLASSIC):
#>
#> 13 variables (excluding dimension variables):
#> double forecast_reference_time[] (Contiguous storage)
#> units: seconds since 1970-01-01 00:00:00 +00:00
#> standard_name: forecast_reference_time
#> double p0[] (Contiguous storage)
#> units: Pa
#> double ap[hybrid] (Chunking: [65]) (Compression: shuffle,level 3)
#> units: Pa
#> double b[hybrid] (Chunking: [65]) (Compression: shuffle,level 3)
#> units: 1
#> int projection_lambert[] (Contiguous storage)
#> grid_mapping_name: lambert_conformal_conic
#> standard_parallel: 77.5
#> standard_parallel: 77.5
#> longitude_of_central_meridian: -25
#> latitude_of_projection_origin: 77.5
#> earth_radius: 6371000
#> proj4: +proj=lcc +lat_0=77.5 +lon_0=-25 +lat_1=77.5 +lat_2=77.5 +no_defs +R=6.371e+06
#> double longitude[x,y] (Chunking: [100,100]) (Compression: shuffle,level 3)
#> units: degree_east
#> long_name: longitude
#> standard_name: longitude
#> double latitude[x,y] (Chunking: [100,100]) (Compression: shuffle,level 3)
#> units: degree_north
#> long_name: latitude
#> standard_name: latitude
#> int specific_humidity_ml[x,y,ensemble_member,hybrid,time] (Chunking: [100,100,5,24,1]) (Compression: shuffle,level 3)
#> _FillValue: -2147483647
#> long_name: Specific humidity model levels
#> standard_name: specific_humidity
#> units: kg/kg
#> grid_mapping: projection_lambert
#> coordinates: longitude latitude
#> scale_factor: 1e-07
#> float air_temperature_2m[x,y,ensemble_member,height1,time] (Chunking: [100,100,5,1,1]) (Compression: shuffle,level 3)
#> _FillValue: 9.96920996838687e+36
#> long_name: Screen level temperature (T2M)
#> standard_name: air_temperature
#> units: K
#> grid_mapping: projection_lambert
#> coordinates: longitude latitude
#> float specific_humidity_2m[x,y,ensemble_member,height1,time] (Chunking: [100,100,5,1,1]) (Compression: shuffle,level 3)
#> _FillValue: 9.96920996838687e+36
#> long_name: Screen level specific humidity
#> standard_name: specific_humidity
#> units: kg/kg
#> grid_mapping: projection_lambert
#> coordinates: longitude latitude
#> int air_temperature_ml[x,y,ensemble_member,hybrid,time] (Chunking: [100,100,5,24,1]) (Compression: shuffle,level 3)
#> _FillValue: -2147483647
#> long_name: Air temperature model levels
#> standard_name: air_temperature
#> units: K
#> grid_mapping: projection_lambert
#> coordinates: longitude latitude
#> scale_factor: 0.001
#> float surface_air_pressure[x,y,ensemble_member,height0,time] (Chunking: [100,100,5,1,1]) (Compression: shuffle,level 3)
#> _FillValue: 9.96920996838687e+36
#> long_name: Surface air pressure
#> standard_name: surface_air_pressure
#> units: Pa
#> grid_mapping: projection_lambert
#> coordinates: longitude latitude
#> float surface_geopotential[x,y,ensemble_member,height0,time] (Chunking: [100,100,5,1,1]) (Compression: shuffle,level 3)
#> _FillValue: 9.96920996838687e+36
#> long_name: Surface geopotential (fis)
#> standard_name: surface_geopotential
#> units: m^2/s^2
#> grid_mapping: projection_lambert
#> coordinates: longitude latitude
#>
#> 7 dimensions:
#> time Size:5 *** is unlimited ***
#> long_name: time
#> standard_name: time
#> units: seconds since 1970-01-01 00:00:00 +00:00
#> ensemble_member Size:5
#> long_name: ensemble run number
#> standard_name: realization
#> flag_values: 0
#> flag_values: 1
#> flag_values: 2
#> flag_values: 3
#> flag_values: 4
#> flag_values: 5
#> flag_values: 6
#> flag_values: 7
#> flag_values: 8
#> flag_values: 9
#> flag_values: 10
#> flag_meanings: mbr000 mbr001 mbr002 mbr003 mbr004 mbr005 mbr006 mbr007 mbr008 mbr009 mbr010
#> height0 Size:1
#> description: height above ground
#> long_name: height
#> positive: up
#> units: m
#> height1 Size:1
#> description: height above ground
#> long_name: height
#> positive: up
#> units: m
#> hybrid Size:65
#> standard_name: atmosphere_hybrid_sigma_pressure_coordinate
#> formula: p(n,k,j,i) = ap(k) + b(k)*ps(n,j,i)
#> formula_terms: ap: ap b: b ps: surface_air_pressure p0: p0
#> long_name: atmosphere_hybrid_sigma_pressure_coordinate
#> positive: down
#> x Size:100
#> long_name: x-coordinate in Cartesian system
#> standard_name: projection_x_coordinate
#> units: m
#> y Size:100
#> long_name: y-coordinate in Cartesian system
#> standard_name: projection_y_coordinate
#> units: m
#>
#> 16 global attributes:
#> Conventions: CF-1.6
#> institution: Norwegian Meteorological Institute, MET Norway
#> creator_url: met.no
#> summary: MEPS (MetCoOp-Ensemble Prediction System) is a convection-permitting atmosphere ensemble model covering Scandinavia and the Nordic Seas. It has horizontal resolution of 2.5 km, has 65 vertical levels and 10 members. MEPS is ran four times daily (00,06,12,18) with three-hourly cycling for data assimilation. Member 0 and 1 are ran up to 66 hours, the rest up to 48 hours. Boundary data is from ECMWF, and initial perturbations are based on the SLAF method. Model code based on HARMONIE cy40h1.1
#> source: MEPS 2.5km
#> title: MEPS 2.5km
#> min_time: 2018-07-10 00:00:00Z
#> max_time: 2018-07-12
#> geospatial_lat_min: 51.0
#> geospatial_lat_max: 88.0
#> geospatial_lon_min: -20.0
#> geospatial_lon_max: 80.0
#> references: unknown
#> license: https://www.met.no/en/free-meteorological-data/Licensing-and-crediting
#> comment: none
#> history: 2019-07-17 creation by fimex
From the above you see the variables, their attributes, the
dimensions of the variables and some global attributes. We can use this
information to set the correct information using
netcdf_opts()
when we want to read data using
read_forecast()
, or read_grid()
. For these
examples we will use read_grid()
, but the same principles
apply for read_forecast()
.
It is important to close the connection to a netcdf file when we are
finished with it (this is taken care of for us when we use
read_forecast()
and read_grid()
).
nc_close(nc)
Let’s take a look at some of the default values for
netcdf_opts()
#> $options_set
#> [1] "none"
#>
#> $param_find
#> list()
#>
#> $proj4_var
#> [1] "projection_lambert"
#>
#> $proj4_att
#> [1] "proj4"
#>
#> $proj4
#> NULL
#>
#> $x_dim
#> [1] "x"
#>
#> $y_dim
#> [1] "y"
#>
#> $lon_var
#> [1] "longitude"
#>
#> $lat_var
#> [1] "latitude"
#>
#> $x_rev
#> [1] FALSE
#>
#> $y_rev
#> [1] FALSE
#>
#> $dx
#> NULL
#>
#> $dy
#> NULL
#>
#> $z_var
#> [1] NA
#>
#> $member_var
#> [1] NA
#>
#> $time_var
#> [1] "time"
#>
#> $ref_time_var
#> [1] NA
#>
#> $force_param_name
#> [1] FALSE
Projection information
There are 3 elements in the list that begin with proj4_
.
These are to control where the projection information is obtained.
- proj4_var is the name of the variable in the netcdf file that holds the proj4 string.
-
proj4_att is the name of the attribute of
proj4_var that has the proj4 string. If set to
NULL
, an attempt is made to read the proj4 string directly from proj4_var -
proj4 is where you can set the proj4 string directly rather
than attempt to read it from the file. If proj4 is not
NULL
, proj4_var and proj4_att are ignored.
In the summary of our netcdf file above, there is a variable
projection_lambert
that has an attribute
proj4
, which contains the proj4 string “+proj=lcc
+lat_0=77.5 +lon_0=-25 +lat_1=77.5 +lat_2=77.5 +no_defs +R=6.371e+06”,
so the default values we get from netcdf_opts()
are correct
for this file.
Dimension names
The dimension names of variables are set using x_dim,
y_dim, z_var, member_var and
time_var. The reason we need to know the names are that arrays
in netcdf file can be stored in any dimension order, and we may wish to
read only a subset of the array so we need to know which parts of the
dimensions to take. By default, netcdf_opts()
gives us:
x_dim = "x"
y_dim = "y"
time_var = "time"
z_var = NA
member_var = NA
Since z_var and member_var are NA, the default
means that it is expecting to find variables in the netcdf file with 3
dimensions: “x”, “y” and “time”. If we go back to our example above, and
let’s say we want to read 2m temperature (“air_temperature_2m”, or the
harp parameter “T2m”), we see that air_temperature_2m has 5 dimensions:
x, y, ensemble_member, height1 and
time. We therefore need to modify the values of z_var
and member_var using netcdf_opts()
:
my_nc_opts <- netcdf_opts(z_var = "height1", member_var = "ensemble_member")
Latitude, longitude and time
This leaves lon_var, lat_var and
ref_time_var. If the file contains arrays of longitude and
latitude, these are used to geographically orientate the data - if they
are set to NULL
an attempt is made to do this using
x_dim and y_dim and the projection information.
ref_time_var is used in forecasts to calculate the lead time.
If set to NA, the lead time is calculated by subtracting the first
element of the time dimension from each subsequent element.
We can now attempt to read some data - firstly without using
netcdf_opts()
:
read_grid(nc_file, "T2m", data_frame = TRUE)
#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/netcdf/AAEPS/fc2018071000.nc
#> # A tibble: 25 × 9
#> fcst_dttm valid_dttm lead_time parameter level_type level
#> <dttm> <dttm> <dbl> <chr> <chr> <dbl>
#> 1 2018-07-10 00:00:00 2018-07-10 00:00:00 0 T2m height 2
#> 2 2018-07-10 00:00:00 2018-07-10 00:00:00 0 T2m height 2
#> 3 2018-07-10 00:00:00 2018-07-10 00:00:00 0 T2m height 2
#> 4 2018-07-10 00:00:00 2018-07-10 00:00:00 0 T2m height 2
#> 5 2018-07-10 00:00:00 2018-07-10 00:00:00 0 T2m height 2
#> 6 2018-07-10 00:00:00 2018-07-10 03:00:00 3 T2m height 2
#> 7 2018-07-10 00:00:00 2018-07-10 03:00:00 3 T2m height 2
#> 8 2018-07-10 00:00:00 2018-07-10 03:00:00 3 T2m height 2
#> 9 2018-07-10 00:00:00 2018-07-10 03:00:00 3 T2m height 2
#> 10 2018-07-10 00:00:00 2018-07-10 03:00:00 3 T2m height 2
#> # ℹ 15 more rows
#> # ℹ 3 more variables: units <chr>, gridded_data <geolist>, members <int[1d]>
The error tells you what dimension names you have given - in this
case (time, x, y) - the defaults from netcdf_opts()
, and
what dimensions it found in the file: (ensemble_member, height1, time,
x, y). Note that in the warning the dimensions are listed in
alphabetical order, it us up to you to decide which dimension they
should go with.
So, now if we use our modified options to read the file, we should get a result:
read_grid(nc_file, "T2m", file_format_opts = my_nc_opts, data_frame = TRUE) %>%
select(-ends_with("dttm"))
#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/netcdf/AAEPS/fc2018071000.nc
#> # A tibble: 25 × 7
#> lead_time parameter level_type level units gridded_data members
#> <dbl> <chr> <chr> <dbl> <chr> <geolist> <int[1d]>
#> 1 0 T2m height 2 K [100 × 100] 0
#> 2 0 T2m height 2 K [100 × 100] 1
#> 3 0 T2m height 2 K [100 × 100] 2
#> 4 0 T2m height 2 K [100 × 100] 3
#> 5 0 T2m height 2 K [100 × 100] 4
#> 6 3 T2m height 2 K [100 × 100] 0
#> 7 3 T2m height 2 K [100 × 100] 1
#> 8 3 T2m height 2 K [100 × 100] 2
#> 9 3 T2m height 2 K [100 × 100] 3
#> 10 3 T2m height 2 K [100 × 100] 4
#> # ℹ 15 more rows
We can also select specific members and lead times:
read_grid(
nc_file,
"T2m",
lead_time = c(0,3),
members = c(0, 2, 4),
file_format_opts = my_nc_opts,
data_frame = TRUE
) %>%
select(-ends_with("dttm"))
#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/netcdf/AAEPS/fc2018071000.nc
#> # A tibble: 6 × 7
#> lead_time parameter level_type level units gridded_data members
#> <dbl> <chr> <chr> <dbl> <chr> <geolist> <int[1d]>
#> 1 0 T2m height 2 K [100 × 100] 0
#> 2 0 T2m height 2 K [100 × 100] 2
#> 3 0 T2m height 2 K [100 × 100] 4
#> 4 3 T2m height 2 K [100 × 100] 0
#> 5 3 T2m height 2 K [100 × 100] 2
#> 6 3 T2m height 2 K [100 × 100] 4
Options sets
Finally, netcdf_opts()
has a number of default sets of
options that can be set via the options_set
argument. These
set the options to known values for some data sources - currently
operational NWP forecasts from MET Norway and WRF (though this isn’t
fully tested). These options are:
- “met_norway_eps” - EPS forecasts produced by MET Norway,
- “met_norway_det” - deterministic forecasts produced by MET Norway
- “met_norway_ifsens” - IFSENS forecasts from ECMWF archived at MET Norway
- “met_norway_ifshires” - IFSHIRES forecasts from ECMWF archived at MET Norway
- “wrf” - WRF forecasts on the WRF “mass” grid
- “wrf_u_stagger” - WRF forecasts on the U-staggered grid
- “wrf_v_stagger” - WRF forecasts on the V-staggered grid
You can, for example, use the “met_norway_det” options set when reading data from the MET Norway thredds server:
read_grid(
"https://thredds.met.no/thredds/dodsC/aromearcticarchive/2021/03/01/arome_arctic_extracted_2_5km_20210301T21Z.nc",
"T2m",
lead_time = 6,
file_format = "netcdf",
file_format_opts = netcdf_opts("met_norway_det")
)
#> Reading https://thredds.met.no/thredds/dodsC/aromearcticarchive/2021/03/01/arome_arctic_extracted_2_5km_20210301T21Z.nc
#> : T2m K
#> Time:
#> 2021/03/01 21:00
#> Domain summary:
#> 739 x 949 domain
#> Projection summary:
#> proj= lcc
#> NE = ( 68.83323 , 71.10861 )
#> SW = ( -17.957 , 69.299 )
#> Data summary:
#> 241.2239 257.6566 267.764 264.93 272.1761 279.136
Note that to download data from an external source we need to
explicitly tell read_grid()
(or
read_forecast()
) that the file format is “netcdf”. This is
because when the file is on disk an initial scan of the file is done to
try and guess the format, but this is not possible with external files.
For demonstration purposes we could do the same using
read_forecast()
url <- "https://thredds.met.no/thredds/dodsC/aromearcticarchive"
template <- paste0(
"{YYYY}/{MM}/{DD}/",
"{fcst_model}_extracted_2_5km_{YYYY}{MM}{DD}T{HH}Z.nc"
)
read_forecast(
seq_dttm(2021030100, 2021030118, "6h"),
fcst_model = "arome_arctic",
parameter = "T2m",
by = "6h",
lead_time = seq(0, 3),
file_path = url,
file_template = template,
file_format = "netcdf",
file_format_opts = netcdf_opts("met_norway_det"),
return_data = TRUE
) %>%
select(-starts_with("level"), -fcst_cycle ,-valid_dttm)
#> Reading https://thredds.met.no/thredds/dodsC/aromearcticarchive/2021/03/01/arome_arctic_extracted_2_5km_20210301T00Z.nc
#> Reading https://thredds.met.no/thredds/dodsC/aromearcticarchive/2021/03/01/arome_arctic_extracted_2_5km_20210301T06Z.nc
#> Reading https://thredds.met.no/thredds/dodsC/aromearcticarchive/2021/03/01/arome_arctic_extracted_2_5km_20210301T12Z.nc
#> Reading https://thredds.met.no/thredds/dodsC/aromearcticarchive/2021/03/01/arome_arctic_extracted_2_5km_20210301T18Z.nc
#> ::deterministic gridded forecast:: # A tibble: 16 × 6
#> fcst_model fcst_dttm lead_time parameter units fcst
#> <chr> <dttm> <dbl> <chr> <chr> <geolist>
#> 1 arome_arctic 2021-03-01 00:00:00 0 T2m K [739 × 949]
#> 2 arome_arctic 2021-03-01 00:00:00 1 T2m K [739 × 949]
#> 3 arome_arctic 2021-03-01 00:00:00 2 T2m K [739 × 949]
#> 4 arome_arctic 2021-03-01 00:00:00 3 T2m K [739 × 949]
#> 5 arome_arctic 2021-03-01 06:00:00 0 T2m K [739 × 949]
#> 6 arome_arctic 2021-03-01 06:00:00 1 T2m K [739 × 949]
#> 7 arome_arctic 2021-03-01 06:00:00 2 T2m K [739 × 949]
#> 8 arome_arctic 2021-03-01 06:00:00 3 T2m K [739 × 949]
#> 9 arome_arctic 2021-03-01 12:00:00 0 T2m K [739 × 949]
#> 10 arome_arctic 2021-03-01 12:00:00 1 T2m K [739 × 949]
#> 11 arome_arctic 2021-03-01 12:00:00 2 T2m K [739 × 949]
#> 12 arome_arctic 2021-03-01 12:00:00 3 T2m K [739 × 949]
#> 13 arome_arctic 2021-03-01 18:00:00 0 T2m K [739 × 949]
#> 14 arome_arctic 2021-03-01 18:00:00 1 T2m K [739 × 949]
#> 15 arome_arctic 2021-03-01 18:00:00 2 T2m K [739 × 949]
#> 16 arome_arctic 2021-03-01 18:00:00 3 T2m K [739 × 949]
The z_var problem
Netcdf files typically contain many variables, and for meteorological
data they can have a range of vertical coordinates - e.g. height,
pressure, model levels etc. So although we have set our options with
netcdf_opts()
we don’t want to set everything for every
variable we wish to read from the file. To address this, the function
modify_opts()
can be used to change one or more options. If
we try to read “specific_humidity_ml” from our example file with the
netcdf options we currently have, we will get an error as z_var
= “height1”, but for this variable it needs to be “hybrid”:
read_grid(
nc_file,
"specific_humidity_ml",
lead_time = 3,
file_format_opts = my_nc_opts,
data_frame = TRUE
)
#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/netcdf/AAEPS/fc2018071000.nc
#> # A tibble: 325 × 9
#> fcst_dttm valid_dttm lead_time parameter level_type
#> <dttm> <dttm> <dbl> <chr> <chr>
#> 1 2018-07-10 00:00:00 2018-07-10 03:00:00 3 specific_humidi… pressure
#> 2 2018-07-10 00:00:00 2018-07-10 03:00:00 3 specific_humidi… pressure
#> 3 2018-07-10 00:00:00 2018-07-10 03:00:00 3 specific_humidi… pressure
#> 4 2018-07-10 00:00:00 2018-07-10 03:00:00 3 specific_humidi… pressure
#> 5 2018-07-10 00:00:00 2018-07-10 03:00:00 3 specific_humidi… pressure
#> 6 2018-07-10 00:00:00 2018-07-10 03:00:00 3 specific_humidi… pressure
#> 7 2018-07-10 00:00:00 2018-07-10 03:00:00 3 specific_humidi… pressure
#> 8 2018-07-10 00:00:00 2018-07-10 03:00:00 3 specific_humidi… pressure
#> 9 2018-07-10 00:00:00 2018-07-10 03:00:00 3 specific_humidi… pressure
#> 10 2018-07-10 00:00:00 2018-07-10 03:00:00 3 specific_humidi… pressure
#> # ℹ 315 more rows
#> # ℹ 4 more variables: level <dbl>, units <chr>, gridded_data <geolist>,
#> # members <int[1d]>
So, to change z_var to “hybrid” we use
modify_opts()
:
read_grid(
nc_file,
"specific_humidity_ml",
lead_time = 3,
file_format_opts = modify_opts(my_nc_opts, z_var = "hybrid"),
data_frame = TRUE,
vertical_coordinate = "model"
) %>%
select(-ends_with("dttm"))
#> Reading /home/andrewts/R/x86_64-pc-linux-gnu-library/4.3/harpData/netcdf/AAEPS/fc2018071000.nc
#> # A tibble: 325 × 7
#> lead_time parameter level_type level units gridded_data members
#> <dbl> <chr> <chr> <dbl> <chr> <geolist> <int[1d>
#> 1 3 specific_humidity_ml model 0.00987 kg/kg [100 × 100] 0
#> 2 3 specific_humidity_ml model 0.00987 kg/kg [100 × 100] 1
#> 3 3 specific_humidity_ml model 0.00987 kg/kg [100 × 100] 2
#> 4 3 specific_humidity_ml model 0.00987 kg/kg [100 × 100] 3
#> 5 3 specific_humidity_ml model 0.00987 kg/kg [100 × 100] 4
#> 6 3 specific_humidity_ml model 0.0296 kg/kg [100 × 100] 0
#> 7 3 specific_humidity_ml model 0.0296 kg/kg [100 × 100] 1
#> 8 3 specific_humidity_ml model 0.0296 kg/kg [100 × 100] 2
#> 9 3 specific_humidity_ml model 0.0296 kg/kg [100 × 100] 3
#> 10 3 specific_humidity_ml model 0.0296 kg/kg [100 × 100] 4
#> # ℹ 315 more rows
We also have to pass the vertical_coordinate = "model"
to get the correct name in the level_type
column in the
data frame of the output.
Note that for variables with a singleton z dimension, any name can be given for z_var as long as it exists as a dimension in the file. Furthermore, if there are many different dimensions with names such as “height0”, “height1”, “height2” etc. the number doesn’t have to match. This means that you do not have to modify z_var so much.