Read point observations from multiple files

read_obs generates file names based on the arguments given and reads point observation data from them. The data can optionally be rewritten to files of a different format. Because large volumes of data may be read, the function only returns data to the calling environment if return_data = TRUE.

Usage

read_obs(
  dttm,
  parameter,
  param_defs = get("harp_params"),
  stations = NULL,
  file_path = getwd(),
  file_format = NULL,
  file_template = "vobs",
  file_format_opts = vfile_opts("vobs"),
  output_format = "obstable",
  output_format_opts = obstable_opts(),
  return_data = FALSE,
  start_date = NULL,
  end_date = NULL,
  by = "1h",
  reads_per_write = 24,
  ...
)

Arguments

dttm: A vector of date time strings to read. Can be in YYYYMMDD, YYYYMMDDhh, YYYYMMDDhhmm, or YYYYMMDDhhmmss format. Can be numeric or character. A vector of date-times can be generated using seq_dttm.
parameter: The names of the parameters to read. If NULL, all parameters are read from the observations files.
param_defs: A list of parameter definitions that includes the file format to be read. By default the built-in list harp_params is used. Modifications and additions to this list can be made using modify_param_def and add_param_def respectively.
stations: The IDs of the stations to read from the files. By default this is NULL, meaning that observations for all stations are read from the observations files.
file_path: The parent path to all observation data. All file names are generated to be under the file_path directory. The default is the current working directory.
file_format: The format of the files to read, for example "vobs", which is the standard format used by the HIRLAM consortium. If set to something else, read_obs will search the global environment for a function called read_<file_format> that it will use to read from the files.
file_template: A template for the file names. For available built-in templates see show_file_templates. If anything else is passed, it is returned unmodified, or with substitutions made for dynamic values. Available substitutions are {YYYY} for year, {MM} for 2 digit month with leading zero, {M} for month with no leading zero, and similarly {DD} or {D} for day, {HH} or {H} for hour, {mm} or {m} for minute. Note that the full path to the file will always be file_path/template. Other substitutions can be passed via ...
file_format_opts: Specific options for reading the file format specified in file_format. Should be a named list, with names corresponding to arguments for read_<file_format>.
output_format: The file format to rewrite the data to. By default this is "obstable", which is an SQLite file designed specifically for the harp ecosystem. If set to something else, read_obs will search the global environment for a function called write_<output_format> that it will use to write to the output file(s).
output_format_opts: Specific options for writing to output_format files. Must be a named list and at least include the names "path" and "template". By setting output_format_opts$path to something other than NULL, read_obs will attempt to write out the data.
return_data: Logical - whether to return the data read in to the calling environment. Due to the potential for large volumes of data, this is set to FALSE by default.
start_date, end_date, by: The use of start_date, end_date and by is no longer supported. dttm together with seq_dttm should be used to generate equally spaced date-times.
reads_per_write: The number of files to read before writing out the data to new files. Set this to a low number to reduce memory usage. The default is 24 based on the assumption that there are observations files every hour and writing should be done once per observation day. For the default setting of writing to "obstable" files, this number has no impact on the output since these files can be appended to. For other formats, this setting might be important to prevent data from being overwritten.
...: Other arguments to generate_filenames for getting the names of files to read.
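
The template substitutions described for file_template can be illustrated with a small base R sketch. The helper expand_template below is hypothetical and is not part of harp. It only handles a single date-time in YYYYMMDDhh format and the year, month, day and hour substitutions. The template string and the /data directory are also made up for illustration, and harp's own generate_filenames may behave differently.

```r
# Hypothetical helper (not harp's implementation): expand a file template
# for one date-time given in YYYYMMDDhh format.
expand_template <- function(template, dttm, file_path = getwd()) {
  time <- as.POSIXct(as.character(dttm), format = "%Y%m%d%H", tz = "UTC")
  stopifnot(!is.na(time))

  # Substitution keys and their values for this date-time
  subs <- c(
    "{YYYY}" = format(time, "%Y"),
    "{MM}"   = format(time, "%m"),
    "{M}"    = as.character(as.integer(format(time, "%m"))),
    "{DD}"   = format(time, "%d"),
    "{D}"    = as.character(as.integer(format(time, "%d"))),
    "{HH}"   = format(time, "%H"),
    "{H}"    = as.character(as.integer(format(time, "%H")))
  )

  # Replace each key literally (fixed = TRUE, so braces are not regex syntax)
  for (key in names(subs)) {
    template <- gsub(key, subs[[key]], template, fixed = TRUE)
  }

  # The full path is always file_path/template
  file.path(file_path, template)
}

expand_template("{YYYY}/{MM}/obs_{YYYY}{MM}{DD}{HH}.csv", "2022030406", "/data")
# "/data/2022/03/obs_2022030406.csv"
```

The sketch prepends file_path to the expanded template, mirroring the note that the full path is always file_path/template.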

Value

If return_data = TRUE, a list with data frames of observations.
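
A call that returns data to the calling environment might look like the following sketch. The directory /path/to/vobs and the parameter name "T2m" are assumptions made for illustration. The date-time strings are built with base R here so that the example does not depend on the exact arguments of seq_dttm. The call itself is only attempted if harpIO is installed and the directory exists.

```r
# Build 24 hourly date-time strings in YYYYMMDDhh format with base R.
start <- as.POSIXct("2022-01-01 00:00", tz = "UTC")
dttm  <- format(
  seq(start, by = "1 hour", length.out = 24),
  "%Y%m%d%H",
  tz = "UTC"
)

# Placeholder directory: replace with the location of your observations files.
obs_dir <- "/path/to/vobs"

# Only attempt the read if harpIO is installed and the directory exists.
if (requireNamespace("harpIO", quietly = TRUE) && dir.exists(obs_dir)) {
  obs <- harpIO::read_obs(
    dttm        = dttm,
    parameter   = "T2m",    # assumed parameter name
    file_path   = obs_dir,
    return_data = TRUE      # without this, nothing is returned
  )
  str(obs, max.level = 1)   # inspect the top level of the returned list
}
```

With the default return_data = FALSE, the same call would read the files and optionally write them out without returning the observations.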

Details

read_obs is not intended to be used for reading gridded observations. For this, use read_analysis instead.