The default behaviour in harp is to store ensemble data in wide data frames.
That means that there is one column for each member of the ensemble. This
isn't always ideal and goes against the principles of tidy data, whereby
each ensemble member would be stored on a separate row with a single
column denoting the ensemble member. pivot_members() can be used to pivot
between the wide and long formats in both directions.
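
The difference between the two layouts can be sketched with a toy data frame and tidyr. This is only an illustration of the wide and long shapes, not how harp implements the pivot: the column names `mbr000` and `mbr001` and all of the values are hypothetical, and the resulting tibbles do not carry any harp classes.

```r
library(tibble)
library(tidyr)

# Hypothetical wide ensemble data: one column per member (toy values).
wide <- tibble(
  lead_time = c(0, 1, 2),
  mbr000    = c(1.2, 1.5, 1.9),
  mbr001    = c(0.8, 1.1, 2.3)
)

# Wide -> long: one row per member, with a single `member` column.
long <- pivot_longer(
  wide,
  cols      = starts_with("mbr"),
  names_to  = "member",
  values_to = "fcst"
)

# Long -> wide: back to one column per member.
wide_again <- pivot_wider(long, names_from = member, values_from = fcst)
```

Here `long` has two rows per lead time, one for each member, which is the tidy layout described above.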
Details
When pivoting from a wide to a long data frame, the class is updated to indicate that the ensemble members are stored in rows rather than columns. When pivoting back to a wide data frame, the original class is restored.
Examples
pivot_members(ens_point_df)
#> ::ensemble point forecast [[long]]:: # A tibble: 96 × 7
#> sub_model fcst_dttm lead_time valid_dttm SID member
#> * <chr> <dttm> <dbl> <dttm> <dbl> <chr>
#> 1 point 2021-01-01 00:00:00 0 2021-01-01 00:00:00 1001 mbr000
#> 2 point 2021-01-01 00:00:00 0 2021-01-01 00:00:00 1001 mbr001
#> 3 point 2021-01-01 00:00:00 1 2021-01-01 01:00:00 1001 mbr000
#> 4 point 2021-01-01 00:00:00 1 2021-01-01 01:00:00 1001 mbr001
#> 5 point 2021-01-01 00:00:00 2 2021-01-01 02:00:00 1001 mbr000
#> 6 point 2021-01-01 00:00:00 2 2021-01-01 02:00:00 1001 mbr001
#> 7 point 2021-01-01 00:00:00 3 2021-01-01 03:00:00 1001 mbr000
#> 8 point 2021-01-01 00:00:00 3 2021-01-01 03:00:00 1001 mbr001
#> 9 point 2021-01-01 00:00:00 4 2021-01-01 04:00:00 1001 mbr000
#> 10 point 2021-01-01 00:00:00 4 2021-01-01 04:00:00 1001 mbr001
#> # ℹ 86 more rows
#> # ℹ 1 more variable: fcst <dbl>
pivot_members(ens_grid_df)
#> ::ensemble gridded forecast [[long]]:: # A tibble: 48 × 6
#> sub_model fcst_dttm lead_time valid_dttm member fcst
#> * <chr> <dttm> <dbl> <dttm> <chr> <geolist>
#> 1 grid 2021-01-01 00:00:00 0 2021-01-01 00:00:00 mbr000 [5 × 5]
#> 2 grid 2021-01-01 00:00:00 0 2021-01-01 00:00:00 mbr001 [5 × 5]
#> 3 grid 2021-01-01 00:00:00 1 2021-01-01 01:00:00 mbr000 [5 × 5]
#> 4 grid 2021-01-01 00:00:00 1 2021-01-01 01:00:00 mbr001 [5 × 5]
#> 5 grid 2021-01-01 00:00:00 2 2021-01-01 02:00:00 mbr000 [5 × 5]
#> 6 grid 2021-01-01 00:00:00 2 2021-01-01 02:00:00 mbr001 [5 × 5]
#> 7 grid 2021-01-01 00:00:00 3 2021-01-01 03:00:00 mbr000 [5 × 5]
#> 8 grid 2021-01-01 00:00:00 3 2021-01-01 03:00:00 mbr001 [5 × 5]
#> 9 grid 2021-01-01 00:00:00 4 2021-01-01 04:00:00 mbr000 [5 × 5]
#> 10 grid 2021-01-01 00:00:00 4 2021-01-01 04:00:00 mbr001 [5 × 5]
#> # ℹ 38 more rows
pivot_members(ens_point_list)
#> • a
#> ::ensemble point forecast [[long]]:: # A tibble: 96 × 8
#> fcst_model sub_model fcst_dttm lead_time valid_dttm SID
#> * <chr> <chr> <dttm> <dbl> <dttm> <dbl>
#> 1 a a 2021-01-01 00:00:00 0 2021-01-01 00:00:00 1001
#> 2 a a 2021-01-01 00:00:00 0 2021-01-01 00:00:00 1001
#> 3 a a 2021-01-01 00:00:00 1 2021-01-01 01:00:00 1001
#> 4 a a 2021-01-01 00:00:00 1 2021-01-01 01:00:00 1001
#> 5 a a 2021-01-01 00:00:00 2 2021-01-01 02:00:00 1001
#> 6 a a 2021-01-01 00:00:00 2 2021-01-01 02:00:00 1001
#> 7 a a 2021-01-01 00:00:00 3 2021-01-01 03:00:00 1001
#> 8 a a 2021-01-01 00:00:00 3 2021-01-01 03:00:00 1001
#> 9 a a 2021-01-01 00:00:00 4 2021-01-01 04:00:00 1001
#> 10 a a 2021-01-01 00:00:00 4 2021-01-01 04:00:00 1001
#> # ℹ 86 more rows
#> # ℹ 2 more variables: member <chr>, fcst <dbl>
#>
#> • b
#> ::ensemble point forecast [[long]]:: # A tibble: 96 × 8
#> fcst_model sub_model fcst_dttm lead_time valid_dttm SID
#> * <chr> <chr> <dttm> <dbl> <dttm> <dbl>
#> 1 b b 2021-01-01 00:00:00 0 2021-01-01 00:00:00 1001
#> 2 b b 2021-01-01 00:00:00 0 2021-01-01 00:00:00 1001
#> 3 b b 2021-01-01 00:00:00 1 2021-01-01 01:00:00 1001
#> 4 b b 2021-01-01 00:00:00 1 2021-01-01 01:00:00 1001
#> 5 b b 2021-01-01 00:00:00 2 2021-01-01 02:00:00 1001
#> 6 b b 2021-01-01 00:00:00 2 2021-01-01 02:00:00 1001
#> 7 b b 2021-01-01 00:00:00 3 2021-01-01 03:00:00 1001
#> 8 b b 2021-01-01 00:00:00 3 2021-01-01 03:00:00 1001
#> 9 b b 2021-01-01 00:00:00 4 2021-01-01 04:00:00 1001
#> 10 b b 2021-01-01 00:00:00 4 2021-01-01 04:00:00 1001
#> # ℹ 86 more rows
#> # ℹ 2 more variables: member <chr>, fcst <dbl>
#>
pivot_members(ens_grid_list)
#> • a
#> ::ensemble gridded forecast [[long]]:: # A tibble: 48 × 7
#> fcst_model sub_model fcst_dttm lead_time valid_dttm member
#> * <chr> <chr> <dttm> <dbl> <dttm> <chr>
#> 1 a a 2021-01-01 00:00:00 0 2021-01-01 00:00:00 mbr000
#> 2 a a 2021-01-01 00:00:00 0 2021-01-01 00:00:00 mbr001
#> 3 a a 2021-01-01 00:00:00 1 2021-01-01 01:00:00 mbr000
#> 4 a a 2021-01-01 00:00:00 1 2021-01-01 01:00:00 mbr001
#> 5 a a 2021-01-01 00:00:00 2 2021-01-01 02:00:00 mbr000
#> 6 a a 2021-01-01 00:00:00 2 2021-01-01 02:00:00 mbr001
#> 7 a a 2021-01-01 00:00:00 3 2021-01-01 03:00:00 mbr000
#> 8 a a 2021-01-01 00:00:00 3 2021-01-01 03:00:00 mbr001
#> 9 a a 2021-01-01 00:00:00 4 2021-01-01 04:00:00 mbr000
#> 10 a a 2021-01-01 00:00:00 4 2021-01-01 04:00:00 mbr001
#> # ℹ 38 more rows
#> # ℹ 1 more variable: fcst <geolist>
#>
#> • b
#> ::ensemble gridded forecast [[long]]:: # A tibble: 48 × 7
#> fcst_model sub_model fcst_dttm lead_time valid_dttm member
#> * <chr> <chr> <dttm> <dbl> <dttm> <chr>
#> 1 b b 2021-01-01 00:00:00 0 2021-01-01 00:00:00 mbr000
#> 2 b b 2021-01-01 00:00:00 0 2021-01-01 00:00:00 mbr001
#> 3 b b 2021-01-01 00:00:00 1 2021-01-01 01:00:00 mbr000
#> 4 b b 2021-01-01 00:00:00 1 2021-01-01 01:00:00 mbr001
#> 5 b b 2021-01-01 00:00:00 2 2021-01-01 02:00:00 mbr000
#> 6 b b 2021-01-01 00:00:00 2 2021-01-01 02:00:00 mbr001
#> 7 b b 2021-01-01 00:00:00 3 2021-01-01 03:00:00 mbr000
#> 8 b b 2021-01-01 00:00:00 3 2021-01-01 03:00:00 mbr001
#> 9 b b 2021-01-01 00:00:00 4 2021-01-01 04:00:00 mbr000
#> 10 b b 2021-01-01 00:00:00 4 2021-01-01 04:00:00 mbr001
#> # ℹ 38 more rows
#> # ℹ 1 more variable: fcst <geolist>
#>
# Note the change in class
class(ens_point_df)
#> [1] "harp_ens_point_df" "harp_point_df" "harp_df"
#> [4] "tbl_df" "tbl" "data.frame"
class(pivot_members(ens_point_df))
#> [1] "harp_ens_point_df_long" "harp_point_df" "harp_df"
#> [4] "tbl_df" "tbl" "data.frame"
class(ens_grid_df)
#> [1] "harp_ens_grid_df" "harp_grid_df" "harp_df" "tbl_df"
#> [5] "tbl" "data.frame"
class(pivot_members(ens_grid_df))
#> [1] "harp_ens_grid_df_long" "harp_grid_df" "harp_df"
#> [4] "tbl_df" "tbl" "data.frame"
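
The class examples above only pivot from wide to long. A round trip might look like the sketch below. It assumes that calling pivot_members() on an already long data frame pivots it back to wide, as the Details section suggests, and that `ens_point_df` is available after attaching harpCore; no output is shown because it has not been run here.

```r
library(harpCore)

# Wide -> long: the first class gains the "_long" suffix.
long_df <- pivot_members(ens_point_df)
class(long_df)[1]

# Assumption: a second call detects the long class and pivots back to wide.
wide_df <- pivot_members(long_df)
class(wide_df)[1]
```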