In this example, we start with a data frame with two columns, one with full dates and one with partial dates. The goal is to consolidate these dates into one ISO 8601 formatted date column.
library(sdtmval)
library(dplyr)
<- data.frame(
raw_dates raw_full = c(
rep(NA, 8),
"02/05/2017",
"02-05-2017"
),raw_partial = c(
"UN-UNK-UNKN",
"UN/UNK/UNKN",
"UN UNK UNKN",
"UN-UNK-2017",
"UN-Feb-2017",
"05-FEB-2017",
"05-UNK-2017",
"05-Feb-UNKN",
rep(NA, 2)
)
)::kable(raw_dates) knitr
raw_full | raw_partial |
---|---|
NA | UN-UNK-UNKN |
NA | UN/UNK/UNKN |
NA | UN UNK UNKN |
NA | UN-UNK-2017 |
NA | UN-Feb-2017 |
NA | 05-FEB-2017 |
NA | 05-UNK-2017 |
NA | 05-Feb-UNKN |
02/05/2017 | NA |
02-05-2017 | NA |
First, we will re-arrange the partial dates into the same format as
the full dates using reshape_pdates()
. That will let us
combine the full and partial dates into one column with a MM/DD/YYYY
format. Then, using reshape_adates()
, we will convert all
dates to the YYYY-MM-DD format.
<- raw_dates %>%
working_dates mutate(
partial = reshape_pdates(raw_partial),
all = coalesce(raw_full, partial),
all = reshape_adates(all)
)::kable(working_dates) knitr
raw_full | raw_partial | partial | all |
---|---|---|---|
NA | UN-UNK-UNKN | UN/UN/UNKN | UNKN-UN-UN |
NA | UN/UNK/UNKN | UN/UN/UNKN | UNKN-UN-UN |
NA | UN UNK UNKN | UN/UN/UNKN | UNKN-UN-UN |
NA | UN-UNK-2017 | UN/UN/2017 | 2017-UN-UN |
NA | UN-Feb-2017 | 02/UN/2017 | 2017-02-UN |
NA | 05-FEB-2017 | 02/05/2017 | 2017-02-05 |
NA | 05-UNK-2017 | UN/05/2017 | 2017-UN-05 |
NA | 05-Feb-UNKN | 02/05/UNKN | UNKN-02-05 |
02/05/2017 | NA | NA | 2017-02-05 |
02-05-2017 | NA | NA | 2017-02-05 |
For situations where missing date elements should be removed, use the
trim_dates()
function.
<- mutate(working_dates, trimmed = trim_dates(all))
trimmed_dates ::kable(trimmed_dates) knitr
raw_full | raw_partial | partial | all | trimmed |
---|---|---|---|---|
NA | UN-UNK-UNKN | UN/UN/UNKN | UNKN-UN-UN | NA |
NA | UN/UNK/UNKN | UN/UN/UNKN | UNKN-UN-UN | NA |
NA | UN UNK UNKN | UN/UN/UNKN | UNKN-UN-UN | NA |
NA | UN-UNK-2017 | UN/UN/2017 | 2017-UN-UN | 2017 |
NA | UN-Feb-2017 | 02/UN/2017 | 2017-02-UN | 2017-02 |
NA | 05-FEB-2017 | 02/05/2017 | 2017-02-05 | 2017-02-05 |
NA | 05-UNK-2017 | UN/05/2017 | 2017-UN-05 | 2017 |
NA | 05-Feb-UNKN | 02/05/UNKN | UNKN-02-05 | NA |
02/05/2017 | NA | NA | 2017-02-05 | 2017-02-05 |
02-05-2017 | NA | NA | 2017-02-05 | 2017-02-05 |
If imputed dates are needed, use the impute_pdates()
function. Both start and end dates can be imputed using standard
imputation rules.
<- working_dates %>%
imputed_dates mutate(
start = impute_pdates(all, ptype = "start"),
end = impute_pdates(all, ptype = "end")
)::kable(imputed_dates) knitr
raw_full | raw_partial | partial | all | start | end |
---|---|---|---|---|---|
NA | UN-UNK-UNKN | UN/UN/UNKN | UNKN-UN-UN | NA | NA |
NA | UN/UNK/UNKN | UN/UN/UNKN | UNKN-UN-UN | NA | NA |
NA | UN UNK UNKN | UN/UN/UNKN | UNKN-UN-UN | NA | NA |
NA | UN-UNK-2017 | UN/UN/2017 | 2017-UN-UN | 2017-01-01 | 2017-12-31 |
NA | UN-Feb-2017 | 02/UN/2017 | 2017-02-UN | 2017-02-01 | 2017-02-28 |
NA | 05-FEB-2017 | 02/05/2017 | 2017-02-05 | 2017-02-05 | 2017-02-05 |
NA | 05-UNK-2017 | UN/05/2017 | 2017-UN-05 | 2017-01-05 | 2017-12-05 |
NA | 05-Feb-UNKN | 02/05/UNKN | UNKN-02-05 | NA | NA |
02/05/2017 | NA | NA | 2017-02-05 | 2017-02-05 | 2017-02-05 |
02-05-2017 | NA | NA | 2017-02-05 | 2017-02-05 | 2017-02-05 |