parse_date_time {lubridate}    R Documentation
parse_date_time parses an input vector into POSIXct date-time objects. It differs from strptime in two respects. First, it allows you to specify the order in which the formats occur, without the need to include separators and the "%" prefix. Such a formatting argument is referred to as an "order". Second, it allows the user to specify several format-orders to handle heterogeneous date-time character representations.
parse_date_time2 is a fast C parser of numeric orders.
fast_strptime is a fast C parser of numeric formats only that accepts explicit format arguments, just as strptime does.
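As a minimal sketch, assuming lubridate is installed, the snippet below parses the same string three ways: with an order (no separators, no "%"), with base strptime and an explicit format, and with fast_strptime and the same explicit format. The variable names are illustrative only.

```r
library(lubridate)

x <- "2009-01-02 13:45:00"

# An order lists only the components; separators and "%" are not needed
via_order <- parse_date_time(x, "Ymd HMS")

# strptime() and fast_strptime() need the complete format string
via_base <- as.POSIXct(strptime(x, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
via_fast <- fast_strptime(x, "%Y-%m-%d %H:%M:%S", lt = FALSE)

# All three should represent the same instant in UTC
print(via_order)
print(via_base)
print(via_fast)
```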
parse_date_time(x, orders, tz = "UTC", truncated = 0, quiet = FALSE,
  locale = Sys.getlocale("LC_TIME"), select_formats = .select_formats,
  exact = FALSE)

parse_date_time2(x, orders, tz = "UTC", exact = FALSE, lt = FALSE)

fast_strptime(x, format, tz = "UTC", lt = TRUE)
x
a character or numeric vector of dates.
orders
a character vector of date-time formats. Each order string is a series of formatting characters, as listed below.
tz
a character string that specifies the time zone with which to parse the dates.
truncated
integer, the number of formats that can be missing. The most common type of irregularity in date-time data is truncation due to rounding or unavailability of the time stamp. If truncated is non-zero, parse_date_time also accepts inputs in which up to that many trailing formats are missing; for example, with the order "Ymd HMS" and truncated = 3, "2010-01-01 12:11", "2010-01-01 12" and "2010-01-01" are also parsed.
quiet
logical. When TRUE, progress messages are not printed, the "no formats found" error is suppressed, and the function simply returns a vector of NAs. This mirrors the behavior of base R functions such as strptime.
locale
the locale to be used; see locales. On Linux systems you can use …
select_formats
a function to select the formats actually used for parsing from the set of formats that matched a training subset of x.
exact
logical. If TRUE, the orders argument is interpreted as exact strptime formats and no training or guessing is performed.
lt
logical. If TRUE, the returned object is of class POSIXlt, and POSIXct otherwise. For compatibility with the base strptime function, the default is TRUE for fast_strptime and FALSE for parse_date_time2.
format
a character string of formats. It should include all the separators, and each format must be prefixed with "%", just as in the format argument of strptime.
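A short sketch of two of the arguments above, assuming lubridate is installed: a numeric x (here the number 20090102 is read like the string "20090102"), and quiet = TRUE, which returns NA for input that no order matches instead of signalling. The variable names are hypothetical.

```r
library(lubridate)

# x may be numeric: 20090102 is parsed like the string "20090102"
from_number <- parse_date_time(20090102, "Ymd")

# With quiet = TRUE an unparseable input silently yields NA
bad <- parse_date_time("not a date", "Ymd", quiet = TRUE)

print(from_number)
print(bad)
```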
When several format-orders are specified, parse_date_time sorts the supplied format-orders based on a training set and then applies them recursively on the input vector.
parse_date_time, and all derived functions such as ymd_hms, ymd, etc., will drop into fast_strptime instead of strptime whenever the formats guessed from the input data are all numeric.
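To illustrate several format-orders on one vector, here is a sketch assuming lubridate is installed. The two input styles are chosen so that each one matches only one of the supplied orders; both elements should come out as the same date.

```r
library(lubridate)

# One ISO-style date and one day/month/year date
x <- c("2009-01-02", "02/01/2009")

# Each element is parsed by whichever supplied order fits it
parsed <- parse_date_time(x, c("Ymd", "dmY"))
print(parsed)
```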
The list below contains the formats recognized by lubridate. For numeric formats, leading 0s are optional. In contrast to strptime, some of the formats have been extended for efficiency reasons; they are marked with "*". The fast parsers, parse_date_time2 and fast_strptime, currently accept only formats marked with "!".
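A minimal sketch of the two rules just stated, assuming lubridate is installed: leading zeros may be omitted from numeric fields, and parse_date_time2 is called with an order built only from "!" formats (Y, m, d, H, M, S). Variable names are illustrative.

```r
library(lubridate)

# Leading zeros are optional for numeric formats
short <- parse_date_time("2010-1-5", "Ymd")
padded <- parse_date_time("2010-01-05", "Ymd")

# parse_date_time2() takes only numeric ("!") orders such as Y, m, d, H, M, S
fast <- parse_date_time2("2010-01-05 07:03:09", "YmdHMS")

print(short)
print(fast)
```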
a
Abbreviated weekday name in the current locale. (Also matches full name.)
A
Full weekday name in the current locale. (Also matches abbreviated name.) You need not specify the a and A formats explicitly. The weekday is handled automatically if preproc_wday = TRUE.
b
Abbreviated month name in the current locale. (Also matches full name.)
B
Full month name in the current locale. (Also matches abbreviated name.)
d
!Day of the month as decimal number (01–31 or 0–31)
H
!Hours as decimal number (00–24 or 0–24).
I
Hours as decimal number (01–12 or 1–12).
j
Day of year as decimal number (001–366 or 1–366).
m
!*Month as decimal number (01–12 or 1–12). For parse_date_time, also matches abbreviated and full month names, as the b and B formats do.
M
!Minute as decimal number (00–59 or 0–59).
p
AM/PM indicator in the locale. Used in conjunction with I and not with H. An empty string in some locales.
S
!Second as decimal number (00–61 or 0–61), allowing for up to two leap-seconds (but POSIX-compliant implementations will ignore leap seconds).
OS
Fractional second.
U
Week of the year as decimal number (00–53 or 0–53) using Sunday as the first day of the week (and typically with the first Sunday of the year as day 1 of week 1). The US convention.
w
Weekday as decimal number (0–6, Sunday is 0).
W
Week of the year as decimal number (00–53 or 0–53) using Monday as the first day of the week (and typically with the first Monday of the year as day 1 of week 1). The UK convention.
y
!*Year without century (00–99 or 0–99). In parse_date_time, also matches year with century (Y format).
Y
!Year with century.
z
!*ISO8601 signed offset in hours and minutes from UTC. For example, -0800, -08:00 or -08 all represent 8 hours behind UTC. This format also matches the Z (Zulu) UTC indicator. Because strptime doesn't fully support ISO8601, this format is implemented as a union of 4 orders: Ou (Z), Oz (-0800), OO (-08:00) and Oo (-08). You can use these four orders like any other, but it is rarely necessary. parse_date_time2 and fast_strptime support all of the timezone formats.
r
*Matches Ip and H orders.
R
*Matches HM and IMp orders.
T
*Matches IMSp, HMS, and HMOS orders.
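Two of the extended formats above in action, as a sketch assuming lubridate is installed and an English or C LC_TIME locale (needed for the month name). An offset of -0800 means the clock is 8 hours behind UTC, so 12:00 local corresponds to 20:00 UTC; the month name is matched by the m format inside parse_date_time.

```r
library(lubridate)

# "z" reads the ISO8601 offset; 12:00 at -0800 is 20:00 UTC
with_offset <- parse_date_time("2012-06-01 12:00:00-0800", "Ymd HMSz")

# In parse_date_time, "m" also matches month names (English locale assumed)
with_name <- parse_date_time("5 Jan 2010", "dmY")

print(with_offset)
print(with_name)
```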
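parse_date_time returns a vector of POSIXct date-time objects; parse_date_time2 and fast_strptime return POSIXlt objects instead when lt = TRUE.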
parse_date_time (and derivatives such as ymd, ymd_hms, etc.) relies on a sparse guesser that takes at most 501 elements from the supplied character vector in order to identify appropriate formats from the supplied orders. If you get the error "All formats failed to parse" and you are confident that your vector contains valid dates, you should either set the exact argument to TRUE or use functions that don't perform format guessing (fast_strptime, parse_date_time2 or strptime).
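A sketch of the two escape hatches named above, assuming lubridate is installed: fast_strptime with an explicit format, and parse_date_time with exact = TRUE. Neither trains on the data, so a malformed element simply becomes NA. suppressWarnings is used only to hide the parse-failure warning for the bad element.

```r
library(lubridate)

x <- c("2010-01-01", "not a date", "2010-12-31")

# fast_strptime() does no guessing: non-matching elements become NA
via_fast <- fast_strptime(x, "%Y-%m-%d", lt = FALSE)

# exact = TRUE: the order is used as a strptime format, with no training step
via_exact <- suppressWarnings(parse_date_time(x, "%Y-%m-%d", exact = TRUE))

print(via_fast)
print(via_exact)
```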
For performance reasons, when the time zone is not UTC, parse_date_time2 and fast_strptime perform no validity checks for daylight saving time. Thus, if your input string contains an invalid date-time that falls into a DST gap and lt = TRUE, you will get a POSIXlt object with a non-existent time. If lt = FALSE, your time instant will be adjusted to a valid time by adding an hour. See the examples. If you want to get NA for invalid date-times, call fit_to_timeline explicitly.
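The DST behavior described above can be sketched as follows, assuming lubridate is installed and the system time zone database includes America/New_York. 02:05:06 on 2010-03-14 does not exist there because clocks jumped from 02:00 to 03:00.

```r
library(lubridate)

s <- "2010-03-14 02:05:06"  # inside the US spring-forward gap
zone <- "America/New_York"

# lt = FALSE: the non-existent time is shifted forward by an hour
shifted <- parse_date_time2(s, "YmdHMS", tz = zone)

# lt = TRUE keeps the invalid clock time; fit_to_timeline() turns it into NA
raw_lt <- parse_date_time2(s, "YmdHMS", tz = zone, lt = TRUE)
checked <- fit_to_timeline(raw_lt)

print(shifted)
print(checked)
```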
## ** orders are much easier to write **
x <- c("09-01-01", "09-01-02", "09-01-03")
parse_date_time(x, "ymd")
parse_date_time(x, "y m d")
parse_date_time(x, "%y%m%d")
#  "2009-01-01 UTC" "2009-01-02 UTC" "2009-01-03 UTC"

## ** heterogeneous date-times **
x <- c("09-01-01", "090102", "09-01 03", "09-01-03 12:02")
parse_date_time(x, c("ymd", "ymd HM"))

## ** different ymd orders **
x <- c("2009-01-01", "02022010", "02-02-2010")
parse_date_time(x, c("dmY", "ymd"))
##  "2009-01-01 UTC" "2010-02-02 UTC" "2010-02-02 UTC"

## ** truncated time-dates **
x <- c("2011-12-31 12:59:59", "2010-01-01 12:11", "2010-01-01 12", "2010-01-01")
parse_date_time(x, "Ymd HMS", truncated = 3)
parse_date_time(x, "ymd_hms", truncated = 3)
## [1] "2011-12-31 12:59:59 UTC" "2010-01-01 12:11:00 UTC"
## [3] "2010-01-01 12:00:00 UTC" "2010-01-01 00:00:00 UTC"

## ** specifying exact formats and avoiding training and guessing **
x <- c("09-01-01", "090102", "09-01 03", "09-01-03 12:02")
parse_date_time(x, c("%m-%d-%y", "%m%d%y", "%m-%d-%y %H:%M"), exact = TRUE)
## [1] "2001-09-01 00:00:00 UTC" "2002-09-01 00:00:00 UTC" NA
## [4] "2003-09-01 12:02:00 UTC"
parse_date_time(c('12/17/1996 04:00:00', '4/18/1950 0130'),
                c('%m/%d/%Y %I:%M:%S', '%m/%d/%Y %H%M'), exact = TRUE)
## [1] "1996-12-17 04:00:00 UTC" "1950-04-18 01:30:00 UTC"

## ** fast parsing **
## Not run:
options(digits.secs = 3)
## random times between 1400 and 3000
tt <- as.character(.POSIXct(runif(1000, -17987443200, 32503680000)))
tt <- rep.int(tt, 1000)

system.time(out <- as.POSIXct(tt, tz = "UTC"))
system.time(out1 <- ymd_hms(tt)) # constant overhead on long vectors
system.time(out2 <- parse_date_time2(tt, "YmdHMOS"))
system.time(out3 <- fast_strptime(tt, "%Y-%m-%d %H:%M:%OS"))

all.equal(out, out1)
all.equal(out, out2)
all.equal(out, out3)
## End(Not run)

## ** how to use `select_formats` argument **
## By default %Y has precedence:
parse_date_time(c("27-09-13", "27-09-2013"), "dmy")
## [1] "13-09-27 UTC"   "2013-09-27 UTC"

## to give priority to %y format, define your own select_formats function:
my_select <- function(trained) {
  n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%y", names(trained)) * 1.5
  names(trained[which.max(n_fmts)])
}
parse_date_time(c("27-09-13", "27-09-2013"), "dmy", select_formats = my_select)
## [1] "2013-09-27 UTC" "2013-09-27 UTC"

## ** invalid times with "fast" parsing **
parse_date_time("2010-03-14 02:05:06", "YmdHMS", tz = "America/New_York")
## [1] NA
parse_date_time2("2010-03-14 02:05:06", "YmdHMS", tz = "America/New_York")
## [1] "2010-03-14 03:05:06 EDT"
parse_date_time2("2010-03-14 02:05:06", "YmdHMS", tz = "America/New_York", lt = TRUE)
## [1] "2010-03-14 02:05:06 America/New_York"