Synopsis
Parses the nominal values for the specified attribute with respect to the given date format string and transforms the values into date values.
Description
This operator parses nominal attributes in order to create date and / or time attributes. The date format is specified by the date_format parameter. By default, the old nominal attribute is removed and replaced by the new date attribute; it is kept only if the keep_old_attribute parameter is set.
Date and Time Patterns
Date and time formats are specified by date and time pattern strings in the date_format parameter. Within date and time pattern strings, unquoted letters from 'A' to 'Z' and from 'a' to 'z' are interpreted as pattern letters representing the components of a date or time string. Text can be quoted using single quotes (') to avoid interpretation. "''" represents a single quote. All other characters are not interpreted; they're simply copied into the output string during formatting or matched against the input string during parsing.
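The quoting rules can be tried out in a minimal sketch, assuming the pattern strings behave like those of Java's java.text.SimpleDateFormat (the pattern letters listed in this document match that class, but whether the operator uses it internally is an assumption). The class name QuotedLiteralDemo and its helper methods are hypothetical and exist only for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class QuotedLiteralDemo {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // Fixed instant used by the sketch: 2001-07-04 12:08:00 UTC.
    static Date sampleDate() {
        Calendar cal = Calendar.getInstance(UTC, Locale.US);
        cal.clear();
        cal.set(2001, Calendar.JULY, 4, 12, 8, 0);
        return cal.getTime();
    }

    // Formats a date with the given pattern (U.S. locale, UTC).
    static String format(String pattern, Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(UTC);
        return fmt.format(date);
    }

    // Parses text with the given pattern (U.S. locale, UTC).
    static Date parse(String pattern, String text) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(UTC);
        try {
            return fmt.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        Date date = sampleDate();
        // 'at' is quoted, so its letters are copied instead of being read as pattern letters.
        System.out.println(format("yyyy.MM.dd 'at' HH:mm", date)); // 2001.07.04 at 12:08
        // '' inside a quoted section yields one literal single quote.
        System.out.println(format("hh 'o''clock' a", date));       // 12 o'clock PM
        // During parsing, the quoted text must appear in the input as is.
        System.out.println(parse("yyyy.MM.dd 'at' HH:mm", "2001.07.04 at 12:08").equals(date)); // true
    }
}
```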
The following pattern letters are defined (all other characters from 'A' to 'Z' and from 'a' to 'z' are reserved):
- G: era designator; Text; example: AD
- y: year; Year; example: 1996; 96
- M: month in year; Month; example: July; Jul; 07
- w: week in year; Number; example: 27
- W: week in month; Number; example: 2
- D: day in year; Number; example: 189
- d: day in month; Number; example: 10
- F: day of week in month; Number; example: 2
- E: day in week; Text; example: Tuesday; Tue
- a: am/pm marker; Text; example: PM
- H: hour in day (0-23); Number; example: 0
- k: hour in day (1-24); Number; example: 24
- K: hour in am / pm (0-11); Number; example: 0
- h: hour in am / pm (1-12); Number; example: 12
- m: minute in hour; Number; example: 30
- s: second in minute; Number; example: 55
- S: millisecond; Number; example: 978
- z: time zone; General Time Zone; example: Pacific Standard Time; PST; GMT-08:00
- Z: time zone; RFC 822 Time Zone; example: -0800
Pattern letters are usually repeated, as their number determines the exact presentation:
- Text: For formatting, if the number of pattern letters is 4 or more, the full form is used; otherwise a short or abbreviated form is used if available. For parsing, both forms are accepted, independent of the number of pattern letters.
- Number: For formatting, the number of pattern letters is the minimum number of digits, and shorter numbers are zero-padded to this amount. For parsing, the number of pattern letters is ignored unless it's needed to separate two adjacent fields.
- Year: If the underlying calendar is the Gregorian calendar, the following rules are applied.
  - For formatting, if the number of pattern letters is 2, the year is truncated to 2 digits; otherwise it is interpreted as a number.
  - For parsing, if the number of pattern letters is more than 2, the year is interpreted literally, regardless of the number of digits. So using the pattern "MM/dd/yyyy", "01/11/12" parses to Jan 11, 12 A.D.
  - For parsing with the abbreviated year pattern ("y" or "yy"), this operator must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the operator is created. For example, using a pattern of "MM/dd/yy" and the operator created on Jan 1, 1997, the string "01/11/12" would be interpreted as Jan 11, 2012 while the string "05/04/64" would be interpreted as May 4, 1964. During parsing, only strings consisting of exactly two digits will be parsed into the default century. Any other numeric string, such as a one digit string, a three or more digit string, or a two digit string that isn't all digits (for example, "-1"), is interpreted literally. So "01/02/3" or "01/02/003" are parsed, using the same pattern, as Jan 2, 3 AD. Likewise, "01/02/-3" is parsed as Jan 2, 4 BC.
  If the underlying calendar is not the Gregorian calendar, calendar system specific forms are applied. If the number of pattern letters is 4 or more, a calendar specific long form is used. Otherwise, a calendar short or abbreviated form is used.
- Month: If the number of pattern letters is 3 or more, the month is interpreted as text; otherwise, it is interpreted as a number.
- General time zone: Time zones are interpreted as text if they have names. Time zones can also be given as a GMT offset value, for example GMT-08:00. For parsing, RFC 822 time zones are also accepted.
- RFC 822 time zone: For formatting, the RFC 822 4-digit time zone format is used. For parsing, general time zones are also accepted.
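The year and number rules above can be checked with a short sketch. It assumes the patterns behave like those of java.text.SimpleDateFormat; whether the operator uses that class internally is an assumption. To make the result independent of the current date, the sketch pins the two-digit-year window with set2DigitYearStart to Jan 1, 1917, which corresponds to the "created on Jan 1, 1997" case described above. The class name YearRulesDemo and the helper parseYear are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class YearRulesDemo {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // Start of the 100-year window for two-digit years: Jan 1, 1917
    // (80 years before the Jan 1, 1997 creation date used in the text).
    static Date centuryStart() {
        Calendar cal = Calendar.getInstance(UTC, Locale.US);
        cal.clear();
        cal.set(1917, Calendar.JANUARY, 1);
        return cal.getTime();
    }

    // Parses text with the pattern and returns the resulting calendar year.
    static int parseYear(String pattern, String text) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(UTC);
        fmt.set2DigitYearStart(centuryStart());
        try {
            Calendar cal = Calendar.getInstance(UTC, Locale.US);
            cal.setTime(fmt.parse(text));
            return cal.get(Calendar.YEAR);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        // More than two pattern letters: the year is taken literally.
        System.out.println(parseYear("MM/dd/yyyy", "01/11/12")); // 12
        // Abbreviated year pattern with exactly two digits: mapped into the window.
        System.out.println(parseYear("MM/dd/yy", "01/11/12"));   // 2012
        System.out.println(parseYear("MM/dd/yy", "05/04/64"));   // 1964
        // Abbreviated year pattern, but not exactly two digits: taken literally.
        System.out.println(parseYear("MM/dd/yy", "01/02/3"));    // 3
        // Adjacent number fields: the letter counts split the digit run.
        System.out.println(parseYear("yyMMdd", "010704"));       // 2001
    }
}
```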
This operator also supports localized date and time pattern strings by defining the locale parameter. In these strings, the pattern letters described above may be replaced with other, locale dependent, pattern letters.
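How the locale affects the parsing of textual fields can be illustrated with a small sketch, again assuming java.text.SimpleDateFormat semantics (an assumption about the operator's internals). It uses full weekday and month names in German and U.S. English; the class name LocaleDemo and its helpers are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class LocaleDemo {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // Parses text with the pattern, reading month and weekday names in the given locale.
    static Date parse(String pattern, String text, Locale locale) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, locale);
        fmt.setTimeZone(UTC);
        try {
            return fmt.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    // Returns true if the text can be parsed with the pattern in the given locale.
    static boolean canParse(String pattern, String text, Locale locale) {
        try {
            parse(pattern, text, locale);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Date german = parse("EEEE, d. MMMM yyyy", "Mittwoch, 4. Juli 2001", Locale.GERMAN);
        Date english = parse("EEEE, MMMM d, yyyy", "Wednesday, July 4, 2001", Locale.US);
        // Both strings describe the same day.
        System.out.println(german.equals(english));
        // German names are not recognized when the locale is U.S. English.
        System.out.println(canParse("EEEE, d. MMMM yyyy", "Mittwoch, 4. Juli 2001", Locale.US));
    }
}
```

Abbreviated names vary more between locale data versions than full names do, so full names are the safer choice when the source data allows it.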
Examples
The following examples show how date and time patterns are interpreted in the U.S. locale. The given date and time are 2001-07-04 12:08:56 local time in the U.S. Pacific Time time zone.
- "yyyy.MM.dd G 'at' HH:mm:ss z": 2001.07.04 AD at 12:08:56 PDT
- "EEE, MMM d, ''yy": Wed, Jul 4, '01
- "h:mm a": 12:08 PM
- "hh 'o''clock' a, zzzz": 12 o'clock PM, Pacific Daylight Time
- "K:mm a, z": 0:08 PM, PDT
- "yyyyy.MMMMM.dd GGG hh:mm aaa": 02001.July.04 AD 12:08 PM
- "EEE, d MMM yyyy HH:mm:ss Z": Wed, 4 Jul 2001 12:08:56 -0700
- "yyMMddHHmmssZ": 010704120856-0700
- "yyyy-MM-dd'T'HH:mm:ss.SSSZ": 2001-07-04T12:08:56.235-0700
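The examples above can be reproduced with a short sketch that formats the same instant with several of the listed patterns. It assumes java.text.SimpleDateFormat semantics (an assumption about the operator's internals) and uses the time zone ID America/Los_Angeles for U.S. Pacific Time. The class name ExamplePatternsDemo and its helpers are hypothetical. Time zone names such as PDT come from the locale data of the Java runtime, so those outputs may differ slightly between versions.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class ExamplePatternsDemo {
    private static final TimeZone PACIFIC = TimeZone.getTimeZone("America/Los_Angeles");

    // The instant used in the examples: 2001-07-04 12:08:56.235 U.S. Pacific Time.
    static Date sampleDate() {
        Calendar cal = Calendar.getInstance(PACIFIC, Locale.US);
        cal.clear();
        cal.set(2001, Calendar.JULY, 4, 12, 8, 56);
        cal.set(Calendar.MILLISECOND, 235);
        return cal.getTime();
    }

    // Formats the sample instant with the given pattern in the U.S. locale.
    static String format(String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(PACIFIC);
        return fmt.format(sampleDate());
    }

    public static void main(String[] args) {
        String[] patterns = {
            "yyyy.MM.dd G 'at' HH:mm:ss z",
            "EEE, MMM d, ''yy",
            "h:mm a",
            "K:mm a, z",
            "yyyyy.MMMMM.dd GGG hh:mm aaa",
            "EEE, d MMM yyyy HH:mm:ss Z",
            "yyMMddHHmmssZ",
            "yyyy-MM-dd'T'HH:mm:ss.SSSZ"
        };
        // Print each pattern next to the text it produces.
        for (String pattern : patterns) {
            System.out.println("\"" + pattern + "\": " + format(pattern));
        }
    }
}
```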
Input
- example set input: expects: ExampleSetMetaData: #examples: = 0; #attributes: 0, expects: ExampleSet
Output
- example set output:
- original:
Parameters
- attribute name: The attribute which should be parsed.
- date type: The desired value type for the parsed attribute.
- date format: The parse format of the date values, for example "yyyy/MM/dd".
- time zone: The time zone used for the date objects if not specified in the date string itself.
- locale: The locale used for date texts, for example "Wed" (English) in contrast to "Mi" (German).
- keep old attribute: Indicates whether the original nominal attribute should be kept.
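The interaction between the date format and time zone parameters can be sketched as follows, assuming java.text.SimpleDateFormat semantics (an assumption about the operator's internals). The method arguments dateFormat and timeZoneId stand in for the corresponding parameters; the class name TimeZoneParameterDemo and the method parseMillis are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class TimeZoneParameterDemo {

    // Parses text with the given format and fallback time zone and
    // returns the resulting instant as milliseconds since the epoch.
    static long parseMillis(String dateFormat, String timeZoneId, String text) {
        SimpleDateFormat fmt = new SimpleDateFormat(dateFormat, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone(timeZoneId));
        try {
            return fmt.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        // No zone in the string: the configured time zone decides the instant.
        long utc = parseMillis("yyyy/MM/dd HH:mm", "UTC", "2001/07/04 12:00");
        long pacific = parseMillis("yyyy/MM/dd HH:mm", "America/Los_Angeles", "2001/07/04 12:00");
        System.out.println((pacific - utc) / 3600000L); // 7 (hours; Pacific Daylight Time is UTC-7)

        // Zone given in the string: the configured time zone is not used.
        long a = parseMillis("yyyy/MM/dd HH:mm Z", "UTC", "2001/07/04 12:00 -0700");
        long b = parseMillis("yyyy/MM/dd HH:mm Z", "America/Los_Angeles", "2001/07/04 12:00 -0700");
        System.out.println(a == b); // true
    }
}
```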