Synopsis
This operator can read csv files.
Description
This operator can read csv files in which all values of an example are written on one line and separated by a constant separator. The separator can be specified in the column separators parameter. By default, the line is split on each comma, semicolon, and blank. Arbitrary regular expressions can be used as separators. Empty values and the question mark are read as missing values. You can quote values (including the column separators) with a double quote ("). You can escape the quoting character with a backslash, i.e. \".
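The splitting rules described above can be illustrated with a minimal Python sketch. It is not the operator's actual implementation: the function name split_line, the default regular expression [,; ] (one reading of "comma, semicolon and blank"), and the choice to treat a quoted question mark as missing are all assumptions made for illustration.

```python
import re

def split_line(line, separator=r"[,; ]", quote='"', escape="\\"):
    """Split one line into values; empty values and '?' become None.

    Hypothetical helper: the default separator regex is an assumption.
    """
    sep = re.compile(separator)
    values, buf = [], []
    in_quotes = False
    i = 0

    def flush():
        # Assumption: '?' counts as missing even if it was quoted.
        text = "".join(buf)
        buf.clear()
        values.append(None if text in ("", "?") else text)

    while i < len(line):
        ch = line[i]
        # An escaped quote character is taken literally.
        if ch == escape and line[i + 1:i + 2] == quote:
            buf.append(quote)
            i += 2
            continue
        # An unescaped quote character toggles quoting and is dropped.
        if ch == quote:
            in_quotes = not in_quotes
            i += 1
            continue
        # Separators only split the line outside of quotes.
        if not in_quotes:
            m = sep.match(line, i)
            if m and m.end() > i:
                flush()
                i = m.end()
                continue
        buf.append(ch)
        i += 1
    flush()
    return values

print(split_line('a,"b;c",?,,\\"d'))
# prints ['a', 'b;c', None, None, '"d']
```

The quoted value b;c keeps its semicolon, the question mark and the empty field both become missing values, and the escaped quote survives as a literal character.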
By default, the first line is used for the attribute names; this is controlled by the use first row as attribute names parameter. The operator tries to determine an appropriate type for each attribute by reading the first few lines and checking the values that occur. If all values are integers, the attribute becomes integer; if real numbers occur, it is of type real. Columns containing values that cannot be interpreted as numbers become nominal, unless they match the date and time pattern of the date format parameter. If they do, the column is automatically parsed as a date and the corresponding attribute is of type date.
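The type guessing described above could look roughly like the following Python sketch. The function guess_type is hypothetical, it uses Python strptime patterns rather than the pattern syntax the operator itself expects, and the handling of columns with no values at all is an assumption.

```python
from datetime import datetime

def _all_parse(values, parser):
    """Return True if parser accepts every value without a ValueError."""
    for v in values:
        try:
            parser(v)
        except ValueError:
            return False
    return True

def guess_type(values, date_format="%Y-%m-%d"):
    """Guess a column type from sample values (None marks a missing value)."""
    present = [v for v in values if v is not None]
    if not present:
        # Assumption: a column with only missing values is treated as nominal.
        return "nominal"
    if _all_parse(present, int):
        return "integer"
    if _all_parse(present, float):
        return "real"
    if _all_parse(present, lambda v: datetime.strptime(v, date_format)):
        return "date"
    return "nominal"
```

For example, guess_type(["1", "2.5"]) yields "real" because one value is not an integer, while guess_type(["a", "1"]) falls through to "nominal".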
Input
Output
- output:
Parameters
- configure operator: Configure this operator by means of a Wizard.
- file name: Name of the file to read the data from.
- encoding: The encoding used for reading or writing files.
- trim lines: Indicates if lines should be trimmed (whitespace is removed at the beginning and the end) before the column split is performed. This option might be problematic if tabs are used as a separator.
- skip comments: Indicates if a comment character should be used.
- comment characters: Lines beginning with these characters are ignored.
- use first row as attribute names: Read attribute names from file (assumes the attribute names are in the first line of the file).
- use quotes: Indicates if quotes should be regarded.
- quotes character: The quotes character.
- escape character for quotes: The character that is used to escape quotes.
- column separators: Column separators for data files (regular expression).
- parse numbers: Specifies whether numbers are parsed.
- decimal character: The decimal character.
- grouped digits: Parse grouped digits.
- grouping character: The grouping character.
- date format: The format pattern of date values.
- read not matching values as missings: Values that do not match the specified value type are considered missing.
- data set meta data information: The meta data information.
- attribute names already defined: Describes whether the attribute names were set manually by the user or were generated by the reader (generic names or first row of the file).
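
As an illustration of how the decimal character, grouping character, and read not matching values as missings parameters might interact, here is a small Python sketch. The function parse_number and its arguments are hypothetical stand-ins for those parameters; the operator's real parsing rules may differ.

```python
def parse_number(text, decimal_char=".", grouping_char=None):
    """Parse a number with a custom decimal and grouping character.

    Returns None (a missing value) when the text is not a valid number,
    mirroring the idea behind 'read not matching values as missings'.
    """
    cleaned = text.strip()
    if grouping_char:
        # Grouping characters carry no numeric meaning, so drop them.
        cleaned = cleaned.replace(grouping_char, "")
    # Normalize the decimal character to the one Python's float() expects.
    cleaned = cleaned.replace(decimal_char, ".")
    try:
        return float(cleaned)
    except ValueError:
        return None

print(parse_number("1.234,5", decimal_char=",", grouping_char="."))
# prints 1234.5
```

With grouping disabled, the same input "1.234,5" would not parse and would therefore be read as a missing value in this sketch.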