Read XRFF


Synopsis

This operator reads XRFF files.


Description

This operator reads XRFF files as known from Weka. XRFF (eXtensible attribute-Relation File Format) is an XML-based extension of the ARFF format and is, in some respects, similar to the original RapidMiner file format for attribute description files (.aml).

Here is a small example of the Iris dataset represented as an XRFF file:

<?xml version="1.0" encoding="utf-8"?>
<dataset name="iris" version="3.5.3">
  <header>
    <attributes>
      <attribute name="sepallength" type="numeric"/>
      <attribute name="sepalwidth" type="numeric"/>
      <attribute name="petallength" type="numeric"/>
      <attribute name="petalwidth" type="numeric"/>
      <attribute class="yes" name="class" type="nominal">
        <labels>
          <label>Iris-setosa</label>
          <label>Iris-versicolor</label>
          <label>Iris-virginica</label>
        </labels>
      </attribute>
    </attributes>
  </header>
  <body>
    <instances>
      <instance>
        <value>5.1</value>
        <value>3.5</value>
        <value>1.4</value>
        <value>0.2</value>
        <value>Iris-setosa</value>
      </instance>
      <instance>
        <value>4.9</value>
        <value>3</value>
        <value>1.4</value>
        <value>0.2</value>
        <value>Iris-setosa</value>
      </instance>
      ...
    </instances>
  </body>
</dataset>
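To make the layout of the example concrete, here is a minimal Python sketch that walks such a document with the standard library's xml.etree.ElementTree. It is not RapidMiner's implementation; the helper name read_xrff is hypothetical, it handles only the dense format, and it converts values of attributes declared type="numeric" to floats while keeping all other values as strings.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the Iris example (first two instances only).
XRFF_TEXT = """<?xml version="1.0" encoding="utf-8"?>
<dataset name="iris" version="3.5.3">
  <header>
    <attributes>
      <attribute name="sepallength" type="numeric"/>
      <attribute name="sepalwidth" type="numeric"/>
      <attribute name="petallength" type="numeric"/>
      <attribute name="petalwidth" type="numeric"/>
      <attribute class="yes" name="class" type="nominal">
        <labels>
          <label>Iris-setosa</label>
          <label>Iris-versicolor</label>
          <label>Iris-virginica</label>
        </labels>
      </attribute>
    </attributes>
  </header>
  <body>
    <instances>
      <instance><value>5.1</value><value>3.5</value><value>1.4</value><value>0.2</value><value>Iris-setosa</value></instance>
      <instance><value>4.9</value><value>3</value><value>1.4</value><value>0.2</value><value>Iris-setosa</value></instance>
    </instances>
  </body>
</dataset>"""


def read_xrff(root):
    """Hypothetical helper: return (attribute_names, rows) from a dense XRFF tree."""
    attrs = root.findall("./header/attributes/attribute")
    names = [a.get("name") for a in attrs]
    types = [a.get("type") for a in attrs]
    rows = []
    for inst in root.iterfind("./body/instances/instance"):
        texts = [v.text for v in inst.findall("value")]
        # Values are positional: the i-th <value> belongs to the i-th <attribute>.
        rows.append([float(t) if ty == "numeric" else t for t, ty in zip(texts, types)])
    return names, rows


# Encoding to bytes lets the parser honour the XML encoding declaration.
names, rows = read_xrff(ET.fromstring(XRFF_TEXT.encode("utf-8")))
print(names)    # ['sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'class']
print(rows[1])  # [4.9, 3.0, 1.4, 0.2, 'Iris-setosa']
```

The positional mapping between value tags and attribute definitions is what makes the dense format compact enough to read this simply.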

Please note that the sparse XRFF format is currently not supported; for sparse data, please use one of the other sparse file formats supported by RapidMiner.
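As an illustration of what "sparse" means here, the sketch below rejects sparse instances before any values are read. It assumes, following Weka's XRFF convention, that a sparse instance is marked with type="sparse" and that its values carry an index attribute; the function name check_dense is hypothetical and not part of RapidMiner.

```python
import xml.etree.ElementTree as ET


def check_dense(root):
    """Hypothetical check: raise ValueError if any <instance> is marked sparse."""
    for i, inst in enumerate(root.iterfind("./body/instances/instance")):
        # Assumed Weka convention: sparse rows look like
        # <instance type="sparse"><value index="2">3.5</value></instance>
        if inst.get("type") == "sparse":
            raise ValueError(f"instance {i} uses the unsupported sparse XRFF format")


dense_doc = ET.fromstring(
    "<dataset name='d'><body><instances>"
    "<instance><value>5.1</value><value>3.5</value></instance>"
    "</instances></body></dataset>"
)
sparse_doc = ET.fromstring(
    "<dataset name='d'><body><instances>"
    "<instance type='sparse'><value index='2'>3.5</value></instance>"
    "</instances></body></dataset>"
)

check_dense(dense_doc)  # passes silently
try:
    check_dense(sparse_doc)
except ValueError as err:
    print(err)  # instance 0 uses the unsupported sparse XRFF format
```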

Because wrapping the data in XML tags makes the representation considerably larger, the data can also be compressed with gzip. RapidMiner automatically recognizes a file as gzip-compressed if its extension is .xrff.gz instead of .xrff.
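The extension rule can be mimicked in a few lines of Python. This is a sketch under the assumption that detection is purely name-based, as described above; open_xrff and load_root are hypothetical helper names, and the temporary file exists only for the demonstration.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET


def open_xrff(path):
    """Hypothetical helper: open a file, decompressing it if the name ends in .xrff.gz."""
    if path.endswith(".xrff.gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def load_root(path):
    # ET.parse accepts any readable binary file object, including a GzipFile.
    with open_xrff(path) as f:
        return ET.parse(f).getroot()


doc = b'<?xml version="1.0" encoding="utf-8"?><dataset name="iris"><header/><body/></dataset>'
tmp = tempfile.mkdtemp()
gz_path = os.path.join(tmp, "iris.xrff.gz")
with gzip.open(gz_path, "wb") as f:
    f.write(doc)

print(load_root(gz_path).get("name"))  # iris
```

Relying on the extension keeps the check cheap; an alternative design would sniff the gzip magic bytes (0x1f 0x8b) at the start of the file instead.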

Similar to the native RapidMiner data definition via .aml files combined with almost arbitrary data files, the XRFF format offers some additional features. The class="yes" XML attribute on an attribute definition in the header specifies which attribute should be used as the label, i.e. the prediction target. Although the RapidMiner term for such an attribute is "label" rather than "class", the term class is supported in order not to break compatibility with original XRFF files.
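A minimal sketch of how the label attribute can be located in the header: it scans the attribute definitions for class="yes" and also collects the nominal values listed under labels. The function name find_label is hypothetical; this only illustrates the file structure and is not RapidMiner's code.

```python
import xml.etree.ElementTree as ET


def find_label(root):
    """Hypothetical helper: return (name, nominal_values) of the class="yes" attribute."""
    for attr in root.iterfind("./header/attributes/attribute"):
        if attr.get("class") == "yes":
            values = [lab.text for lab in attr.iterfind("./labels/label")]
            return attr.get("name"), values
    # No attribute is flagged, so the data set has no label.
    return None, []


header = ET.fromstring(
    "<dataset name='iris'><header><attributes>"
    "<attribute name='sepallength' type='numeric'/>"
    "<attribute class='yes' name='class' type='nominal'><labels>"
    "<label>Iris-setosa</label><label>Iris-versicolor</label><label>Iris-virginica</label>"
    "</labels></attribute>"
    "</attributes></header></dataset>"
)
print(find_label(header))  # ('class', ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])
```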

Please note that loading attribute weights is currently not supported; please use the other RapidMiner operators for loading and writing attribute weights for this purpose.

Instance weights can be defined via a weight XML attribute in each instance tag. By default, the weight is 1. Here's an example:

<instance weight="0.75">
  <value>5.1</value>
  <value>3.5</value>
  <value>1.4</value>
  <value>0.2</value>
  <value>Iris-setosa</value>
</instance>
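The default-weight rule is easy to express in code. The following Python sketch, using a hypothetical helper named instance_weights, reads the optional weight XML attribute of every instance and falls back to 1 when it is absent.

```python
import xml.etree.ElementTree as ET


def instance_weights(root):
    """Hypothetical helper: one weight per <instance>; a missing weight means 1.0."""
    return [float(inst.get("weight", "1"))
            for inst in root.iterfind("./body/instances/instance")]


doc = ET.fromstring(
    "<dataset name='iris'><body><instances>"
    "<instance weight='0.75'><value>5.1</value></instance>"
    "<instance><value>4.9</value></instance>"
    "</instances></body></dataset>"
)
print(instance_weights(doc))  # [0.75, 1.0]
```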

Since the XRFF format does not support id attributes, you have to use one of the RapidMiner operators to turn one of the columns into the id column if desired. This has to be done after loading the data.


Input


Output


Parameters


ExampleProcess