Synopsis
This operator reads an example set from an SQL database.
Description
This operator reads an ExampleSet from an SQL database. The SQL query can be auto generated giving a table name, passed to RapidMiner via a parameter or, in case of long SQL statements, in a separate file. The desired behavior can be chosen using the define query parameter. Please note that column names are often case sensitive and might need quoting. Databases may behave differently here.
Before acutally sending a statement to a database, you need to create a database connection. You might either choose Manage Database Connections... from the Tools menu in the main window, or click on the button on the right of the drop down box of the connection parameter. The window showing up asks for several details as Host, Port, Database system, schema and username. The Test button will allow you to check wheter the connection can be made. After saving a new connection, it might be chosen from the drop down box of the connection parameter.
When the operator is executed, the table delivered by the query will be copied into the memory of your computer. This will give all subsequent operators a fast access on the data so that even learning schemes like the Support Vector Machine with their high number of random accesses will run fast. If the table is too big for your main memory, you might use the Stream Database operator. It will hold only a part of the table in memory for the cost of several magnitudes slower access if the desired example isn't cached.
Warning
As the java ResultSetMetaData
interface does not provide information about the possible values of nominal attributes, the internal indices the nominal values are mapped to will depend on the ordering they appear in the table. This may cause problems only when processes are split up into a training process and an application or testing process. For learning schemes which are capable of handling nominal attributes, this is not a problem. If a learning scheme like a SVM is used with nominal data, RapidMiner pretends that nominal attributes are numerical and uses indices for the nominal values as their numerical value. A SVM may perform well if there are only two possible values. If a test set is read in another process, the nominal values may be assigned different indices, and hence the SVM trained is useless. This is not a problem for label attributes, since the classes can be specified using the classes
parameter and hence, all learning schemes intended to use with nominal data are safe to use.
You migth avoid this problem, if you first combine both ExampleSets using the Append operator and then split it again using two Filter Examples operators.
Input
Output
- output:
Parameters
- read not matching values as missings: Values which does not match to the specified value typed are considered as missings.
- data set meta data information: The meta data information
- attribute names already defined: the parameter describes whether the attribute names were set by the user manually or were generated by the the reader (generic names or first row of the file)
- define connection: Indicates how the database connection should be specified.
- connection: A predefined database connection.
- database system: The used database system.
- database url: The URL connection string for the database, e.g. 'jdbc:mysql://foo.bar:portnr/database'
- username: The database username.
- password: The password for the database.
- jndi name: JNDI name for a data source.
- define query: Specifies whether the database query should be defined directly, through a file or implicitely by a given table name.
- query: An SQL query.
- query file: A file containing an SQL query.
- table name: A database table.