SQL databases¶
Integrating a SQL database is quite simple. You need to declare a data store of the type of the database you want to integrate
Each database have specificities that are described at their pages :
The rest of this page details generalities applicable to most of the databases.
General parameters¶
All those parameters can be set as XML attributes of the datastore element or as configuration with dw.datastore.<datastore_name>.<parameter>.
Parameters |
Details |
---|---|
name |
how you want this data store be refered as |
type |
depend on the type of database (mysql for MySQL and postgresql for PostgreSQL) |
host |
IP or DNS of the database location |
database |
Which database to connect to |
user |
user of the database to use |
password |
password of the database |
sshUser |
(optionnal) If specified, DataFactory will establish a SSH tunnel with this user name |
sshPassword |
(optionnal) If there is a sshUser, Datafactory will use this as the ssh password. If there is a sshUser but no sshPassword DataFactory will try to use its ssh private key |
sshHost |
(optionnal) If specified, the SSH tunnel host If sshUser is set but not sshHost, host will be the sshHost and it will connect to localhost |
sshPrivateKeyLocation |
Location of the private key file |
sshPrivateKeyFilePassphrase |
Optional passphrase for the private key |
extendedConnectionParameters |
Connection parameters to add to defaults ones (see each database description) |
connectionParameters |
Connection parameters to use, discards default ones (see each database description) |
In case you want to use SSH tunneling but not use the password authentification. you need to add your public key to the ~/.ssh/authorized_keys file on host for the sshUser.
Autodiscovery of metadata¶
As the database already contains all the needed informations about tables and columns, it is not mandatory to detail them. There is 3 ways to describe a SQL datastore :
All informations¶
Like any datastore, you can provide all tables and columns informations. This is convenient to either :
ensure the database layout (safety)
is a bit faster (no metadata search)
allow to use only some columns of tables (like not using columns with sensitive data or avoiding useless columns)
But obviously it is more time consuming and error prone to detail everything.
An exemple can be found below :
<datastore name="db1" type="postgresql" host="db.theowner.com" database="database" user="john" password="Doe">
<table name="cutsomers" schema="public">
<column name="id" type="int"/>
<column name="name" type="text"/>
</table>
</datastore>
Just table names¶
Another way is to specify only table names. Data Brewery will automatically ask for the metadata as needed. In this case, it will take all commun with the most adequate column type.
The previous example will be simplified as :
<datastore name="db2" type="postgresql" host="db.theowner.com" database="database" user="john" password="Doe">
<table name="cutsomers" schema="public"/>
</datastore>
Auto discovery of a schema¶
The laziest way to add a SQL datastore is to use the autoDiscovery element to automatically add every tables of a given schema.
The example below will add all tables in the public and my_schema schemas to the datastore. If there is a conflict (two tables with the same name), the first one will be used.
<datastore name="db3" type="postgresql" host="db.theowner.com" database="database" user="john" password="Doe">
<autoDiscovery schema="public"/>
<autoDiscovery schema="my_schema"/>
</datastore>
About schemas¶
Most databases use schemas to organize data. Some databases (Mysql for instance) mix the database and schema concepts. Tables can have the same name but stay in different schemas. That’s why you should specify the schema when describing a table in a SQL datastore.
While not recommended, schema can be left empty (by not defining it or by setting it to “”). In such case, Data Brewery will use the search path of the database (which might differ for each database).
Using SSH tunnels¶
In order to secure a database, it is sometimes not accessible from the network. In such case, the client must first establish a SSH tunnel to the server than connect locally to the database.
Data Brewery support such scenario with ssh parameters.
A simple example is to provide a sshUser and a sshPassword as below :
<datastore name="db1" type="postgresql" host="db.theowner.com" database="database"
user="john" password="Doe"
sshUser="john" sshPassword="Doe"/>
A more secured way to connect is to use a public/private keys pair (see here for how to set a public/private keys pair). The sshPrivateKeyFilePassphrase is optionnal depending if you have set a passphrase for your key.
<datastore name="db1" type="postgresql" host="db.theowner.com" database="database"
user="john" password="Doe"
sshUser="john"
sshPrivateKeyLocation="keys/private.key"
sshPrivateKeyFilePassphrase="Doe"
/>
Using connection parameters¶
We use JDBC to connect to databases and you can add some configuration to it. Some datastore type already add some paramaters (see each datastore documentation). You can either add more paramaters with the parameter extendedConnectionParameters or reset the whole parameters list with connectionParameters.
Each parameter is separated by &. As it is an unautorized character, you should use the escaped version & instead. For instance :
<datastore name="prestashop" type="mysql"
connectionParameters="serverTimezone=UTC&useSSL=false"/>