Database connectors
The database connector is used to extract data from databases that support Open Database Connectivity (ODBC).
This connector is not available by default. Please contact Opendatasoft for the activation of this connector on a given Opendatasoft workspace.
The databases supported by Opendatasoft are:
Database | Vendor resources | Supported database version |
Snowflake | | All |
Denodo | | >= 8.0 update 20220126 |
Databricks | | |
Azure SQL | | |
If you would like Opendatasoft to support other databases, do not hesitate to contact your local support team to share your feedback and use cases.
The database connector works at a database level. To configure a data source using this connector, you'll need to gather:
Connection details (such as login, password, server address, and database name)
The SQL query to extract the relevant data from your database
As this connector supports reusable connections, you'll be able to easily reuse connection details from one dataset to another.
Creation
You can configure a source using the database connector by either creating a new dataset or adding a new source to an existing dataset.
Select one of the supported databases on the source creation screen.
Configure connection details to connect Opendatasoft to your database. If you have already created and saved a database connection, you may want to reuse it.
In Query, configure the custom SQL query used to extract data from your database. This query can be arbitrarily complex.
In Preview query [optional], configure the custom SQL query used to extract data in preview mode, when configuring the dataset in the Opendatasoft back-office.
In Incremental query [optional], configure the custom SQL query used to extract data in incremental mode. See below for more details about the incremental extraction mode.
In Incremental field [optional], configure the name of the query result field used to build the incremental update clause. See below for more details about the incremental extraction mode.
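As an illustration, here is what these parameters could look like for a hypothetical table called measurements with the fields station, measured_at, and value (the table and field names below are assumptions made for the sake of the example):
-- Query: full extraction of the table
SELECT station, measured_at, value FROM measurements;
-- Preview query: a lighter query used only for the back-office preview
SELECT station, measured_at, value FROM measurements LIMIT 20;
-- Incremental query: templated query containing the {INCR_CONDITION} placeholder
SELECT station, measured_at, value FROM measurements WHERE {INCR_CONDITION};
-- Incremental field: measured_at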
Connection details
Database | Connection details configuration |
Snowflake | See Snowflake database below |
Denodo | See Denodo database below |
Databricks | |
Azure SQL | See Azure SQL database below |
Using a preview query for large result sets
It may happen that the query you configure takes a very long time to run and considerably slows down (or even renders unusable) the configuration screens in the Opendatasoft back-office.
Let's say, for instance, that you want to query a database table cities_in_the_world listing every city in the world. The database schema contains the fields city_name, city_country, and number_of_inhabitants.
The query you would use in the Query parameter would be:
SELECT city_name, city_country, number_of_inhabitants FROM cities_in_the_world;
However, for various reasons that depend on the database engine used behind the scenes, this query may take a very long time to execute. In this case, use the Preview query parameter and set a lighter query that you know will execute in a timely manner:
SELECT city_name, city_country, number_of_inhabitants FROM cities_in_the_world LIMIT 20;
The Query and Preview query queries must output result sets with an identical schema (same list of fields and field types).
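Note that the row-limiting syntax depends on the database engine: LIMIT is understood by engines such as Snowflake or Databricks, while Azure SQL (SQL Server) uses a TOP clause instead. A tentative equivalent of the preview query above for an Azure SQL source would be:
-- TOP plays the same role as LIMIT on Azure SQL (SQL Server)
SELECT TOP 20 city_name, city_country, number_of_inhabitants FROM cities_in_the_world;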
Incremental update of a dataset with a database source
When a dataset with a database source is published again either manually or through scheduling, previous dataset content is cleared and the whole content of the database table is processed again. However, there are plenty of use cases where this approach is not suitable. You may for instance want to create a dataset from a database table that contains a daily temperature measure for specified locations.
The database table temperatures would contain the following fields:
Field name | Field type |
location | VARCHAR |
date | DATE |
temperature | FLOAT |
You would then configure the following Query parameter:
SELECT location, date, temperature FROM temperatures;
You then configure the dataset so that it is updated on a daily basis thanks to Opendatasoft's internal scheduler. The first update fetches all historical data. Then, at every subsequent update, the dataset content is cleared and the whole table content is fetched again. This is probably not what you expect, as this approach wastes resources.
To minimize the volume of data synchronized at each update, you can set the connector to incremental mode. To do so, configure the Incremental query and Incremental field parameters.
The Incremental query parameter contains a templated query, used by the connector during the second and subsequent update runs. It must contain a placeholder {INCR_CONDITION} that is automatically filled in by the connector. For instance:
SELECT location, date, temperature FROM temperatures WHERE {INCR_CONDITION};
The Incremental field parameter contains a field name whose value is used to build the {INCR_CONDITION} placeholder. Here, you would set the value to date. The connector scans the values of this field, keeps the highest value in memory, and uses it for the next connector run.
Let's consider that the initial values in the database are:
location | date | temperature |
Paris | 2023-04-01 | 25 |
London | 2023-04-01 | 22 |
Boston | 2023-04-01 | 29 |
Paris | 2023-04-02 | 24 |
London | 2023-04-02 | 23 |
Boston | 2023-04-02 | 22 |
For the initial processing, the database connector will run the SQL query set in the Query parameter:
SELECT location, date, temperature FROM temperatures;
It will thus fetch the 6 records above.
Let's say that the dataset is set to be updated on a daily basis, and that on 2023-04-03, 3 new records are added to the table:
location | date | temperature |
Paris | 2023-04-03 | 21 |
London | 2023-04-03 | 19 |
Boston | 2023-04-03 | 18 |
During the second processing, the database connector retrieves the highest value of the date field identified during the first processing (2023-04-02) and uses it to rewrite the Incremental query:
SELECT location, date, temperature FROM temperatures WHERE date >= '2023-04-02';
This way, only the records from 2023-04-02 and 2023-04-03 will be fetched.
The supported types for the parameter Incremental field are: INT, DOUBLE, DATE, and DATETIME.
The database connector uses a greater-than-or-equal condition to make sure that records added with a delay (after the previous connector run) are not missed.
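The same mechanism applies to numeric incremental fields. As a purely hypothetical sketch, if the table also had an auto-incrementing id column (of type INT, not part of the example above) declared as the Incremental field, and the highest value seen during the previous run was 1000, the connector would rewrite the Incremental query as:
-- Hypothetical incremental condition built on an INT field named id
SELECT id, location, date, temperature FROM temperatures WHERE id >= 1000;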
Snowflake database
The connection details to fill in for a Snowflake database are described in the table below.
Parameter name | Description |
Server | Snowflake account to connect to, under the form |
User identifier | Snowflake user identifier |
Password | Snowflake user password |
Role | Snowflake role used by Opendatasoft when running queries (must be a valid role for the selected user) |
Database | Name of the database to connect to |
Schema | Database schema |
Warehouse | Snowflake warehouse (compute resource) used to run queries performed by Opendatasoft |
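Because the Database and Schema are part of the connection details, the Query parameter can usually reference table names directly; Snowflake's fully qualified database.schema.table syntax remains available if you need to reach objects outside the configured schema. The names below are hypothetical:
-- Unqualified table name, resolved against the Database and Schema configured above
SELECT city_name, city_country FROM cities;
-- Fully qualified name (hypothetical database and schema)
SELECT city_name, city_country FROM ANALYTICS.PUBLIC.CITIES;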
Denodo database
The connection details to fill in for a Denodo database are described in the table below.
Parameter name | Description |
Server | Host name of the server that hosts your Denodo Virtual DataPort |
Port | Port number of the server that hosts your Denodo Virtual DataPort |
SSL required | Whether an SSL connection should be required (set only if SSL is configured on the Virtual DataPort server) |
User identifier | Denodo user identifier |
Password | Denodo user password |
Database | Name of the database to connect to |
Azure SQL database
The connection details to fill in for an Azure SQL database are described in the table below.
Parameter name | Description |
Server | Host name of the server that hosts your Azure SQL database |
User identifier | Azure SQL user identifier |
Database | Name of the database to connect to |
Password | Azure SQL user password |