Database connectors
The database connector is used to extract data from databases that support Open Database Connectivity (ODBC).
This connector is not available by default. Please contact Opendatasoft for the activation of this connector on a given Opendatasoft workspace.
The databases supported by Opendatasoft are:
Database | Vendor resources | Supported database version |
Snowflake | | All |
Denodo | | >= 8.0 update 20220126 |
Databricks | | |
Azure SQL | | |
If you would like Opendatasoft to support other databases, do not hesitate to contact your local support team to share your feedback and use cases.
The database connector works at a database level. To configure a data source using this connector, you'll need to gather:
Connection details (such as login, password, server address, and database name)
The SQL query to extract the relevant data from your database
As this connector supports reusable connections, you'll be able to easily reuse connection details from one dataset to another.
Creation
You can configure a source using the database connector by either creating a new dataset or adding a new source to an existing dataset.
Select one of the supported databases on the source creation screen.
Configure connection details to connect Opendatasoft to your database. If you have already created and saved a database connection, you may want to reuse it.
In Query, configure the custom SQL query used to extract data from your database. This query can be arbitrarily complex.
In Preview query [optional], configure the custom SQL query used to extract data in preview mode, when configuring the dataset in the Opendatasoft back-office.
In Incremental query [optional], configure the custom SQL query used to extract data in incremental mode. See below for more details about the incremental extraction mode.
In Incremental field [optional], configure the name of the query result field used to build the incremental update clause. See below for more details about the incremental extraction mode.
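As an illustration, here is what these parameters could look like for a hypothetical table called measurements with the fields station, measured_at, and value (the table and field names below are assumptions made for the sake of the example):
-- Query: full extraction of the table
SELECT station, measured_at, value FROM measurements;
-- Preview query: a lighter query used only for the back-office preview
SELECT station, measured_at, value FROM measurements LIMIT 20;
-- Incremental query: templated query containing the {INCR_CONDITION} placeholder
SELECT station, measured_at, value FROM measurements WHERE {INCR_CONDITION};
-- Incremental field: measured_at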
Connection details
Database | Connection details configuration |
Snowflake | See Snowflake database below |
Denodo | See Denodo database below |
Databricks | |
Azure SQL | See Azure SQL database below |
Using a preview query for large result sets
It may happen that the query you configure takes a very long time to run and considerably slows down (or even renders unusable) the configuration screens in the Opendatasoft back-office.
Let's say, for instance, that you want to query a database table cities_in_the_world listing every city in the world. The database schema contains the fields city_name, city_country, and number_of_inhabitants.
The query you would use in the Query parameter would be:
SELECT city_name, city_country, number_of_inhabitants FROM cities_in_the_world;
However, for various reasons that depend on the database engine used behind the scenes, this query may take a very long time to execute. In this case, use the Preview query parameter and set a lighter query that you know will execute in a timely manner:
SELECT city_name, city_country, number_of_inhabitants FROM cities_in_the_world LIMIT 20;
The Query and Preview query queries must output result sets with an identical schema (same list of fields and field types).
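Note that the row-limiting syntax depends on the database engine: LIMIT is understood by engines such as Snowflake or Databricks, while Azure SQL (SQL Server) uses a TOP clause instead. A tentative equivalent of the preview query above for an Azure SQL source would be:
-- TOP plays the same role as LIMIT on Azure SQL (SQL Server)
SELECT TOP 20 city_name, city_country, number_of_inhabitants FROM cities_in_the_world;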
Incremental update of a dataset with a database source
When a dataset with a database source is published again either manually or through scheduling, previous dataset content is cleared and the whole content of the database table is processed again. However, there are plenty of use cases where this approach is not suitable. You may for instance want to create a dataset from a database table that contains a daily temperature measure for specified locations.
The database table temperatures would contain the following fields:
Field name | Field type |
location | VARCHAR |
date | DATE |
temperature | FLOAT |
You would then configure the following Query parameter:
SELECT location, date, temperature FROM temperatures;
You then configure the dataset so that it is updated on a daily basis thanks to Opendatasoft's internal scheduler. The first update fetches all historical data. Then, at every subsequent update, the dataset content is cleared and the whole table content is fetched again. This is probably not what you expect, as this approach wastes resources.
To minimize the volume of data synchronized at each update, you can set the connector to incremental mode. To do so, configure the Incremental query and Incremental field parameters.
The Incremental query parameter contains a templated query, used by the connector during the second and subsequent update runs. It must contain a placeholder {INCR_CONDITION} that is automatically filled in by the connector. For instance:
SELECT location, date, temperature FROM temperatures WHERE {INCR_CONDITION};
The Incremental field parameter contains a field name whose value is used to build the {INCR_CONDITION} placeholder. Here, you would set the value to date. The connector scans the values of this field, keeps the highest value in memory, and uses it for the next connector run.
Let's consider that the initial values in the database are:
location | date | temperature |
Paris | 2023-04-01 | 25 |
London | 2023-04-01 | 22 |
Boston | 2023-04-01 | 29 |
Paris | 2023-04-02 | 24 |
London | 2023-04-02 | 23 |
Boston | 2023-04-02 | 22 |
For the initial processing, the database connector will run the SQL query set in the Query parameter:
SELECT location, date, temperature FROM temperatures;
It will thus fetch the 6 records above.
Let's say that the dataset is set to be updated on a daily basis, and that on 2023-04-03, 3 new records are added to the table:
location | date | temperature |
Paris | 2023-04-03 | 21 |
London | 2023-04-03 | 19 |
Boston | 2023-04-03 | 18 |
During the second processing, the database connector retrieves the highest value of the date field identified during the first processing (2023-04-02) and uses it to rewrite the Incremental query:
SELECT location, date, temperature FROM temperatures WHERE date >= '2023-04-02';
This way, only the records from 2023-04-02 and 2023-04-03 will be fetched.
The supported types for the parameter Incremental field are: INT, DOUBLE, DATE, and DATETIME.
The database connector uses a greater-than-or-equal condition to make sure that records added with a delay (after the previous connector run) are not missed.
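The same mechanism applies to numeric incremental fields. As a purely hypothetical sketch, if the table also had an auto-incrementing id column (of type INT, not part of the example above) declared as the Incremental field, and the highest value seen during the previous run was 1000, the connector would rewrite the Incremental query as:
-- Hypothetical incremental condition built on an INT field named id
SELECT id, location, date, temperature FROM temperatures WHERE id >= 1000;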
Snowflake database
The connection details to fill in for a Snowflake database are described in the table below.
Parameter name | Description |
Server | Snowflake account to connect to, under the form |
User identifier | Snowflake user identifier |
Password | Snowflake user password |
Role | Snowflake role used by Opendatasoft when running queries (must be a valid role for the selected user) |
Database | Name of the database to connect to |
Schema | Database schema |
Warehouse | Snowflake warehouse (compute resource) used to run queries performed by Opendatasoft |
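Because the Database and Schema are part of the connection details, the Query parameter can usually reference table names directly; Snowflake's fully qualified database.schema.table syntax remains available if you need to reach objects outside the configured schema. The names below are hypothetical:
-- Unqualified table name, resolved against the Database and Schema configured above
SELECT city_name, city_country FROM cities;
-- Fully qualified name (hypothetical database and schema)
SELECT city_name, city_country FROM ANALYTICS.PUBLIC.CITIES;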
Denodo database
The connection details to fill in for a Denodo database are described in the table below.
Parameter name | Description |
Server | Host name of the server that hosts your Denodo Virtual DataPort |
Port | Port number of the server that hosts your Denodo Virtual DataPort |
SSL required | Whether an SSL connection should be required (set only if SSL is configured on the Virtual DataPort server) |
User identifier | Denodo user identifier |
Password | Denodo user password |
Database | Name of the database to connect to |
Azure SQL database
The connection details to fill in for an Azure SQL database are described in the table below.
Parameter name | Description |
Server | Host name of the server that hosts your Azure SQL database |
User identifier | Azure SQL user identifier |
Database | Name of the database to connect to |
Password | Azure SQL user password |