Database connectors

Edited

The database connector is used to extract data from databases that support Open Database Connectivity (ODBC).

This connector is not available by default. Please contact Opendatasoft for the activation of this connector on a given Opendatasoft workspace.

The databases that are supported by Opendatasoft are :

Database

Vendor resources

Supported database version

Snowflake

Web site

All

Denodo

Web site

>= 8.0 update 20220126

Databricks

Web site

Azure SQL

Web site

If you would like Opendatasoft to support other databases, do not hesitate to contact your local support team to share your feedbacks and use cases.

The database connector works at a database level. To configure a data source using this connector, you'll need to gather:

  • Connection details (like login and password, server adresse, database name)

  • SQL query to extract relevant data from your database

As this connector supports reusable connections, you'll have the ability to easily reuse connection details from one dataset to another dataset.

Creation

You can configure a source using the database connector by either creating a new dataset or adding a new source to an existing dataset.

  1. Select one of the supported database on the source creation screen.

  2. Configure connection details to connect Opendatasoft to your database. If you have already created and saved a database connection, you may want to reuse it.

  3. In Query, configure the custom SQL query to use to extract data from your database. This query can be arbitrarily complex.

  4. In Preview query [optional], configure the custom SQL query to use to extract data from your database in preview mode, when configuring the dataset in Opendatasoft back-office.

  5. In Incremental query [optional], configure the custom SQL query to use to extract data in incremental mode. See below for more details about the incremental extraction mode.

  6. In Incremental field [optional], configure the name of the query result field used to build the incremental update clause. See below for more details about the incremental extraction mode.

Connection details

Database

Connection details configuration

Snowflake

Configuration

Denodo

Configuration

Databricks

Configuration

Azure SQL

Configuration

Using a preview query for large result sets

It may happen that the query you configure will take ages to run and might considerably slow down (or even impair) the configuration screens in the Opendatasoft back-office.

Let's say for instance that you want to query a database table cities_in_the world table listing every city in the world. The database schema contains the fields city_name, city_country, and number_of_inhabitants.

The query you would use in the Query parameter would be:

SELECT city_name, city_country, number_of_inhabitants FROM cities_in_the_world;

However, for various reasons dependent on the database engine used behind the scene, this query may take ages to execute. In this case, use the Preview query parameter and set a lighter query that you know will be executed in a timely manner:

SELECT city_name, city_country, number_of_inhabitants FROM cities_in_the_world LIMIT 20;

Query and Preview query queries must output result sets with an identical table schema (list of fields and fields types).

Incremental update of a dataset with an database source

When a dataset with a database source is published again either manually or through scheduling, previous dataset content is cleared and the whole content of the database table is processed again. However, there are plenty of use cases where this approach is not suitable. You may for instance want to create a dataset from a database table that contains a daily temperature measure for specified locations.

The database table temperatures would contain the fields:

Field name

Field type

location

VARCHAR

date

DATE

temperature

FLOAT

You would then configure the following Query parameter:

SELECT location, date, temperature FROM temperatures;

You then configure the dataset so that it's updated on a daily basis thanks to Opendatasoft's internal scheduler. The first update fetches all historical data. And every other day, at update time, the dataset content is cleaned and the whole table content is fetched again. This is probably not what you expect as this approach is definitely not resource consumption savvy.

To minimize the volume of data synchronized at each update, you can set the connector to be in an incremental mode. To do so you should configure the Incremental query and Incremental field parameters.

The Incremental query parameter contains a templated query, used by the connector during the second and other update runs. It must contain a placeholder {INCR_CONDITION} that is automatically filled by the connector. For instance:

SELECT location, date, temperature FROM temperatures WHERE {INCR_CONDITION};

The Incremental field parameter contains a field name, whose value is used to build the {INCR_CONDITION} placeholder. Here, you would set the value date. As a consequence, the connector will scan the values of this field, keep the highest value in memory and use it for the next connector run.

Let's consider that the initial values in the database are:

location

date

temperature

Paris

2023-04-01

25

London

2023-04-01

22

Boston

2023-04-01

29

Paris

2023-04-02

24

London

2023-04-02

23

Boston

2023-04-02

22

For the initial processing, the database connector will run the SQL query set in the Query parameter:

SELECT location, date, temperature FROM temperatures;

It will thus fetch the 6 records above.

Let's say that the dataset is set to be updated on a daily basis. And that on 2023-04-03, 3 new records are added to the table:

location

date

temperature

Paris

2023-04-03

21

London

2023-04-03

19

Boston

2023-04-03

18

During the second processing, the database connector retrieves the highest value of the date field identified during the first processing (2023-04-02) and use it to rewrite the Incremental query:

SELECT location, date, temperature FROM temperatures WHERE date >= '2023-04-02';

This way, only the records from 2023-04-02 and 2023-04-02 will be fetched.

The supported types for the parameter Incremental field are: INT, DOUBLE, DATE, and DATETIME.

The database connector is using a greater or equals condition to make sure that records that have been added lately (after the previous connector run) are not missed.

Snowflake database

The connection details to fill in for a Snowflake database are described in the table below.

Parameter name

Description

Server

Snowflake account to connect to, under the form account_identifier.snowflakecomputing.com

User identifier

Snowflake user identifier

Password

Snowflake user password

Role

Snowflake role used by Opendatasoft when running queries (must be a valid role for the selected user)

Database

Name of the database to connect to

Schema

Database schema

Warehouse

Snowflake warehouse (compute resource) used to run queries performed by Opendatasoft

Denodo database

The connection details to fill in for a Denodo database are described in the table below.

Parameter name

Description

Server

Host name of the server that hosts your Denodo Virtual DataPort

Port

Port number of the server that hosts your Denodo Virtual DataPort

SSL required

Whether a SSL connection should be required (set only if SSL is configured on the Virtual DataPort server)

User identifier

Denodo user identifier

Password

Denodo user password

Database

Name of the database to connect to

Azure SQL database

The connection details to fill in for a Azure SQL database are described in the table below.

Parameter name

Description

Server

Host name of the server that hosts your Azure SQL Virtual DataPort

User identifier

Azure SQL user identifier

Database

Name of the database to connect to

Password

Azure SQL user password