Databricks connector

This connector is not available by default. Please contact Opendatasoft for the activation of this connector on a given Opendatasoft domain.

This article assumes you are familiar with the basics of Databricks. If you are not, please refer to Databricks' documentation.

This connector allows you to use SQL to query your Databricks instance through either a SQL Warehouse or a Cluster.

To configure the Databricks connector, you first need connection details from Databricks. They are listed on the configuration panel of the warehouse or the cluster, depending on which one you intend to connect to. With that information in hand, you can complete the connector's configuration.

Connector fields

Here are the fields of the connector configuration:

Field name               Required/optional
Host                     Required
Port                     Required
HTTP path                Required
Personal access token    Required
Catalog                  Required
Schema                   Optional
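
As a rough illustration of the required/optional split above, the sketch below checks a configuration for missing required values before it is submitted. The dictionary keys (host, http_path, and so on) and the placeholder values are hypothetical names chosen for this example; they are not identifiers used by Opendatasoft or Databricks.

```python
# Hypothetical key names mirroring the connector fields listed above.
REQUIRED_FIELDS = ("host", "port", "http_path", "personal_access_token", "catalog")
OPTIONAL_FIELDS = ("schema",)


def missing_required_fields(config):
    """Return the required fields that are absent or blank in `config`."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = config.get(name)
        # Treat None and whitespace-only strings as missing.
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing


example = {
    "host": "example.cloud.databricks.com",     # placeholder hostname
    "port": 443,
    "http_path": "/sql/1.0/warehouses/abc123",  # placeholder path
    "personal_access_token": "",                # left blank on purpose
    "catalog": "samples",
    # "schema" is optional, so leaving it out is not reported.
}
print(missing_required_fields(example))  # prints ['personal_access_token']
```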

Follow these steps to connect your data portal to Databricks:

Connect to a Databricks warehouse

The first two steps below gather the information Opendatasoft needs to connect to Databricks. The final two steps define the data you want to query.

1. Retrieve Host, Port and HTTP path fields

Go to the Connection details tab of your Databricks SQL warehouse.

There, you need to copy the following:

  • Host

  • Port (443 by default)

  • HTTP path
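
Values pasted from the Connection details tab sometimes carry stray whitespace or a leading https://. The sketch below tidies them up before they are entered in the connector. It assumes the Host field expects a bare hostname, which this article does not state explicitly, and the function name and placeholder values are hypothetical.

```python
from urllib.parse import urlparse

DEFAULT_PORT = 443  # default port mentioned in the list above


def normalize_connection_details(host, port=None, http_path=""):
    """Clean up values copied from the Connection details tab."""
    host = host.strip()
    # If a full URL was pasted, keep only its hostname (assumption: the
    # connector wants a bare hostname rather than a URL).
    if "://" in host:
        host = urlparse(host).hostname or ""
    host = host.rstrip("/")
    # Fall back to the default port when none was copied.
    port = DEFAULT_PORT if port in (None, "") else int(port)
    return {"host": host, "port": port, "http_path": http_path.strip()}


details = normalize_connection_details(
    " https://example.cloud.databricks.com/ ",  # placeholder hostname
    http_path=" /sql/1.0/warehouses/abc123 ",   # placeholder path
)
print(details)
```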

2. Retrieve a personal access token

To allow Opendatasoft to communicate with your Databricks SQL warehouse, you need to provide a valid personal access token. You can create one directly in your Databricks workspace.

3. Retrieve catalog and schema names

To find the catalog and schema, go to the Catalog Explorer page (click Catalog in the sidebar) and open the catalog you want to query.

4. Make the SQL query

You need to fill in the query fields using the catalog and schema names. Here is documentation that discusses each of those fields.

If you did not specify the schema in the configuration, you need to add it to the SQL query itself. For instance, if you chose the samples catalog without specifying its nyctaxi schema, a query on the trips table would look like: SELECT * FROM nyctaxi.trips
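
The rule above can be sketched as a small helper that prefixes the table name with the schema only when the schema was left out of the connector configuration. The function names and the schema_in_config flag are hypothetical and serve only to illustrate the rule; the catalog, schema and table names come from the example above.

```python
def table_reference(table, schema=None, schema_in_config=False):
    """Return the table name as it should appear in the SQL query."""
    if schema_in_config:
        # The connector already knows the schema, so the bare table name is enough.
        return table
    if not schema:
        raise ValueError("a schema is needed when none is set in the configuration")
    return f"{schema}.{table}"


def select_all(table, schema=None, schema_in_config=False):
    """Build a simple SELECT * query for the given table."""
    return f"SELECT * FROM {table_reference(table, schema, schema_in_config)}"


# Schema left out of the configuration: it goes into the query.
print(select_all("trips", schema="nyctaxi"))       # prints SELECT * FROM nyctaxi.trips
# Schema set in the configuration: the query uses the bare table name.
print(select_all("trips", schema_in_config=True))  # prints SELECT * FROM trips
```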

(Optional) Connect to a Cluster

You need to have either a Personal Compute Cluster or a Shared one.

On your cluster's configuration page, open Advanced options > JDBC/ODBC to see all the connection details needed to connect to this specific cluster.

Use those details and follow the same steps as for a warehouse, described above.
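
Because the same connector fields serve both warehouses and clusters, it can help to confirm which kind of compute a copied HTTP path points to. The sketch below relies on the path shapes Databricks typically shows (a warehouse path beginning with /sql/1.0/warehouses/ and a cluster path beginning with sql/protocolv1/o/). Treat these prefixes as an assumption to check against your own workspace; the function name and sample paths are hypothetical.

```python
def compute_kind(http_path):
    """Guess whether an HTTP path belongs to a SQL warehouse or a cluster."""
    path = http_path.strip().lstrip("/")
    # Assumed warehouse shape: /sql/1.0/warehouses/<id> (older workspaces
    # may show /sql/1.0/endpoints/<id>).
    if path.startswith(("sql/1.0/warehouses/", "sql/1.0/endpoints/")):
        return "warehouse"
    # Assumed cluster shape: sql/protocolv1/o/<workspace-id>/<cluster-id>.
    if path.startswith("sql/protocolv1/o/"):
        return "cluster"
    return "unknown"


print(compute_kind("/sql/1.0/warehouses/abc123"))           # prints warehouse
print(compute_kind("sql/protocolv1/o/1234567890/0123-abc"))  # prints cluster
```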