Harvesting a catalog
Updated by Patrick Smith
Harvesters let administrators create and update a large number of datasets by importing them from an external source, such as a CSW catalog or an ArcGIS service, among many others.
The two main uses of harvesters are:
- Bootstrapping your portal with datasets from an existing portal
- Keeping your datasets synchronized with an external service
The harvester will create datasets, update their metadata and resources, keep them synchronized, and publish them.
Creating a harvester
To get started with harvesters, click on the Harvesters menu in your back office, then on Add harvester. You will be asked to choose the type of portal you want to harvest and a name for your harvester.
When you are done, click on Create harvester. You will be redirected to the harvester's configuration form. Because this form depends on the harvester type, refer to the page for each harvester type listed below for detailed instructions.
Some options are available for every harvester type, such as:
- Update on deletion: if source datasets are deleted on the harvested portal, delete them on this Opendatasoft portal too. Otherwise, your portal may keep datasets that are no longer available on the external service.
- Download resources: download resources instead of attaching them via URL. This option lets you detach your datasets from the remote portal by permanently copying all required data to the Opendatasoft platform. Otherwise, your datasets stay linked to the external service and fetch the remote data via its URL each time they are published.
- Restrict visibility: restrict the visibility of harvested datasets. Otherwise, they will have the default visibility of your portal.
- Default metadata, INSPIRE metadata, DCAT metadata: override some metadata in every harvested dataset. This is useful if you want to force the theme or publisher instead of using the values from the external service.
Once you are done configuring the harvester, click on the Preview button to test it on a few datasets. If the titles and descriptions that appear look correct, you are all set. Otherwise, double-check your configuration.
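To make the effect of the Update on deletion option concrete, here is a minimal sketch in Python. It is not Opendatasoft code: the function name `plan_sync`, the dataset identifiers, and the "orphaned" label are all hypothetical, and the sketch only models the behavior described in the option list above.

```python
def plan_sync(remote_ids, local_ids, update_on_deletion):
    """Decide what a harvester run would do with each dataset (illustrative model).

    remote_ids: identifiers currently exposed by the external service.
    local_ids: identifiers already harvested on the portal.
    """
    remote, local = set(remote_ids), set(local_ids)
    missing_remotely = sorted(local - remote)
    return {
        "create": sorted(remote - local),   # new on the external service
        "update": sorted(remote & local),   # present on both sides
        # With the option on, datasets removed remotely are removed locally too.
        "delete": missing_remotely if update_on_deletion else [],
        # With the option off, they stay on the portal without a remote source.
        "orphaned": [] if update_on_deletion else missing_remotely,
    }


print(plan_sync(["trees", "parks"], ["parks", "lakes"], update_on_deletion=True))
print(plan_sync(["trees", "parks"], ["parks", "lakes"], update_on_deletion=False))
```

In the second call, "lakes" ends up in the "orphaned" list: this is the situation the option is meant to avoid.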
Running a harvester
The harvesting process can take a long time on external services that contain many datasets or large ones. That's why it is split into two phases:
- First, the harvester will connect to the remote service and discover all the datasets it contains. It will then create an unpublished dataset for each remote dataset it finds. These datasets will contain all available metadata and resources (as URLs or as files depending on the download resources option). This happens when you click on the Start harvester button.
- Next, it will process and publish all the harvested datasets. This step can take a while. This happens when you click on the Publish button.
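The two phases above can be sketched as a small Python model. This is only an illustration under stated assumptions: the class `HarvestedDataset`, the functions `start_harvester` and `publish`, and the example catalog are hypothetical and do not correspond to a platform API.

```python
from dataclasses import dataclass


@dataclass
class HarvestedDataset:
    dataset_id: str
    metadata: dict
    resources: list          # URLs or files, depending on the download option
    published: bool = False  # harvested datasets start unpublished


def start_harvester(remote_catalog):
    """Phase 1: discover remote datasets and create unpublished copies."""
    return [
        HarvestedDataset(ds["id"], dict(ds["metadata"]), list(ds["resources"]))
        for ds in remote_catalog
    ]


def publish(datasets):
    """Phase 2: process and publish every harvested dataset."""
    for ds in datasets:
        ds.published = True  # stands in for the (potentially slow) processing
    return datasets


# Hypothetical remote catalog used only for this example.
remote_catalog = [
    {"id": "trees", "metadata": {"title": "Trees"},
     "resources": ["https://example.org/trees.csv"]},
    {"id": "parks", "metadata": {"title": "Parks"},
     "resources": ["https://example.org/parks.csv"]},
]

harvested = start_harvester(remote_catalog)
print([ds.published for ds in harvested])  # nothing is published yet
publish(harvested)
print([ds.published for ds in harvested])  # everything is published
```

Keeping the phases separate leaves a window between them in which the unpublished datasets can be reviewed.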
Editing harvested datasets
Before publishing the harvested datasets, you can change their metadata. To manually override a metadata value, go to the dataset page (Information tab), click on Override, and add your own value. The override is kept even if you restart your harvester.
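One way to picture why an override survives a restart is to treat overrides as a separate layer applied on top of the harvested values. The sketch below assumes exactly that; the function name `effective_metadata` and the sample values are hypothetical, and the platform's internal storage may differ.

```python
def effective_metadata(harvested, overrides):
    """Manual overrides take precedence over harvested values."""
    return {**harvested, **overrides}


# Overrides are kept apart from harvested values, so a new harvester
# run can refresh the latter without touching the former.
overrides = {"title": "Street trees (curated title)"}

first_run = {"title": "trees_v1", "description": "Tree inventory"}
second_run = {"title": "trees_v2", "description": "Tree inventory, updated"}

print(effective_metadata(first_run, overrides))
print(effective_metadata(second_run, overrides))
```

In both calls the title comes from the override, while the description follows whatever the latest run harvested.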
Deleting a harvester
When you delete a harvester by clicking the Delete harvester button, you can choose between keeping the harvested datasets (they will remain as regular datasets in your catalog) and deleting them along with the harvester.
If you choose to keep them, keep in mind that you will have to unpublish or delete them one by one afterward, and that they will be duplicated if you later create another harvester on the same external service.
Harvester types
Portals
Opendatasoft Federation harvester, data.gouv.fr harvester, ArcGIS harvester, ArcGIS Hub Portals harvester, CKAN harvester, Socrata harvester, data.json harvester, Quandl harvester
Services
CSW harvester, FTP with meta CSV harvester
The FTP harvester uses FTPS (explicit mode on port 21) by default but supports FTP if specified in the provided URL or if the remote server does not support FTPS.
Scheduling
From the configuration page of a harvester, you can make it run periodically. To do this, scroll to the bottom of the page and click on Set recurring runs. You can run the harvester every day, or choose the days of the week or the days of the month on which it runs. In every case, you must choose the time of day for the run, because a harvester cannot run more than once a day.
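As an illustration of the once-a-day constraint, the sketch below computes the next run from a single time of day and an optional set of weekdays. The function `next_run` and its parameters are hypothetical. The sketch uses naive datetimes (no timezone handling) and leaves out the days-of-the-month variant.

```python
from datetime import datetime, time, timedelta


def next_run(now, run_time, weekdays=None):
    """Return the next scheduled run strictly after `now`.

    weekdays: set of ints (Monday=0 ... Sunday=6); None means every day.
    A single run_time per schedule is what limits runs to one per day.
    """
    allowed = set(range(7)) if weekdays is None else set(weekdays)
    if not allowed:
        raise ValueError("at least one weekday is required")
    for offset in range(8):  # today plus the next seven days
        day = (now + timedelta(days=offset)).date()
        candidate = datetime.combine(day, run_time)
        if candidate > now and day.weekday() in allowed:
            return candidate
    raise RuntimeError("no matching day found")  # not reachable with valid input


now = datetime(2024, 1, 1, 10, 0)            # a Monday, 10:00
print(next_run(now, time(3, 0)))             # daily at 03:00
print(next_run(now, time(3, 0), {0, 4}))     # Mondays and Fridays at 03:00
```

Because today's 03:00 slot has already passed at 10:00, the daily schedule moves to the next day and the Monday/Friday schedule moves to Friday.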
The periodic run will only trigger if the harvester has been run at least once.
At the end of a scheduled run, all the harvester's already published datasets will be republished, but unpublished datasets or new datasets will not be automatically published.
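The two scheduling rules above can be summarized in a short sketch. The function `scheduled_run` and the state labels ("published", "unpublished", "new") are hypothetical names chosen for this example only.

```python
def scheduled_run(has_run_before, datasets):
    """Return the datasets republished at the end of a scheduled run.

    has_run_before: whether the harvester has already been run at least once.
    datasets: mapping of dataset name to state ("published", "unpublished", "new").
    """
    if not has_run_before:
        return []  # the periodic run does not trigger at all
    # Only datasets that were already published are republished.
    return [name for name, state in datasets.items() if state == "published"]


catalog = {"trees": "published", "parks": "unpublished", "lakes": "new"}
print(scheduled_run(True, catalog))
print(scheduled_run(False, catalog))
```

In this model, "parks" and "lakes" still require a manual publish after the scheduled run.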