What is a processor and how to use one
Processors are tools that modify, improve, or enrich the data of a dataset.
In the Opendatasoft platform, processors are classified into four different categories:
Processors for geographical mapping
Processors for date handling
Processors for text transformations
Processors for generic operations
How to add a processor to a dataset
In your dataset's Processing tab, click on the Add a processor button.
Choose the processor to add to the dataset.
Using the documentation of the chosen processor, fill in the parameters needed to configure it.
(Optional) Click on the edit icon to rename the processor. Renaming is especially useful when many processors are applied to a dataset, including several of the same type. For example, renaming each Expression processor applied to a dataset makes it easier to tell which one contains which expression.
Once the parameters are configured, you may need to click outside the processor box so that the processor and the changes it triggers are taken into account and applied to the dataset.
Always use the technical identifiers of the fields when running a processor, never the labels.
Note that when you add a field to a dataset using a processor, that field is visible in the dataset's schema, which can be managed on the Schema tab.
Geographical processors
Geographical processors are divided into four categories, according to what they are meant to achieve:
Geocoders, used to convert a human-readable address into a geopoint.
GeoJoin processor, used to retrieve geoshapes from normalized codes for country-specific administrative divisions. The GeoJoin processor supports several countries, each of which features several indexing codes like postcode, state or region identifier, etc.
Retrieve Administrative Divisions processor, used to retrieve the name, code, and geoshape of country-specific administrative divisions enclosing a geopoint.
Converters & Functions, used to simplify, convert or normalize geographical data, or run computations based on them.
Geocoders
Description | Availability
Geocode full-text addresses by using the ArcGIS geocoding API | Default
Geocode addresses in France by using the Base d'Adresses Nationale (BAN) service | Default
Geocode addresses in the Netherlands by using the PDOK service | On demand
Geocode addresses in the USA by using the Census Bureau | On demand
Convert a three-word address into geographical coordinates | On demand
Geocode an IP address | Default
Geocode full-text addresses using OpenStreetMap data | On demand
Produce a three-word address from geographical coordinates | On demand
The GeoJoin processor
Description | Availability
Retrieve administrative division geoshapes for a specified country and referential | Default
The Retrieve administrative divisions processor
Description | Availability
Retrieve administrative divisions information with a geopoint | Default
Converters and functions
Description | Availability
Compute the distance between two coordinates | Default
Fix invalid geoshapes | On demand
Convert degrees, minutes, seconds geographical coordinates to WGS84 coordinates | Default
Create a geopoint field from a latitude field and a longitude field | Default
Transform an encoded Google polyline into a GeoJSON LineString | On demand
Convert GeoHash values to GeoJSON | On demand
Provide privacy protection by approximating a geographical location within a specific radius | Default
Replace a geopoint with its WGS84 representation | Default
Remove points that are not in a polygon | On demand
Simplify a geoshape to reduce processing time and dataset size | Default
Convert a vector geometry object represented in WKT or WKB into a GeoJSON object | On demand
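To make two of the conversions in this table more concrete, here is a minimal Python sketch of the arithmetic behind them. It is an illustration only, not the platform's implementation. The function names `dms_to_decimal` and `haversine_km` are hypothetical, and the distance is computed with the haversine formula on a spherical Earth of radius 6371 km, which is an assumption: the actual processor may use a different method.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are negative in decimal notation.
    return -value if hemisphere.upper() in ("S", "W") else value


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Example values chosen for illustration: central Paris and central London.
paris = (dms_to_decimal(48, 51, 24, "N"), dms_to_decimal(2, 21, 8, "E"))
london = (51.5074, -0.1278)
print(paris)
print(round(haversine_km(*paris, *london)), "km")
```

The pair returned for `paris` has the same latitude-then-longitude shape as a geopoint built from a latitude field and a longitude field.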
Date processors
Description | Availability
Normalize a date format not automatically understood by the platform | Default
Define a timezone for a datetime field | Default
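As a rough illustration of what these two operations mean, the following Python sketch parses a date written in a custom format and attaches a timezone to a datetime that has none. The helper names `normalize_date` and `set_timezone` are hypothetical, and the fixed UTC offset is a simplifying assumption: real timezones also carry daylight saving rules, which this sketch ignores. It does not describe how the platform implements its processors.

```python
from datetime import datetime, timedelta, timezone


def normalize_date(text, input_format):
    """Parse a date written in a custom format and return it as ISO 8601."""
    return datetime.strptime(text, input_format).date().isoformat()


def set_timezone(naive, utc_offset_hours):
    """Attach a fixed UTC offset to a datetime that has no timezone."""
    return naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))


print(normalize_date("31/12/2023", "%d/%m/%Y"))  # 2023-12-31

local = set_timezone(datetime(2023, 12, 31, 18, 30), 1)
print(local.isoformat())  # 2023-12-31T18:30:00+01:00
```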
Text processors
Description | Availability
Concatenate two fields | Default
Decode HTML entities from a text, to transform them into valid HTML | Default
Extract HTML from an HTML tag to only keep textual content | Default
Extract part of a field value using a regular expression | Default
Extract URLs from HTML or text contents | Default
Normalize Unicode content using the Normalization Form Canonical Composition (NFC) | Default
Normalize a field value to obtain a valid URL | Default
Replace a textual field value with a chosen text | Default
Replace or remove part of a field value using a regular expression | Default
Split a field value and extract part of it in a new field | Default
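Several of the text transformations listed above have close equivalents in the Python standard library. The sketch below shows them on sample strings, assuming nothing about the platform's own implementation or parameter names. The helpers `extract_with_regex` and `replace_with_regex` are hypothetical names introduced for this example.

```python
import html
import re
import unicodedata


def extract_with_regex(value, pattern):
    """Return the first capture group of the first match, or None if nothing matches."""
    match = re.search(pattern, value)
    return match.group(1) if match else None


def replace_with_regex(value, pattern, replacement=""):
    """Replace every match of pattern; an empty replacement removes the match."""
    return re.sub(pattern, replacement, value)


# Extract part of a value using a regular expression.
print(extract_with_regex("Order #4521 shipped", r"#(\d+)"))  # 4521

# Remove part of a value using a regular expression.
print(replace_with_regex("tel: 01-23-45", r"-"))  # tel: 012345

# Decode HTML entities.
print(html.unescape("Fish &amp; chips"))  # Fish & chips

# NFC normalization: "e" followed by a combining acute accent becomes one code point.
print(unicodedata.normalize("NFC", "e\u0301") == "\u00e9")  # True

# Split a value and keep one part.
print("a;b;c".split(";")[1])  # b
```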
Generic processors
Description | Availability
Add a new empty field in a dataset | Default
Copy a value from one field to another | Default
Remove duplicated values in a multivalued field | Default
Delete a record based on field values | Default
Transpose rows containing a JSON array into several rows | Default
Transform the values contained in a multivalued field into several records | Default
Write complex expression patterns using field values | Default
Extract an arbitrary bit range from a hexadecimal or binary content | On demand
Extract values from a field containing a JSON object | Default
Retrieve images from URLs | Default
Join two datasets together to retrieve a specified field in a dataset | Default
Extract multiple values from a JSON array and concatenate them into a multivalued field | Default
Apply an expression on multiple fields | On demand
Skip records from a dataset | Default
Transform true values from boolean fields into a multivalued field | Default
Transform labels into field values | Default
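To give an idea of what some of these record-level operations do, here is a small Python sketch working on a record represented as a dictionary keyed by field identifier. Everything in it is an assumption made for illustration: the function names, the sample record, and the `;` separator for multivalued fields are hypothetical and do not describe the platform's internals.

```python
import json


def expand_multivalued(record, field, separator=";"):
    """Return one copy of the record per value found in a multivalued field."""
    values = [v.strip() for v in record[field].split(separator) if v.strip()]
    return [{**record, field: v} for v in values]


def deduplicate_multivalued(value, separator=";"):
    """Drop repeated values while keeping the first occurrence of each."""
    unique = dict.fromkeys(v.strip() for v in value.split(separator))
    return separator.join(unique)


def extract_from_json(record, field, key):
    """Read one key from a field that holds a JSON object as text."""
    return json.loads(record[field]).get(key)


# Hypothetical sample record.
record = {"city": "Paris", "tags": "park;museum;park", "meta": '{"zip": "75001"}'}

print(deduplicate_multivalued(record["tags"]))   # park;museum
print(len(expand_multivalued(record, "tags")))   # 3
print(extract_from_json(record, "meta", "zip"))  # 75001
```

Expanding a multivalued field produces one record per value, so in this sketch it makes sense to deduplicate first when repeated values are not wanted.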