What is a processor and how to use one

Processors are tools that modify, improve, or enrich the data of a dataset.

In the Opendatasoft platform, processors are classified into four different categories:

  • Processors for geographical mapping

  • Processors for date handling

  • Processors for text transformations

  • Processors for generic operations

How to add a processor to a dataset

  1. In your dataset's Processing tab, click on the Add a processor button.

  2. Choose the processor to add to the dataset.

  3. Using the documentation of the chosen processor, fill in the parameters needed to configure the processor.

  4. (optional) Click on the edit icon to rename the processor. Renaming is especially useful when many processors are applied to a dataset, including several of the same type. For example, renaming each Expression processor applied to a dataset makes it easier to tell which one contains which expression.

Once the parameters are configured, you may need to click outside the processor box so that the processor and the changes it triggers are applied to the dataset. Always use the technical identifiers of the fields when configuring a processor, never the labels.

Note that when you add a field to a dataset using a processor, that field is visible in the dataset's schema, which can be managed on the Schema tab.

Geographical processors

Geographical processors are divided into four categories, according to what they are meant to achieve:

  • Geocoders, used to convert a human-readable address into a geopoint.

  • GeoJoin processor, used to retrieve geoshapes from normalized codes for country-specific administrative divisions. The GeoJoin processor supports several countries, each of which features several indexing codes like postcode, state or region identifier, etc.

  • Retrieve Administrative Divisions processor, used to retrieve the name, code, and geoshape of country-specific administrative divisions enclosing a geopoint.

  • Converters and functions, used to simplify, convert or normalize geographical data, or run computations based on them.

Geocoders

| Name | Description | Availability |
|---|---|---|
| Geocode with ArcGIS | Geocode full-text addresses by using the ArcGIS geocoding API | Default |
| Geocode with BAN (France) | Geocode addresses in France by using the Base d'Adresses Nationale (BAN) service | Default |
| Geocode with PDOK | Geocode addresses in the Netherlands by using the PDOK service | On demand |
| Geocode with the Census Bureau (USA) | Geocode addresses in the USA by using the Census Bureau | On demand |
| Get coordinates from a three-word address | Convert a three-word address into geographical coordinates | On demand |
| IP address to geo Coordinates | Geocode an IP address | Default |
| Nominatim geocoder | Geocode full-text addresses using OpenStreetMap data | On demand |
| what3words | Produce a three-word address from geographical coordinates | On demand |

The GeoJoin processor

| Name | Description | Availability |
|---|---|---|
| GeoJoin | Retrieve administrative division geoshapes for a specified country and referential | Default |

The Retrieve administrative divisions processor

| Name | Description | Availability |
|---|---|---|
| Retrieve administrative divisions | Retrieve administrative divisions information with a geopoint | Default |

Converters and functions

| Name | Description | Availability |
|---|---|---|
| Compute geo distance | Compute the distance between two coordinates | Default |
| Correct geo shape | Fix invalid geoshapes | On demand |
| Convert Degrees | Convert degrees, minutes, seconds geographical coordinates to WGS84 coordinates | Default |
| Create geo point | Create a geopoint field from a latitude field and a longitude field | Default |
| Decode a Google polyline | Transform an encoded Google polyline into a GeoJSON LineString | On demand |
| GeoHash to GeoJSON | Convert GeoHash values to GeoJSON | On demand |
| Geomasking | Provide privacy protection by approximating a geographical location within a specific radius | Default |
| Normalize projection reference | Replace a geopoint with its WGS84 representation | Default |
| Polygon filtering | Remove points that are not in a polygon | On demand |
| Simplify geo shape | Simplify a geoshape to reduce processing time and dataset size | Default |
| WKT and WKB to GeoJSON | Convert a vector geometry object represented in WKT or WKB into a GeoJSON object | On demand |
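
To make two of the conversions listed above more concrete, here is a minimal Python sketch of the kind of computation that the Convert Degrees and Compute geo distance processors describe. It is an illustration only, not the platform's implementation: the function names `dms_to_decimal` and `haversine_km` are hypothetical, and using the haversine formula on a spherical Earth with a 6371 km radius is an assumption.

```python
import math


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    Hypothetical helper: southern and western hemispheres yield negative values.
    """
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometers between two points.

    Assumes a spherical Earth; the result is an approximation.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


if __name__ == "__main__":
    lat = dms_to_decimal(48, 51, 24, "N")
    lon = dms_to_decimal(2, 21, 8, "E")
    print(lat, lon)
    print(haversine_km(lat, lon, 51.5074, -0.1278))
```

In the platform itself, these operations are configured through the processor parameters rather than written as code.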

Date processors

| Name | Description | Availability |
|---|---|---|
| Normalize date | Normalize a date format not automatically understood by the platform | Default |
| Set timezone | Define a timezone for a datetime field | Default |
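
As a rough illustration of what normalizing a date and attaching a timezone involve, the following Python sketch parses a date written in a custom format and then attaches a fixed UTC offset to it. This is an assumption-based example, not how the platform implements these processors: the function names `normalize_date` and `set_utc_offset` are hypothetical, and a fixed offset is a simplification of a real timezone, which also carries daylight saving rules.

```python
from datetime import datetime, timedelta, timezone


def normalize_date(raw, source_format):
    """Parse a date string in a known custom format and return ISO 8601 text."""
    return datetime.strptime(raw, source_format).isoformat()


def set_utc_offset(naive_iso, offset_hours):
    """Attach a fixed UTC offset to a naive ISO 8601 datetime string.

    The clock time is kept as-is; only the offset information is added.
    """
    naive = datetime.fromisoformat(naive_iso)
    tz = timezone(timedelta(hours=offset_hours))
    return naive.replace(tzinfo=tz).isoformat()


if __name__ == "__main__":
    iso = normalize_date("31/12/2023 18:30", "%d/%m/%Y %H:%M")
    print(iso)
    print(set_utc_offset(iso, 1))
```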

Text processors

| Name | Description | Availability |
|---|---|---|
| Concatenate text | Concatenate two fields | Default |
| Decode HTML entities | Decode HTML entities from a text, to transform them into valid HTML | Default |
| Extract HTML | Extract HTML from an HTML tag to only keep textual content | Default |
| Extract text | Extract part of a field value using a regular expression | Default |
| Extract URLs | Extract URLs from HTML or text contents | Default |
| Normalize Unicode values | Normalize Unicode content using the Normalization Form Canonical Composition (NFC) | Default |
| Normalize URL | Normalize a field value to obtain a valid URL | Default |
| Replace text | Replace a textual field value with a chosen text | Default |
| Replace via regular expression | Replace or remove part of a field value using a regular expression | Default |
| Split text | Split a field value and extract part of it in a new field | Default |
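
The transformations in the table above can be approximated with the Python standard library, which may help when checking what a processor is expected to return. The sketch below is only an illustration under assumptions: the function names are hypothetical, and the regular expression syntax accepted by the platform's processors may differ from Python's `re` module.

```python
import html
import re
import unicodedata


def normalize_nfc(text):
    """Return the text in Normalization Form Canonical Composition (NFC)."""
    return unicodedata.normalize("NFC", text)


def decode_entities(text):
    """Turn HTML entities such as &lt; or &eacute; into their characters."""
    return html.unescape(text)


def extract_first(pattern, text):
    """Return the first capture group of the first match, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


def replace_regex(pattern, replacement, text):
    """Replace every match of the pattern; an empty replacement removes it."""
    return re.sub(pattern, replacement, text)


if __name__ == "__main__":
    print(normalize_nfc("Cafe\u0301"))
    print(decode_entities("&lt;b&gt;Caf&eacute;&lt;/b&gt;"))
    print(extract_first(r"(\d{5})", "75011 Paris"))
    print(replace_regex(r"\s+\(draft\)", "", "Report (draft)"))
```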

Generic processors

| Name | Description | Availability |
|---|---|---|
| Add a field | Add a new empty field in a dataset | Default |
| Copy a field | Copy a field value from a field to another | Default |
| Deduplicate multivalued fields | Remove duplicated values in a multivalued field | Default |
| Delete record | Delete a record based on field values | Default |
| Expand JSON array | Transpose rows containing a JSON array into several rows | Default |
| Expand multivalued field | Transform the values contained in a multivalued field into several records | Default |
| Expression | Write complex expression patterns using field values | Default |
| Extract bit range | Extract an arbitrary bit range from a hexadecimal or binary content | On demand |
| Extract from JSON | Extract values from a field containing a JSON object | Default |
| File | Retrieve images from URLs | Default |
| Join datasets | Join two datasets together to retrieve a specified field in a dataset | Default |
| JSON array to multivalued | Extract multiple values from a JSON array and concatenate them into a multivalued field | Default |
| Meta expression | Apply an expression on multiple fields | On demand |
| Skip records | Skip records from a dataset | Default |
| Transform boolean columns to multivalued field | Transform true values from boolean fields into a multivalued field | Default |
| Transpose columns to rows | Transform labels into field values | Default |
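
To illustrate the effect of two of the multivalued processors listed above, here is a small Python sketch that works on records represented as dictionaries. It is not the platform's implementation. The function names, the `;` separator, and the choice to keep records whose field is empty are all assumptions made for the example.

```python
def deduplicate_multivalued(value, separator=";"):
    """Remove duplicated values from a separator-delimited string.

    The first occurrence of each value is kept, in its original order.
    """
    parts = [part.strip() for part in value.split(separator) if part.strip()]
    return separator.join(dict.fromkeys(parts))


def expand_multivalued(records, field, separator=";"):
    """Turn each value of a multivalued field into its own record.

    Assumption: a record with an empty or missing field is kept unchanged.
    """
    expanded = []
    for record in records:
        raw = record.get(field) or ""
        values = [v.strip() for v in raw.split(separator) if v.strip()]
        if not values:
            expanded.append(dict(record))
            continue
        for value in values:
            # Copy the record and overwrite only the expanded field.
            expanded.append({**record, field: value})
    return expanded


if __name__ == "__main__":
    rows = [{"id": 1, "tags": "park;garden;park"}]
    rows[0]["tags"] = deduplicate_multivalued(rows[0]["tags"])
    for row in expand_multivalued(rows, "tags"):
        print(row)
```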