Defining a dataset schema

Anthony Pépin Updated by Anthony Pépin

Each dataset follows a schema. Fully defining and configuring a dataset schema means:

  1. Changing a dataset field label
  2. Adding a description to a dataset field
  3. Choosing a type for each field
  4. Setting up relevant fields as facets to define filters for the dataset
  5. Configuring options for each field, depending on their type and whether they are set up as facets or not
  6. Ordering the fields in the dataset
  7. Discarding fields from the dataset

1) Changing a label

The Opendatasoft platform retrieves the default field labels found in the source dataset. However, it is possible to change each dataset field label.

We highly recommend to choose well-written, explicit labels. Also, keep in mind that, since these labels will be displayed in the front office for all portal users, it might be preferable to choose simple labels instead of business-specific terms to make sure that a wider audience can understand the data.

To change a label:

  1. In the preview area of the Processing tab, select the current label of the field of your choice.
  2. Type a new label. It can contain special characters.
  3. Click outside the label area or press Enter for the changes to be taken into account.
Changing the label of a field does not modify in any way the technical identifier of that field, which can be found in that column's Configuration menu by clicking on the small gears icon.
Remember that in general you should avoid changing your technical identifiers. See the note below under "Name" for more information.

2) Adding a description to a field

Descriptions can be added to dataset fields for more context or precision.

To add a description to a field:

  1. In the preview area of the Processing tab, click in the Description text box of the field of your choice.
  2. Type the related description.
  3. Click outside the Description text box for the description to be taken into account.

Field descriptions will then be displayed in the front office:

  • In the "Data schema" section of the Information tab of the dataset
  • In the tooltip of their related fields, in the Table tab

3) Choosing a field type

Fields are characterized by types. Depending on the chosen field type, the platform will process and display records in a specific way.

To choose a type for a field:

  1. In the preview area of the Processing tab, click on the field type list box of the field of your choice.
  2. Choose the right type for this field.

There are eight different types: date, datetime, decimal, integer, geopoint, geoshape, text, and file.

Type

Description

Text

Field values are textual data.

Integer

Field values are integer numbers. Note that if a value contains a decimal, only the whole number is retained (the decimal value is removed). For example, if the value is 1.9, the resulting integer is 1.

Decimal

Field values are decimal numbers. Valid separators for the decimal part are . or ,.

DateTime

Field values are a combination of a date and a time. The ideal format is the ISO 8601 format, which is YYYY-mm-ddTHH:MM:ss+00:00YYYY-mm-ddTHH:MM:ssZ or YYYYmmddTHHMMssZ. Other formats are also understood by the platform, such as YYYY-mm-dd-HH:MM:ss or or YYYY-mm-dd HH:MM:ss.

The platform will try to guess as accurately as possible the input datetime format. However, in case of bad detection or ambiguity, use the Normalize Date processor to define the parsing format of the datetime field.
By default, time records are in the UTC timezone. To change the timezone, use the Set Timezone processor.

Date

Field values are dates. The ideal format is the ISO 8601 format, which is YYYY-mm-dd. Other formats are also understood by the platform, such as YYYY/mm/dd or dd/mm/YYYY.

The platform will try to guess as accurately as possible the input date format. However, in case of bad detection or ambiguity, use the Normalize Date processor to define the parsing format of the date field.

Geopoint

Field values are a single geographical location expressed in the format <LAT>,<LON>, for example, 45.8,2.5.

If your dataset contains two fields, latitude and longitude, use the Create GeoPoint processor to create a valid geo point field.

Geoshape

Field values are geographical shapes expressed in GeoJSON. For example :

{"type": "LineString",
"coordinates": [ [100.0, 0.0], [101.0, 1.0] ]}
Feature collections are not supported.

IP address

Field values are IP addresses in the usual IPv4 format, such as 192.0.2.22 (four one- to three-digit numbers separated by periods: 0.0.0.0 to 255.255.255.255).

The option is available to "Anonymize IP address," which converts the last number in the address to a 0. Our example address above would therefore become 192.0.2.0.

Note that this does not technically anonymize the address, but only renders it less specific. You remain responsible for the availability of personally identifiable data you publish.

Note that while you could feasibly use a text field to store IP addresses, doing so would not allow you to "anonymize" the addresses in the way indicated above, and actions that process IP addresses via the API (for example, aggregation by IP address and distinct counts) are faster when the IP address type is used.

File

This field type is only available in cases where field values are files sourced with one of the available methods to create a dataset with media files (with the File processor, through an archive file, or with a specific extractor), creating a field which default type is file.

4) Setting up fields as facets

Facets define the filters of a dataset, which are displayed on the left of the dataset's visualization in the front office. These filters have several purposes: they allow the users to find specific, precise records into a dataset, but they also allow the creation of charts afterward (if a field's records are something other than numbers, they will be usable in the Chart Builder and in the Analyze view only if they are set up as facets).

Fields which type is either geo shape or geo point cannot be set up as facets.

To set up a field as a facet:

  1. In the preview area of the Processing tab, choose the field you want to set up as a facet. Choose a field relevant enough to become a filter.
  2. Click the funnel button.

5) Configuring field options

  1. In the preview area of the Processing tab, click the wheel button of the field of your choice.
  2. Configuration options change depending on the type of the field and on whether the field is set up as a facet or not. Follow the right instructions depending on the chosen configuration option.

Configuration options available for every field

Configuration option

Description

Name

Technical identifier of the field. In contrary to the label, the technical identifier does not have an esthetic purpose and thus cannot contain special characters, including spaces. Technical identifiers can be used, for example, when creating a custom tooltip with HTML.

Changing the technical identifier of a field could break reuses of the related dataset (custom tooltip, custom tab, or pages). It can break processors that make reference to that identifier. It could also be a problem if the source of the dataset is (regularly) updated: when replacing a source with a newer one, the platform checks the technical identifier of the fields of both sources in order to find a match between the two -then replacing the old data with the new ones. If technical identifiers are not the same anymore, the dataset cannot be updated.

To change the technical identifier of a field:

  1. Select the current technical identifier of the field, written in the text box under "Name".
  2. Type a new technical identifier. It mustn't contain special characters.
  3. Click outside the text box or press Enter for the change to be taken into account.

Unique ID

Each record is uniquely identified by its identifier, which is by default computed as the fingerprint of all the record's field values. If the Unique ID option is activated for a field, records with the same identifier (or value) are deleted for only the last/oldest one to stay in the dataset. It is most useful for real-time datasets to make sure that instead of adding new records every time the dataset is updated, new values replace the old ones.

To activate the Unique ID option of a field: Tick the "Unique ID" box.

Configuration options for date & datetime fields

Configuration option

Description

Precision

Degree of precision of the date or datetime field.

To define a degree of precision:

  1. Click on the Precision list box of the field of your choice.
  2. Choose a degree of precision:

For date fields:

  • Year: Only the year of the date is displayed in the dataset
  • Month: Only the month and year of the date are displayed in the dataset
  • Day: The full date (day, month, and year) is displayed in the dataset

For datetime fields:

  • Hour
  • Minute

As for the datetime precisions, the full datetime (hour and minutes) is displayed in the dataset. The difference is in the Analyze view and in the Chart Builder where the degree of precision is available to configure the chart.

Configuration options for decimals & integer fields

Configuration option

Description

Unit

Unit of the field, to display along with the decimal or integer value.

To define a unit for a field:

  1. Click on the Unit list box of the field of your choice.
  2. Choose the right unit from the list.

Enforce the number of decimals to display

The chosen number of decimals for all values of the field.

To define a number of decimals for a field:

  1. Tick the "Enforce the number of decimals to display" box.
  2. In the text area below, type the number of decimals of your choice, or use the arrows to do so.

Configuration options for text fields

Configuration option

Description

Sortable

By default, only numerical fields (decimals and integers) are sortable. This option activates sorting on textual fields. In the Table view of the dataset in the front office, it is then possible to sort text fields alphabetically.

To make a field sortable: Tick the "Sortable" box.

Multivalued

This option is for multivalued records separated by one same separator. Example: France,UK,USA When set up as a facet, each of the field's records values appears as a separate entry in the filters section. When clicking on one of the entry, all the other entries which are not related (meaning the entries which never appear in the same record as part of a multivalues combination) automatically disappear -only the related entries remain as available filter entries.

To activate the multivalued option:

  1. Tick the "Multivalued" box.
  2. In the text box below, type the separator between the values of the records.
  3. Click outside the text box or press Enter for the change to be taken into account.

Hierarchical

This option is for multivalued records, separated by one same separator and which have a hierarchical relation. Example: France/Ile-de-France/Paris When set up as a facet, each first value of each record's multivalues combination appears as a separate entry in the filters section. When clicking on one entry, all second-level values related to that entry appear, and so on. Example: After clicking on the filter entry France, the related second-level entry Ile-de-France appears. After clicking on Ile-de-France, the related third-level entry Paris appears.

To activate the hierarchical option:

  1. Tick the "Hierarchical" box.
  2. In the text box below, type the separator between the values of the records.
  3. Click outside the text box or press Enter for the change to be taken into account.

Display only for specific languages

This option is only available for domains that support more than one language. It allows to choose whether a field should be displayed in only specific languages. A common usage would be when your dataset has separate columns for the same information—job titles, for example—in different languages. This way, you can ensure that only the relevant version of that column's data appears in a given language.

The configuration of this option affects neither the data itself nor the API. It only affects visualizations in the front office of the platform.

To make a field displayed for specific languages only:

  1. Tick the "Display only for specific language" box.
  2. From the list of languages available for the domain, tick those for which the field should be displayed.

Configuration options for facets

Configuration option

Description

Sort facet by

Chosen sorting of the field's entries in the filters section of the front office.

To chose a sorting order for a facetted field:

  1. Click on the Sort facet by list box of the field of your choice.
  2. Choose a sorting order:
  • Item counting (descending): entries are ordered from the one with the fewer records to the one with the most
  • Item counting (ascending): entries are ordered from the one with the most records to the one with the fewer
  • Name (descending): entries are ordered in an alphabetical order
  • Name (ascending): entries are ordered in a reverse alphabetical order

Allow multiple selection in filters

By default, after clicking on a filter entry, the others disappear. With the multiple selection option, it is possible to select several filter entries.

To allow multiple selection in filters: Tick the "Allow multiple selection in filters" box.

6) Ordering fields in datasets

Dataset fields can be reordered in their dataset. It can have two kinds of impact:

  • In the filters section of the dataset, in the front office. Changing the order of the fields in the dataset also changes the order of the filters. The first facetted field of the dataset becomes the first displayed filter, and so on.
  • In case the dataset contains geo shape fields. The Map view cannot display more than one layer of geo shapes. The geo shapes layer displayed by default must thus be defined. To do so, the field containing the geo shapes to be displayed by default on the map must the ordered before all other geo shapes fields.

To reorder a field in a dataset:

  1. In the preview area of the Processing tab, click on the reorder button of the field you want to reorder in the dataset.
  2. While maintaining the click on the Reorder button, drag the field to its new position in the dataset.
  3. Once the field in its new position in the dataset, stop maintaining the click.

7) Discarding fields from datasets

Dataset fields can be discarded from the dataset. It does not mean that the field is completely removed from the dataset but only deleted from the output. This is why, once the dataset is published, the discarded field will not be displayed in any visualization, and if the dataset is exported, the discarded field will not be in the export.

To discard a field from a dataset: Click on the trash button of the field you want to discard from the dataset.

Since discarded fields are not completely removed from the dataset, they can be restored at any time.

To restore a discarded field from a dataset:

  1. In the preview area of the Processing tab, swipe to the right to go to the last fields of the datasets.
  2. The discarded fields of the dataset appear at the very end of the dataset. They look like blank, grey columns named by their technical identifiers. Find those you want to restore.
  3. Click on the reorder button.

How did we do?

Dataset limits

How Opendatasoft manages dates

Table of Contents

Contact

Powered by HelpDocs (opens in a new tab)