JSON files

JSON is an open-standard format that uses human-readable text to transmit data objects consisting of key-value pairs. It is the most common data format to build web APIs.

As JSON documents can have a lot of different forms, the platform can extract data from JSON files, JSON Lines files, and JSON dictionaries.

If the platform does not fully extract a document with a complex structure, use one of the JSON processors to complete the extraction.

JSON File

You can use a JSON file as a source. From this file, the platform extracts a valid JSON document (array or object) into one dataset of several records:

If the document is a JSON array, a record will be created for each object inside the array (the keys will be used as column names).
If the document is a JSON object, the "JSON root" parameter should contain a dot-separated path to the array inside your object. If not provided, the platform tries items.

For each item inside the array, the platform can follow another path before extracting the records with the parameter "JSON object".

Supported field types

Regular fields (decimal, boolean, string)
JSON object: used as-is
Array:
- If the array contains JSON objects, it is used as-is.
- If the array contains strings, a multivalued field is created with all the strings separated by a semicolon (;).

Creation

For more information about adding a file source, see Retrieving a file.

Configuration

Name

Description

Usage

JSON root

ijson path to the JSON array that contains the objects to be extracted as dataset records

ijson is a syntax to navigate inside JSON objects. It consists of separating the attribute names with dots (for example, ). More features are available for advanced users.

If the JSON array is at the root of the document, leave the JSON root box empty. If the JSON file is a JSON object, enter the ijson path to the array in the JSON root box.

JSON object

Relative path to the JSON object to extract

After setting the JSON root, if the preview shows irrelevant columns, and that the JSON object has been extracted as the value of one column, enter the name of that column in the JSON object box.

Name	Description	Usage
Extract filename	Creates a new column with the name of the source file.	By default, this option is toggled off. Toggle on this option to extract the file name in an additional column.

Examples

Example 1:

[
 {
     "name": "Agra Express",
     "origin": "Agra Cantt",
     "destination": "New Delhi"
 },
 {
     "name": "Gour Express",
     "origin": "Balurghat",
     "destination": "Sealdah"
 }
]

In this JSON file, the JSON array is directly at the root. You can leave the JSON root box empty.

The resulting dataset will be:

name	origin	destination
Agra Express	Agra Cantt	New Delhi
Gour Express	Balurghat	Sealdah

The column order may vary.

Example 2:

{
 "filename": "trains.json",
 "content": {
     "trains": [
         {
             "id": 123,
             "info": {
                 "name": "Agra Express",
                 "origin": "Agra Cantt",
                 "destination": "New Delhi"
             }
         },
         {
             "id": 555,
             "info": {
                 "name": "Gour Express",
                 "origin": "Balurghat",
                 "destination": "Sealdah"
             }
         }
     ]
 }
}

For this complex JSON file, the correct JSON root is content.trains. If content.trains is set as the JSON root, the resulting dataset will be:

id	info
123	{"origin": "Agra Cantt", "destination": "New Delhi", "name": "Agra Express"}
555	{"origin": "Balurghat", "destination": "Sealdah", "name": "Gour Express"}

To only extract the info JSON objects, and skip the id number, the correct JSON object is info. If info is set as a JSON object, the resulting dataset will be:

name	origin	destination
Agra Express	Agra Cantt	New Delhi
Gour Express	Balurghat	Sealdah

JSON Lines

You can use a JSON Lines file as a source. In JSON Lines files, each line is a one-line JSON document. The whole file is not a valid JSON document but each line is.

Two modes are supported:

One JSON array on each line (or one array altogether)
One JSON object on each line

Supported field types

Regular fields (decimal, string)
Array: used as-is
JSON object: used as-is

Creation

For more information about adding a file source, see Retrieving a file.

Configuration

Name	Description	Usage
Extract filename	Creates a new column with the name of the source file.	By default, this option is toggled off. Toggle on this option to extract the file name in an additional column.
First line number	For files that do not start at the first line, it is possible to define which line is considered the first one. The lines above will be skipped from the dataset.	By default, the dataset starts at line 1. Enter the number of the line that should be considered the beginning of the dataset.
Headers	For files whose first line contains field names	By default, this option is toggled on. It makes the values of the first line field labels. Toggle off this option if the first line doesn't contain field names but data: the field labels will then be empty by default.

JSON Dict

You can use a JSON dictionary as a source. The platform extracts a valid JSON document (array or object) into one dataset of several records.

The platform expects a JSON object where each key contains a record. The platform then creates records with:

one column to hold the key
one column for each attribute of the corresponding object (or a single column named "value" if the value is a string instead of an object).

Supported field types

Regular fields (decimal, bool, string)
JSON object: will be used as-is
Array:
- If the array contains JSON objects, it is used as-is.
- If the array contains strings, a multivalued field is created with all the strings separated by a semicolon (;).

Creation

For more information about adding a file source, see Retrieving a file.

Configuration

Name	Description	Usage
Extract filename	Creates a new column with the name of the source file.	By default, this option is toggled off. Toggle on this option to extract the file name in an additional column.
JSON root	ijson path to the JSON array that contains the objects to be extracted as dataset records ijson is a syntax to navigate inside JSON objects. It consists of separating the attribute names with dots (for example, ). More features are available for advanced users.	If the JSON array is at the root of the document, leave the JSON root box empty. If the JSON file is a JSON object, enter the ijson path to the array in the JSON root box.
Key field name	Label of the "key" column	The default value is "key". Enter the new label of the "key" column in the Key field name box. It will modify both the label and the technical identifier of the column.

Examples

Example 1:

{
  "2016": {
    "type1": {
      "price": 10,
      "color": "blue",
      "available": true
    },
    "type2": {
      "price": 9,
      "color": "red",
      "available": true
    }
  },
  "2015": {
    "type1": {
      "price": 10.5,
      "color": "teal",
      "available": true
    },
    "type2": {
      "price": 9.1,
      "color": "crimson",
      "available": true
    }
  }
}

If you leave the JSON root box empty, the resulting dataset will be:

key	type1	type2
2016	{"color": "blue", "available": true, "price": 10}	{"color": "red", "available": true, "price": 9}
2015	{"color": "teal", "available": true, "price": 10.5}	{"color": "crimson", "available": true, "price": 9.1}

Example 2:

{
  "results": {
    "datasets": [{
        "abc": {
          "title": "A B C",
          "description": "A description about ABC"
        }
      },
      {
        "xyz": {
          "title": "D E F",
          "description": "Another description"
        }
      }
    ]
  }
}

For this complex JSON file, the correct JSON root is results.datasets.item.

results.datasets leads to the array
item iterates inside the array and gets each object

key	description	title
abc	A description about ABC	A B C
xyz	Another description	D E F