Publishing data in JSON format

JSON is an open-standard format that uses human-readable text to transmit data objects consisting of key-value pairs. It is the most common data format to build web APIs.

As JSON documents can have a lot of different forms, the platform can extract data from JSON files, JSON Lines files, and JSON dictionaries.

If the platform does not fully extract a document with a complex structure, use one of the JSON processors to complete the extraction.

JSON File

You can use a JSON file as a source. From this file, the platform extracts a valid JSON document (array or object) into one dataset of several records:

If the document is a JSON array, a record will be created for each object inside the array (the keys will be used as column names).
If the document is a JSON object, the "JSON root" parameter should contain a dot-separated path to the array inside your object. If not provided, the platform tries items.

For each item inside the array, the platform can follow another path before extracting the records with the parameter "JSON object".

Supported field types

Regular fields (decimal, boolean, string)
JSON object: used as-is
Array:
- If the array contains JSON objects, it is used as-is.
- If the array contains strings, a multivalued field is created with all the strings separated by a semicolon (;).

Configuration

JSON root

JSON root indicates the path to the JSON array that contains the objects to be extracted as the future records of your dataset.

If the JSON array that contains the future data is at the root of the document, leave the JSON root box empty. If the JSON file is a JSON object, enter the ijson path to the array in the JSON root box.

Note that ijson is a syntax to navigate inside JSON objects, consisting of separating attribute names with dots.

JSON object

In the JSON object box, indicate the relative path to the JSON object to extract.

Examples

1. In this first example, the JSON array is located at the root of the file. The JSON root box must therefore be left empty.

[
 {
     "name": "Agra Express",
     "origin": "Agra Cantt",
     "destination": "New Delhi"
 },
 {
     "name": "Gour Express",
     "origin": "Balurghat",
     "destination": "Sealdah"
 }
]

And the resulting dataset will have this schema:

name	origin	destination
Agra Express	Agra Cantt	New Delhi
Gour Express	Balurghat	Sealdah

2. In this second example, the JSON file is more complex. The correct JSON root to put is content.trains.

{
 "filename": "trains.json",
 "content": {
     "trains": [
         {
             "id": 123,
             "info": {
                 "name": "Agra Express",
                 "origin": "Agra Cantt",
                 "destination": "New Delhi"
             }
         },
         {
             "id": 555,
             "info": {
                 "name": "Gour Express",
                 "origin": "Balurghat",
                 "destination": "Sealdah"
             }
         }
     ]
 }
}

If content.trains is set as the JSON root, the resulting dataset will be:

id	info
123	{"origin": "Agra Cantt", "destination": "New Delhi", "name": "Agra Express"}
555	{"origin": "Balurghat", "destination": "Sealdah", "name": "Gour Express"}

To only extract the info JSON objects, and skip the id number, you should put the info object as JSON root. Only then, the resulting dataset will be:

name	origin	destination
Agra Express	Agra Cantt	New Delhi
Gour Express	Balurghat	Sealdah