XML files
You can use an XML (.xml) file as a source.
This platform creates records from an arbitrary XML structure by converting all elements at a specific depth (optionally filtered by tag) to a set of records. For each element converted to a record, attributes, enclosed tags, and content are converted to fields. Complex data inside fields is converted to a JSON representation containing both attributes and content.
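The depth-based conversion described above can be pictured with a short Python sketch. It is an illustration only, not the platform's implementation: the `elements_at_depth` and `to_record` helpers and the sample catalog are hypothetical, the root element is assumed to be at depth 1, and complex nested values are ignored for brevity.

```python
import xml.etree.ElementTree as ET

def elements_at_depth(root, depth, tag=None):
    """Return the elements found at `depth` (the root element is depth 1)."""
    level = [root]
    for _ in range(depth - 1):
        # Go one level down: collect the children of every element in the level.
        level = [child for parent in level for child in parent]
    # Optional filter on the tag name.
    return [el for el in level if tag is None or el.tag == tag]

def to_record(element):
    """Turn attributes, enclosed tags, and text content into fields."""
    record = dict(element.attrib)
    for child in element:
        record[child.tag] = (child.text or "").strip()
    text = (element.text or "").strip()
    if text:
        record["content"] = text
    return record

# Hypothetical sample document, used only for this sketch.
xml_text = """
<catalog>
  <book id="b1"><title>Dune</title><year>1965</year></book>
  <book id="b2"><title>Emma</title><year>1815</year></book>
  <note>not a record</note>
</catalog>
"""

root = ET.fromstring(xml_text)
records = [to_record(el) for el in elements_at_depth(root, 2, tag="book")]
print(records)
```

With a depth of 2 and the tag filter set to `book`, the sketch yields two records, each with the fields `id`, `title`, and `year`. Without the filter, the `note` element would also be converted.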
Creation
For more information about adding a file source, see Retrieving a file.
Configuration
Name | Description | Usage |
Parent tags | Number of parents to get attributes from. If the enclosing tags contain relevant attributes, use this option to add them to the records. | Enter the number of parent tags to get attributes from (for example, |
Name of the tags to be extracted | If irrelevant tags are at the same depth as the extracted elements, use this option to keep only the relevant tags. | Enter the tag to extract (for example, |
Tag depth | Depth of the tags that must be converted to records | Enter the depth of the repeated tag in the Tag depth box (for example, |
Name | Description | Usage |
Extract filename | Creates a new column with the name of the source file. | By default, this option is toggled off. Toggle on this option to extract the file name in an additional column. |
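As a rough illustration of the Parent tags option from the configuration table above, the following Python sketch copies the attributes of the nearest enclosing tags into each record. It is only a sketch under assumptions: the function name, the sample document, and the way parent attributes are named and merged into the record are hypothetical and may differ from what the platform does.

```python
import xml.etree.ElementTree as ET

def records_with_parent_attributes(root, depth, parent_count):
    """Build records at `depth`, adding attributes from `parent_count` parents."""
    # Walk down level by level, remembering each element's chain of ancestors.
    level = [(root, [])]
    for _ in range(depth - 1):
        level = [(child, ancestors + [parent])
                 for parent, ancestors in level
                 for child in parent]
    records = []
    for element, ancestors in level:
        record = {}
        # Keep only the nearest `parent_count` ancestors (none when it is 0).
        nearest = ancestors[-parent_count:] if parent_count > 0 else []
        for ancestor in nearest:
            record.update(ancestor.attrib)
        # Assumption: the element's own attributes win over parent attributes.
        record.update(element.attrib)
        for child in element:
            record[child.tag] = (child.text or "").strip()
        records.append(record)
    return records

# Hypothetical sample document, used only for this sketch.
orders_xml = """
<orders region="EU">
  <order id="1"><product>pen</product></order>
  <order id="2"><product>ink</product></order>
</orders>
"""

root = ET.fromstring(orders_xml)
with_parent = records_with_parent_attributes(root, depth=2, parent_count=1)
without_parent = records_with_parent_attributes(root, depth=2, parent_count=0)
print(with_parent)
print(without_parent)
```

With one parent tag, each record gains the `region` attribute of the enclosing `orders` tag. With zero parent tags, the records contain only `id` and `product`.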
Technical specifications
Field creation
The policy for creating fields from an item is defined as shown in the following examples.
attribute | other_attribute | indicator | country | decimal | content |
attribute value | other attribute value | GDP per capita | Andean Region | 0 |
|
2nd data tag |
|
|
|
| Text only |
JSON representation
Complex data inside fields is converted to JSON as shown in the following example.
{
"mydocument": {
"@has": "an attribute",
"and": {
"many": [
"elements",
"more elements"
]
},
"plus": {
"@a": "complex",
"#text": "element as well"
}
}
}
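The JSON above follows a common convention: attribute names are prefixed with `@`, text content is stored under `#text`, and repeated tags are grouped into a list. The Python sketch below reproduces that convention with the standard library. It is an illustration, not the platform's converter, and the XML input is an assumed reconstruction chosen to produce the JSON shown above.

```python
import json
import xml.etree.ElementTree as ET

def to_json_value(element):
    """Convert an element to a string (simple) or a dict (complex)."""
    children = list(element)
    text = (element.text or "").strip()
    # Simple element: no attributes and no enclosed tags, so keep the text only.
    if not children and not element.attrib:
        return text
    # Attributes become keys prefixed with "@".
    value = {"@" + name: attr for name, attr in element.attrib.items()}
    for child in children:
        converted = to_json_value(child)
        if child.tag in value:
            # Repeated tag: group the values into a list.
            if not isinstance(value[child.tag], list):
                value[child.tag] = [value[child.tag]]
            value[child.tag].append(converted)
        else:
            value[child.tag] = converted
    # Text next to attributes or enclosed tags is stored under "#text".
    if text:
        value["#text"] = text
    return value

# Assumed input, reconstructed for this sketch.
xml_text = (
    '<mydocument has="an attribute">'
    "<and><many>elements</many><many>more elements</many></and>"
    '<plus a="complex">element as well</plus>'
    "</mydocument>"
)

root = ET.fromstring(xml_text)
result = {root.tag: to_json_value(root)}
print(json.dumps(result, indent=2))
```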
Examples
Example 1
In this example:
Tag depth is set to 2 because wb:data is at the second level of the XML tree (wb:rows/wb:data).
You do not need to filter tags out because all elements at this depth are records.
The resulting dataset looks like this:
wb:indicator | wb:country | wb:date | wb:value | wb:decimal |
{"#text": "GDP per capita (2005 USD)", "@id": "6.0.GDPpc"} | {"#text": "Andean Region", "@id": "L5"} | 2005 | 8154.72913271721 | 0 |
{"#text": "GDP per capita (2005 USD)", "@id": "6.0.GDPpc"} | {"#text": "Bolivia", "@id": "BO"} | 2009 | 5152.46337890625 | 0 |
{"#text": "GDP per capita (2005 USD)", "@id": "6.0.GDPpc"} | {"#text": "Bolivia", "@id": "BO"} | 2006 | 4715.9892578125 | 0 |
Example 2
In this example, the XML tree is complex. As a result, automatic parameter detection cannot guess the proper depth. You must configure the source manually:
Tag depth must be set to 3 because the item node is at the third level of the XML tree (shoppingList/basket/item).
Name of the tags to be extracted must be set to item because itemCount and totalQuantity are also at the third level but are not relevant.
The resulting dataset looks like this:
name | quantity |
potato | 5 |
banana | 4 |
tomato | 10 |
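The effect of the two settings in Example 2 can be sketched in Python. The XML below is a hypothetical reconstruction that matches the description (shoppingList/basket/item, with itemCount and totalQuantity at the same level). The `extract` helper is an assumption made for illustration, not the platform's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical document consistent with the description of Example 2.
SHOPPING_XML = """
<shoppingList>
  <basket>
    <item><name>potato</name><quantity>5</quantity></item>
    <item><name>banana</name><quantity>4</quantity></item>
    <item><name>tomato</name><quantity>10</quantity></item>
    <itemCount>3</itemCount>
    <totalQuantity>19</totalQuantity>
  </basket>
</shoppingList>
"""

def extract(xml_text, depth, tag=None):
    """Convert elements at `depth` (root is depth 1) to records of child tags."""
    level = [ET.fromstring(xml_text)]
    for _ in range(depth - 1):
        level = [child for parent in level for child in parent]
    selected = [el for el in level if tag is None or el.tag == tag]
    return [{child.tag: (child.text or "").strip() for child in el}
            for el in selected]

# Without a tag filter, itemCount and totalQuantity are also picked up.
unfiltered = extract(SHOPPING_XML, depth=3)
# With the filter set to "item", only the three relevant records remain.
rows = extract(SHOPPING_XML, depth=3, tag="item")

print(len(unfiltered))
for row in rows:
    print(row["name"], row["quantity"])
```

In this sketch, the unfiltered extraction returns five elements, while filtering on `item` returns the three rows shown in the table above.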