RDF files

Edited

You can use a RDF (.rdf) file as a source.

This platform creates records from an arbitrary RDF structure by converting all elements at a specific depth (optionally filtered by tag) to a set of records. For each element converted to a record, attributes, enclosed tags, and content are converted to fields. Complex data inside fields is converted to a JSON representation containing both attributes and content.

Creation

For more information about adding a file source, see Retrieving a file.

Configuration

Name

Description

Usage

Parent tags

Number of parents to get attributes from

If the enclosing tags contain relevant attributes, use this option to add them to the records.

Enter the number of parent tags to get attributes from (for example, 2).

Name of the tags to be extracted

If irrelevant tags are at the same depth as the extracted elements, use this option to only filter relevant tags.

Enter the tag to extract (for example, item) in the Name of the tags to be extracted box. If you leave the box empty, all tags at the specified depth will be converted into records.

Tag depth

Depth of the tags that must be converted to records

Enter the depth of the repeated tag in the Tag depth box (for example, 3).

Name

Description

Usage

Extract filename

Creates a new column with the name of the source file.

By default, this option is toggled off. Toggle on this option to extract the file name in an additional column.

Technical specifications

Field creation

The policy for creating fields from an item is defined as shown in the following examples.

1
2
3
4
5
6
7
8
<rows>
  <data attribute="attribute value" other_attribute="other attribute value">
    <indicator>GDP per capita</indicator>
    <country>Andean Region</country>
    <decimal>0</decimal>
  </data>
  <data attribute="2nd data tag">Text only</data>
</rows>

attribute

other_attribute

indicator

country

decimal

content

attribute value

other attribute value

GDP per capita

Andean Region

0

 

2nd data tag

 

 

 

 

Text only

JSON representation

Complex data inside fields is converted to JSON as shown in the following example.

1
2
3
4
5
6
7
8
9
<mydocument has="an attribute">
  <and>
    <many>elements</many>
    <many>more elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
</mydocument>
{
    "mydocument": {
        "@has": "an attribute",
        "and": {
            "many": [
                "elements",
                "more elements"
            ]
        },
        "plus": {
            "@a": "complex",
            "#text": "element as well"
        }
    }
}

Examples

Example 1

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
<?xml version="1.0" encoding="utf-8"?>
<wb:rows xmlns:wb="http://www.worldbank.org">
  <wb:data>
    <wb:indicator id="6.0.GDPpc">GDP per capita (2005 USD)</wb:indicator>
    <wb:country id="L5">Andean Region</wb:country>
    <wb:date>2012</wb:date>
    <wb:value>10561.668936515</wb:value>
    <wb:decimal>0</wb:decimal>
  </wb:data>
  <wb:data>
    <wb:indicator id="6.0.GDPpc">GDP per capita (2005 USD)</wb:indicator>
    <wb:country id="L5">Andean Region</wb:country>
    <wb:date>2011</wb:date>
    <wb:value>10215.3319157514</wb:value>
    <wb:decimal>0</wb:decimal>
  </wb:data>
  <wb:data>
    <wb:indicator id="6.0.GDPpc">GDP per capita (2005 USD)</wb:indicator>
    <wb:country id="L5">Andean Region</wb:country>
    <wb:date>2010</wb:date>
    <wb:value>9711.85739310366</wb:value>
    <wb:decimal>0</wb:decimal>
  </wb:data>
</wb:rows>

In this example:

  • Tag depth is set to 2 because wb:data is at the second level of the XML tree (wb:rows/wb:data).

  • You do not need to filter tags out because all elements at this depth are records.

The resulting dataset looks like this:

wb:indicator

wb:country

wb:date

wb:value

wb:decimal

{"#text": "GDP per capita (2005 USD)", "@id": "6.0.GDPpc"}

{"#text": "Andean Region", "@id": "L5"}

2005

8154.72913271721

0

{"#text": "GDP per capita (2005 USD)", "@id": "6.0.GDPpc"}

{"#text": "Bolivia", "@id": "BO"}

2009

5152.46337890625

0

{"#text": "GDP per capita (2005 USD)", "@id": "6.0.GDPpc"}

{"#text": "Bolivia", "@id": "BO"}

2006

4715.9892578125

0

Example 2

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
<?xml version="1.0" encoding="UTF-8"?>
<shoppingList>
    <id>abcdef123</id>
    <description>A shopping list</description>
    <basket>
        <itemCount>3</itemCount>
        <totalQuantity>19</totalQuantity>
        <item>
            <name>tomato</name>
            <quantity>10</quantity>
        </item>
        <item>
            <name>potato</name>
            <quantity>5</quantity>
        </item>
        <item>
            <name>banana</name>
            <quantity>4</quantity>
        </item>
    </basket>
</shoppingList>

In this example, the XML tree is complex. As a result, the automatic parameters detection cannot guess the proper depth. You must configure the source manually:

  • Tag depth must be set to 3 because the item node is at the third level of the XML tree (shoppingList/basket/item).

  • Name of the tags to be extracted must be set to item because itemCount and totalQuantity are also at the third level but not relevant.

The resulting dataset looks like this:

name

quantity

potato

5

banana

4

tomato

10