Publishing data from a CSV file
To publish a dataset on your portal, you must be granted at least the "Create new datasets" and "Publish own datasets" permissions.
This article shows how to create a dataset from data stored in a CSV file.
What is a CSV file?
CSV is short for "comma-separated values." A CSV file therefore contains tabular data saved as plain text, with values separated by commas (or semicolons).
CSV data can easily be converted to table format. For instance, the CSV data shown below describes a table with three columns and two rows of data:
Data1,Data2,Data3
Example1,Example2,Example3
Example1,Example2,Example3
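To illustrate how such text maps to a table, here is a minimal Python sketch that parses the sample above with the standard csv module. It only shows the general idea and is not how the platform itself reads files.

```python
import csv
import io

# Sample data copied from the example above.
raw = (
    "Data1,Data2,Data3\n"
    "Example1,Example2,Example3\n"
    "Example1,Example2,Example3\n"
)

# DictReader uses the first line as column names and yields one dict per row.
rows = list(csv.DictReader(io.StringIO(raw)))

for row in rows:
    print(row)
```

Each dictionary corresponds to one table row, keyed by the column names taken from the first line.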
You can also use similar file types:
Tab-Separated Values (TSV) files (.tsv)
Text files (.txt)
DAT files (.dat)
Importing your CSV data on Opendatasoft
Go to your portal's back-office. Select New dataset. Select From my computer.
Add your CSV file.
Your data has been successfully uploaded. 🎉
Configuring options
Take a look at the buttons located on the left side of the page, right beneath the file name (here "airports.csv").
1. File Encoding
Character encoding is the way characters are represented in a file. UTF-8 is the universal standard, but some files might be encoded in a legacy format (for example, files created with old versions of Excel), which requires setting the encoding manually.
If your data displays incorrectly on the preview page (for example, with garbled accented characters), you should check the encoding. Otherwise, keep the UTF-8 encoding.
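If you want to check a file's encoding before uploading it, a rough Python sketch such as the following can help. The function name decode_with_fallback and the choice of cp1252 as the legacy encoding are assumptions made for illustration; your file may use a different legacy encoding.

```python
def decode_with_fallback(data, fallback="cp1252"):
    """Return (text, encoding_used), trying UTF-8 first."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: assume a legacy encoding (illustrative default).
        return data.decode(fallback), fallback


# Bytes as a legacy tool might have written them.
legacy_bytes = "Café,Montréal".encode("cp1252")
text, used = decode_with_fallback(legacy_bytes)
print(used, text)

# What a wrong encoding choice looks like: UTF-8 bytes read as cp1252.
garbled = "Café".encode("utf-8").decode("cp1252")
print(garbled)
```

The last lines reproduce the kind of garbled characters you might see in the preview when the selected encoding does not match the file.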
2. Row structure
The Field separator refers to the character used to separate fields.
If the different fields of your file are separated by commas, you don't have to change anything.
Otherwise, if another separator is used (semicolons, for instance), enter it in the text box. Common values are ';', ',', ' ' and '\t'.
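If you are not sure which separator a file uses, you can inspect its first line. The sketch below is a simple counting heuristic written for illustration (guess_separator is a hypothetical helper, not a platform feature): it picks the candidate that occurs most often, then parses the sample with that separator.

```python
import csv
import io

# Candidate separators, matching the usual values listed above.
CANDIDATES = [",", ";", "\t", " "]


def guess_separator(first_line):
    """Return the candidate separator that occurs most often in the line."""
    counts = {sep: first_line.count(sep) for sep in CANDIDATES}
    best = max(counts, key=counts.get)
    # Fall back to a comma when no candidate appears at all.
    return best if counts[best] > 0 else ","


sample = "code;name;city\nCDG;Charles de Gaulle;Paris\n"
sep = guess_separator(sample.splitlines()[0])
rows = list(csv.reader(io.StringIO(sample), delimiter=sep))
print(repr(sep), rows)
```

A heuristic like this can be fooled, for example by a header whose labels contain spaces, so treat its answer as a hint and confirm it in the preview.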
If an escape character is found right before a separator, the latter is no longer considered a separator. The Escape character option lets you declare which character plays this role. By default, the text box is empty. If the file contains an escape character (for example, # or \), enter it in the text box.
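The effect of an escape character can be seen with Python's csv module. This is only an illustration of the general behavior, with made-up sample data, and it does not describe the platform's own parser.

```python
import csv
import io

# A backslash before the second comma on the data line marks it as literal text.
raw = "name,address\nAlice,12 Main St\\, Apt 4\n"

# With the escape character declared, the escaped comma stays inside the field.
with_escape = list(csv.reader(io.StringIO(raw), escapechar="\\"))

# Without it, every comma splits the line and the backslash is kept as data.
without_escape = list(csv.reader(io.StringIO(raw)))

print(with_escape[1])
print(without_escape[1])
```

When the escape character is declared, the address is read as a single field; otherwise the row is split into three fields.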
Quoted fields refers to fields whose values are enclosed in double quotes. By default, this option is toggled on. Toggle off the option if the field values are not enclosed in double quotes.
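To see why this option matters, the following Python sketch parses the same line with quote handling on and off. The sample line is invented for illustration, and the csv module stands in for the general idea rather than for the platform's parser.

```python
import csv
import io

# One line whose first value contains a comma and is enclosed in double quotes.
raw = '"Paris, France",75\n'

# Quote handling on (the csv module's default): the quotes delimit one field.
quoted_on = next(csv.reader(io.StringIO(raw)))

# Quote handling off: quotes are ordinary characters and every comma splits.
quoted_off = next(csv.reader(io.StringIO(raw), quoting=csv.QUOTE_NONE))

print(quoted_on)
print(quoted_off)
```

With quote handling on, the line yields two fields; with it off, the quotes remain in the data and the line yields three fields.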
3. Data start point
First line number. If your data does not start on the first line, you can define which line should be treated as the first one. The lines above it are skipped and left out of the dataset. By default, the dataset starts at line 1. Enter the number of the line where the data starts.
Header. By default, this option is toggled on. It takes the values of the first line and uses them as your dataset's field names. Toggle off this option if the first line contains data rather than field names: the field labels will then be empty by default, and you will have to enter them directly in the Processing tab.
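The combined effect of these two options can be sketched in Python as follows. The function read_table and its parameters first_line and header are hypothetical names chosen to mirror the options above, and the sample data is invented.

```python
import csv


def read_table(text, first_line=1, header=True):
    """Parse CSV text, skipping the lines before `first_line` (1-based)."""
    lines = text.splitlines()[first_line - 1:]
    rows = list(csv.reader(lines))
    if header and rows:
        # The first kept line provides the field names.
        return rows[0], rows[1:]
    # No header line: field names are left empty, to be filled in later.
    width = len(rows[0]) if rows else 0
    return [""] * width, rows


# Two lines of preamble precede the actual table.
raw = "Airport export\n\ncode,city\nCDG,Paris\nJFK,New York\n"

fields, data = read_table(raw, first_line=3)
print(fields, data)

no_header_fields, no_header_data = read_table("CDG,Paris\n", header=False)
print(no_header_fields, no_header_data)
```

In the first call, lines 1 and 2 are skipped and line 3 supplies the field names. In the second call, the only line is treated as data and the field names are left empty.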
By toggling on the Extract filename option, you add a new column to your dataset that contains the name of your source file. This can be extremely useful if your dataset combines several sources, for instance when the sources contain dates.
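As a rough illustration of what such a column provides, the Python sketch below merges two sources and records where each row came from. The file names (which carry a year here), the column name filename, and the data are all assumptions made for the example.

```python
import csv
import io

# Hypothetical source files, held in memory to keep the sketch self-contained.
sources = {
    "visits_2023.csv": "city,visits\nParis,10\n",
    "visits_2024.csv": "city,visits\nParis,12\n",
}

combined = []
for filename, content in sources.items():
    for row in csv.DictReader(io.StringIO(content)):
        row["filename"] = filename  # extra column recording the source file
        combined.append(row)

for row in combined:
    print(row)
```

Without the extra column, the two Paris rows would be indistinguishable once merged; with it, each row can still be traced back to its source.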
Still need some help? Check out our learning resources at Opendatasoft Academy. For example, find out more about Publishing data from a static source file.