The architecture of CSV PropertyTable mainly involves two components: PropertyTable and GraphPropertyTable.
A PropertyTable is a collection of data that is sufficiently regular in shape that it can be treated as a table.
That means each subject has a value for each property in the set of properties.
Irregularity in the form of missing values needs to be handled, but multiple values for the same property do not.
With special storage, a PropertyTable
PropertyTable is designed to be a table of RDF terms, or Nodes in Jena.
Each Column of the PropertyTable has a unique columnKey: the Node of the predicate (p for short).
Each Row of the PropertyTable has a unique rowKey: the Node of the subject (s for short).
You can use getColumn() to get a Column by its columnKey, the Node of the predicate.
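To make this shape concrete, here is a minimal Java sketch. It is a hypothetical stand-in, not the Jena API: the class TableShapeSketch and its methods are invented for illustration, and Nodes are modeled as plain Strings. Each columnKey maps to a Column keyed by rowKey, so a missing value is simply an absent entry, while a second value for the same subject and predicate is rejected.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical, simplified model of the PropertyTable shape (NOT the Jena API).
// RDF Nodes are modeled as plain Strings to keep the sketch self-contained.
final class TableShapeSketch {
    // columnKey (predicate) -> Column; a Column maps rowKey (subject) -> value
    private final Map<String, Map<String, String>> columns = new LinkedHashMap<>();

    /** Adds a Column; every columnKey must be unique. */
    void addColumn(String columnKey) {
        if (columns.putIfAbsent(columnKey, new LinkedHashMap<>()) != null) {
            throw new IllegalArgumentException("duplicate columnKey: " + columnKey);
        }
    }

    /** Stores the single value for (rowKey, columnKey); a second value is rejected. */
    void put(String rowKey, String columnKey, String value) {
        Map<String, String> column = columns.get(columnKey);
        if (column == null) {
            throw new IllegalArgumentException("unknown columnKey: " + columnKey);
        }
        if (column.putIfAbsent(rowKey, Objects.requireNonNull(value)) != null) {
            throw new IllegalStateException("multiple values for " + rowKey + " " + columnKey);
        }
    }

    /** A missing value is allowed and shows up as Optional.empty(). */
    Optional<String> get(String rowKey, String columnKey) {
        Map<String, String> column = columns.get(columnKey);
        return column == null ? Optional.empty() : Optional.ofNullable(column.get(rowKey));
    }

    /** Looks up a whole Column by its columnKey, in the spirit of getColumn(). */
    Optional<Map<String, String>> getColumn(String columnKey) {
        return Optional.ofNullable(columns.get(columnKey)).map(Collections::unmodifiableMap);
    }

    public static void main(String[] args) {
        TableShapeSketch t = new TableShapeSketch();
        t.addColumn("ex:Town");
        t.addColumn("ex:Population");
        t.put("_:r1", "ex:Town", "Southton");
        t.put("_:r1", "ex:Population", "123000");
        t.put("_:r2", "ex:Town", "Northville"); // no population: a missing value
        System.out.println(t.get("_:r2", "ex:Population")); // Optional.empty
        System.out.println(t.getColumn("ex:Town").orElseThrow());
    }
}
```

The column-major layout makes a Column lookup a single map access; a row-major layout would favor whole-Row access instead.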
A PropertyTable should be constructed in this order: first create its Columns and Rows, then, for each Row created, set a value (Node) at the specified Column.
Once a PropertyTable is built, the tabular data within it can be accessed through the PropertyTable API.
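The following Java sketch walks through a build-then-query cycle under stated assumptions. BuildSketch, Row, createColumn(), createRow(), setValue(), matchingRows() and columnValues() are hypothetical names for this illustration only, not a description of the Jena API, and values are plain Strings rather than Nodes. Columns and Rows are created before any value is set, and setting a value on an unknown Column fails.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical build-then-query sketch (NOT the Jena API). Nodes are plain Strings.
final class BuildSketch {
    private final Set<String> columnKeys = new LinkedHashSet<>();
    private final Map<String, Row> rows = new LinkedHashMap<>();

    /** A Row: its rowKey (subject) plus at most one value per Column. */
    final class Row {
        final String rowKey;
        private final Map<String, String> values = new LinkedHashMap<>();

        private Row(String rowKey) { this.rowKey = rowKey; }

        /** Sets the value at an existing Column; unknown columns are rejected. */
        void setValue(String columnKey, String value) {
            if (!columnKeys.contains(columnKey)) {
                throw new IllegalArgumentException("no such column: " + columnKey);
            }
            values.put(columnKey, Objects.requireNonNull(value));
        }

        String getValue(String columnKey) { return values.get(columnKey); }
    }

    /** Creates a Column for a predicate. */
    void createColumn(String columnKey) { columnKeys.add(columnKey); }

    /** Creates a Row for a subject; rowKeys must be unique. */
    Row createRow(String rowKey) {
        if (rows.containsKey(rowKey)) {
            throw new IllegalArgumentException("duplicate rowKey: " + rowKey);
        }
        Row row = new Row(rowKey);
        rows.put(rowKey, row);
        return row;
    }

    /** Rows whose value in the given Column equals the given value. */
    List<Row> matchingRows(String columnKey, String value) {
        List<Row> result = new ArrayList<>();
        for (Row row : rows.values()) {
            if (value.equals(row.getValue(columnKey))) result.add(row);
        }
        return result;
    }

    /** Distinct values present in one Column (rows without a value are skipped). */
    Set<String> columnValues(String columnKey) {
        Set<String> result = new LinkedHashSet<>();
        for (Row row : rows.values()) {
            String v = row.getValue(columnKey);
            if (v != null) result.add(v);
        }
        return result;
    }

    public static void main(String[] args) {
        BuildSketch table = new BuildSketch();
        table.createColumn("ex:Town");                 // columns first
        table.createColumn("ex:Population");
        Row r1 = table.createRow("_:r1");              // then rows
        Row r2 = table.createRow("_:r2");
        r1.setValue("ex:Town", "Southton");            // then values
        r1.setValue("ex:Population", "123000");
        r2.setValue("ex:Town", "Northville");
        System.out.println(table.matchingRows("ex:Town", "Northville").get(0).rowKey); // _:r2
        System.out.println(table.columnValues("ex:Population")); // [123000]
    }
}
```

Both queries are linear scans over the Rows, which keeps the sketch short rather than fast.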
GraphPropertyTable implements the Graph interface (read-only) over a PropertyTable.
It is a subclass of GraphBase.
Its graphBaseFind() (for matching a single triple) and propertyTableBaseFind() (for matching a whole Row) methods can choose the access route based on the find arguments.
GraphPropertyTable holds (wraps) a reference to the PropertyTable instance, so that such a Graph can be treated in a more table-like fashion.
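A hypothetical Java sketch of how a read-only graph view can pick its access route from the find arguments follows. TableGraphSketch, TripleSketch, find() and findRows() are illustrative names, not Jena's GraphPropertyTable, graphBaseFind() or propertyTableBaseFind(); a null argument stands for a wildcard, and the table is assumed to be a plain map from subject to (predicate to value).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical read-only triple view over a table (NOT Jena's GraphPropertyTable).
// A null argument to find() is a wildcard.
record TripleSketch(String s, String p, String o) {}

final class TableGraphSketch {
    private final Map<String, Map<String, String>> rows; // rowKey -> (columnKey -> value)

    TableGraphSketch(Map<String, Map<String, String>> rows) {
        this.rows = Collections.unmodifiableMap(rows); // no mutators: read-only view
    }

    /** Triple-level find; the access route depends on which arguments are bound. */
    List<TripleSketch> find(String s, String p, String o) {
        List<TripleSketch> out = new ArrayList<>();
        if (s != null) {
            Map<String, String> row = rows.get(s); // subject bound: visit one Row only
            if (row != null) matchRow(s, row, p, o, out);
        } else {
            for (Map.Entry<String, Map<String, String>> e : rows.entrySet()) {
                matchRow(e.getKey(), e.getValue(), p, o, out); // subject unbound: scan Rows
            }
        }
        return out;
    }

    private static void matchRow(String s, Map<String, String> row, String p, String o,
                                 List<TripleSketch> out) {
        if (p != null) { // predicate bound: read a single cell
            String v = row.get(p);
            if (v != null && (o == null || o.equals(v))) out.add(new TripleSketch(s, p, v));
            return;
        }
        for (Map.Entry<String, String> cell : row.entrySet()) {
            if (o == null || o.equals(cell.getValue())) {
                out.add(new TripleSketch(s, cell.getKey(), cell.getValue()));
            }
        }
    }

    /** Row-level find: subjects whose Row contains every (columnKey, value) pair given. */
    List<String> findRows(Map<String, String> pattern) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : rows.entrySet()) {
            if (e.getValue().entrySet().containsAll(pattern.entrySet())) out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("_:r1", Map.of("ex:Town", "Southton", "ex:Population", "123000"));
        rows.put("_:r2", Map.of("ex:Town", "Northville"));
        TableGraphSketch g = new TableGraphSketch(rows);
        System.out.println(g.find(null, "ex:Town", null));
        System.out.println(g.findRows(Map.of("ex:Town", "Southton"))); // [_:r1]
    }
}
```

A bound subject touches one Row and a bound predicate reads one cell per Row, while a pattern with nothing bound falls back to a full scan. findRows() matches a whole Row at once, which is the table-like access that a triple-at-a-time interface lacks.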
PropertyTable and GraphPropertyTable are NOT restricted to CSV data.
They are intended to be compatible with any table-like data source, such as relational databases, Microsoft Excel, etc.
GraphCSV is a subclass of GraphPropertyTable aimed at CSV data.
Its constructor takes a CSV file path as its parameter, parses the file using a CSV parser, and makes a PropertyTable from the result.
For CSV to RDF mapping, we establish some basic principles:
In the CSV on the Web Working Group (CSV-WG), it looks like duplicate column names are not going to be supported. Therefore, we only consider parsing single-valued CSV tables. The current editor's working draft from the CSV-WG defines a more regular form of data on top of CSV. This is the target for the CSV work of GraphCSV: tabular, regularly shaped CSV, not arbitrary, irregularly shaped CSV.
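Before mapping, a loader could check that the input really is single-valued and regularly shaped. The following Java sketch assumes the header and rows have already been split into fields; ShapeCheck and requireRegular() are hypothetical names. It rejects repeated column names (which would mean several values for one property) and data rows whose width differs from the header's; an empty field keeps the width, so it remains allowed as a missing value.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check for "single-valued, regularly shaped" CSV.
// Rows arrive already split into fields.
final class ShapeCheck {
    /** Throws if a column name repeats or a data row's width differs from the header's. */
    static void requireRegular(List<String> header, List<List<String>> rows) {
        Set<String> seen = new HashSet<>();
        for (String name : header) {
            // a repeated column name would mean several values for one property
            if (!seen.add(name)) {
                throw new IllegalArgumentException("duplicate column name: " + name);
            }
        }
        for (int i = 0; i < rows.size(); i++) {
            int width = rows.get(i).size();
            if (width != header.size()) {
                throw new IllegalArgumentException("data row " + (i + 1) + " has " + width
                        + " fields, expected " + header.size());
            }
        }
    }

    public static void main(String[] args) {
        requireRegular(List.of("Town", "Population"),
                List.of(List.of("Southton", "123000"), List.of("Northville", "")));
        System.out.println("regular");
        try {
            requireRegular(List.of("Town", "Town"), List.of());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // duplicate column name: Town
        }
    }
}
```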
A CSV file with no additional metadata is mapped directly to RDF, which is a simpler case than SQL-to-RDF mapping. It is not necessary to have a defined primary column, similar to the primary key of a database. The subject of the triple can be generated through one of:
All values in the CSV are parsed as strings, line by line. As a better option that the user can turn on, there is a dynamic choice, which is a posh way of saying: attempt to parse the value as an integer (or decimal, double, date), and if that succeeds, it is an integer (or decimal, double, date).
Note that in the current release, all numbers are parsed as one numeric datatype, and dates are not supported yet.
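The dynamic choice can be sketched in Java as a sequence of parse attempts. This illustrates the opt-in behavior described above, not the current release: LiteralTyper and datatypeOf() are hypothetical, the returned prefixed names (xsd:integer, xsd:decimal, xsd:double, xsd:string) are for display only, and dates are not attempted.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the opt-in "dynamic choice" for typed literals:
// try integer, then decimal, then double, else keep the plain string.
// Dates are deliberately not attempted.
final class LiteralTyper {
    private static final Pattern INTEGER = Pattern.compile("[+-]?\\d+");
    private static final Pattern DECIMAL = Pattern.compile("[+-]?(\\d+\\.\\d*|\\.\\d+)");
    private static final Pattern DOUBLE =
            Pattern.compile("[+-]?(\\d+\\.?\\d*|\\.\\d+)[eE][+-]?\\d+");

    /** Returns the datatype the cell would get under the dynamic choice. */
    static String datatypeOf(String cell) {
        if (INTEGER.matcher(cell).matches()) return "xsd:integer";
        if (DECIMAL.matcher(cell).matches()) return "xsd:decimal";
        if (DOUBLE.matcher(cell).matches()) return "xsd:double";
        return "xsd:string"; // every attempt failed: the value stays a string
    }

    public static void main(String[] args) {
        for (String cell : new String[] {"123000", "-3.25", "6.02e23", "Southton", "2014-07-01"}) {
            System.out.println(cell + " -> " + datatypeOf(cell));
        }
    }
}
```

Regular expressions are used instead of Double.parseDouble() because the latter also accepts Java-specific forms such as "1d", hexadecimal floating-point literals and "NaN", which are not ordinary CSV numbers.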
RDF requires predicates to be URIs, and subjects to be URIs or blank nodes. We need to pass in the namespaces (or just use the default namespaces) to make URIs by combining the namespaces with the values in the CSV.
We don't have namespace metadata for the columns, but subjects can be blank nodes, which is useful because each row then gets a new blank node. For predicates, suppose the URL of the CSV file is file:///c:/town.csv; then the Population column can become <file:///c:/town.csv#Population>, as shown in the illustration.
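A minimal Java sketch of minting terms for one row: each row gets a fresh blank-node subject, and each predicate is the CSV file URL with the column name as its fragment. CsvNaming and its methods are hypothetical, the N-Triples-style output is only for display, and percent-encoding column names that contain characters such as spaces is an assumption of this sketch rather than documented behavior.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: predicates are the CSV file URL plus "#" + column name,
// and each row's subject is a fresh blank node.
final class CsvNaming {
    private int nextBlank = 0;

    /** Predicate URI: same scheme and path as the CSV file, column name as fragment. */
    static URI predicate(URI csvFile, String columnName) {
        try {
            // this constructor percent-encodes characters illegal in a URI (e.g. spaces)
            return new URI(csvFile.getScheme(), csvFile.getSchemeSpecificPart(), columnName);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot build predicate for " + columnName, e);
        }
    }

    /** A new blank-node label per row, so rows never share a subject. */
    String freshSubject() {
        return "_:row" + nextBlank++;
    }

    /** N-Triples-style lines for one row; assumes row.size() == header.size(). */
    List<String> rowTriples(URI csvFile, List<String> header, List<String> row) {
        String subject = freshSubject();
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < header.size(); i++) {
            // minimal literal escaping: only backslash and double quote
            String literal = row.get(i).replace("\\", "\\\\").replace("\"", "\\\"");
            lines.add(subject + " <" + predicate(csvFile, header.get(i)) + "> \"" + literal + "\" .");
        }
        return lines;
    }

    public static void main(String[] args) {
        URI town = URI.create("file:///c:/town.csv");
        CsvNaming naming = new CsvNaming();
        naming.rowTriples(town, List.of("Town", "Population"), List.of("Southton", "123000"))
              .forEach(System.out::println);
    }
}
```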
The first line of the CSV file must be the table header. The columns of the first line are parsed as the predicates of the RDF triples. The RDF triple data are parsed starting from the second line.
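The header rule can be sketched in Java as follows. HeaderFirst.parse() is a hypothetical helper: it uses a naive comma split that ignores CSV quoting, and treating an empty field as a missing value is an assumption of the sketch. The first line supplies the column names that later become predicates, and every following line becomes one row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: line 1 supplies the column names (the future predicates);
// data rows start at line 2. split(",", -1) ignores CSV quoting rules.
final class HeaderFirst {
    /** Parses CSV lines into one column-name -> value map per data row. */
    static List<Map<String, String>> parse(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("the first line must be the table header");
        }
        String[] header = lines.get(0).split(",", -1);
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) { // data from the second line
            if (line.isEmpty()) continue; // tolerate blank lines
            String[] fields = line.split(",", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length && i < fields.length; i++) {
                // assumption of this sketch: an empty field is a missing value, not ""
                if (!fields[i].isEmpty()) row.put(header[i], fields[i]);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> csv = List.of("Town,Population", "Southton,123000", "Northville,");
        System.out.println(parse(csv)); // [{Town=Southton, Population=123000}, {Town=Northville}]
    }
}
```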
CSV files must be UTF-8 encoded. If your CSV files use a Western European encoding (such as ISO-8859-1), convert them to UTF-8 before using CSV PropertyTable.
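One way to check and convert an input file with only the Java standard library is sketched below. Utf8Check and its methods are hypothetical; the source charset must be supplied by the caller, for example ISO-8859-1 (StandardCharsets.ISO_8859_1) or Windows-1252 (Charset.forName("windows-1252")).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: detect non-UTF-8 input and re-encode a file as UTF-8.
// The source charset is an assumption the caller must supply.
final class Utf8Check {
    /** True if the bytes decode as UTF-8 with no malformed or unmappable sequences. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Decodes source with the given charset and writes the same text to target as UTF-8. */
    static void convertToUtf8(Path source, Charset sourceCharset, Path target) throws IOException {
        String text = new String(Files.readAllBytes(source), sourceCharset);
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("town-latin1", ".csv");
        Path out = Files.createTempFile("town-utf8", ".csv");
        Files.write(in, "Town\nZ\u00fcrich\n".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(isValidUtf8(Files.readAllBytes(in)));  // false
        convertToUtf8(in, StandardCharsets.ISO_8859_1, out);
        System.out.println(isValidUtf8(Files.readAllBytes(out))); // true
        Files.delete(in);
        Files.delete(out);
    }
}
```

Files.readAllLines(Path) also decodes as UTF-8 and throws an IOException when it meets a malformed byte sequence, so a strict read is another way to detect a mis-encoded file.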