TDB canonicalizes certain XSD datatypes. For literals of these datatypes, TDB stores the value rather than the original lexical form. For example, `"01"^^xsd:integer`, `"1"^^xsd:integer` and `"+001"^^xsd:integer` all have the same value and are stored as the same RDF literal. TDB also understands the derived types for integers. For example, `"01"^^xsd:integer` and `"1"^^xsd:byte` have the same value.
When RDF terms for these values are returned, the lexical form will be the canonical representation.
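To make the idea concrete, here is a minimal Python sketch of value-based canonicalization for `xsd:integer` and a few of its derived types. It is not TDB's code (TDB is written in Java). The helper names `integer_value` and `canonical_lexical` and the `_DERIVED_RANGES` table are hypothetical, and the table covers only a subset of the XSD integer hierarchy.

```python
import re

# Lexical space of xsd:integer: optional sign followed by ASCII digits.
_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

# Hypothetical subset of types derived from xsd:integer, with their value ranges
# (None means unbounded on that side).
_DERIVED_RANGES = {
    "integer": (None, None),
    "long": (-(2**63), 2**63 - 1),
    "int": (-(2**31), 2**31 - 1),
    "short": (-(2**15), 2**15 - 1),
    "byte": (-128, 127),
    "nonNegativeInteger": (0, None),
    "unsignedByte": (0, 255),
}


def integer_value(lexical: str, datatype: str = "integer") -> int:
    """Map an integer-family literal to its value; raise ValueError if invalid."""
    s = lexical.strip(" \t\r\n")  # XSD collapses surrounding whitespace
    if not _INTEGER_LEXICAL.fullmatch(s):
        raise ValueError(f"not a valid lexical form: {lexical!r}")
    value = int(s)
    lo, hi = _DERIVED_RANGES[datatype]
    if (lo is not None and value < lo) or (hi is not None and value > hi):
        raise ValueError(f"{value} is outside the value space of xsd:{datatype}")
    return value


def canonical_lexical(lexical: str, datatype: str = "integer") -> str:
    """Canonical form: no leading '+', no leading zeros, '0' for zero."""
    return str(integer_value(lexical, datatype))


if __name__ == "__main__":
    print({canonical_lexical(f) for f in ["01", "1", "+001"]})  # {'1'}
    print(integer_value("01", "integer") == integer_value("1", "byte"))  # True
```

Because all three spellings map to the single value `1`, storing the value (and regenerating the canonical lexical form on output) is enough to treat them as one RDF term.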
Only values within certain ranges are encoded directly as values. If a literal falls outside the canonicalization range for its datatype, its lexical form is stored instead. TDB transparently switches between value-based and lexical-form-based literals in graph matching and filter expressions, comparing canonicalized and non-canonicalized literals as needed.
(Future versions of TDB may increase the ranges canonicalized.)
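The fallback between value and lexical storage can be sketched as follows. This is an illustration only: it assumes the 56-bit integer range is a two's-complement signed field, and `store_integer` and `numeric_value` are hypothetical names, not TDB functions. The key point is that both representations can be decoded to the same numeric value, so comparisons work regardless of how each literal was stored.

```python
import re

# Assumption: integers are stored inline in a 56-bit two's-complement field.
INLINE_BITS = 56
INLINE_MIN = -(1 << (INLINE_BITS - 1))
INLINE_MAX = (1 << (INLINE_BITS - 1)) - 1
_MASK = (1 << INLINE_BITS) - 1


def store_integer(lexical: str):
    """Return ('value', bits) if the value fits inline, else ('lexical', form)."""
    s = lexical.strip(" \t\r\n")
    if not re.fullmatch(r"[+-]?[0-9]+", s):
        raise ValueError(f"not a valid xsd:integer: {lexical!r}")
    value = int(s)
    if INLINE_MIN <= value <= INLINE_MAX:
        return ("value", value & _MASK)  # 56-bit two's-complement pattern
    return ("lexical", lexical)  # out of range: keep the original lexical form


def numeric_value(stored) -> int:
    """Recover the integer from either representation so they can be compared."""
    kind, payload = stored
    if kind == "value":
        # Sign-extend: a pattern above INLINE_MAX has the top (sign) bit set.
        return payload - (1 << INLINE_BITS) if payload > INLINE_MAX else payload
    return int(payload)


if __name__ == "__main__":
    small = store_integer("+007")
    big = store_integer(str(2**60))
    print(small[0], big[0])  # value lexical
    print(numeric_value(small) < numeric_value(big))  # True
```

In this sketch, `"01"` and `"+001"` produce identical stored entries, while an out-of-range literal keeps its text and is only parsed when a comparison needs its value.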
The datatypes canonicalized by TDB are:
- XSD decimal (canonicalized range: 8 bits of scale, signed 48 bits of value)
- XSD integer (canonicalized range: 56 bits)
- XSD dateTime (canonicalized range: year 0 to year 8000, millisecond accuracy, timezone offsets to 15-minute granularity)
- XSD date (canonicalized range: year 0 to year 8000, timezone offsets to 15-minute granularity)
- XSD boolean (canonicalized range: true and false)
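As an example of what a range such as "8 bits of scale, signed 48 bits of value" means for `xsd:decimal`, the Python sketch below splits a decimal into an unscaled integer and a scale (value = unscaled × 10^-scale), strips trailing zeros so that equal values get equal parts, and checks whether both parts fit. The bit layout is an assumption (a signed 8-bit scale and a signed 48-bit unscaled value), and `decimal_parts` and `fits_inline` are hypothetical helpers, not TDB's API.

```python
import re
from decimal import Decimal

# Assumptions: signed 8-bit scale, signed 48-bit unscaled value.
SCALE_MIN, SCALE_MAX = -(1 << 7), (1 << 7) - 1
VALUE_MIN, VALUE_MAX = -(1 << 47), (1 << 47) - 1

# xsd:decimal lexical space: optional sign, digits with an optional point, no exponent.
_DECIMAL_LEXICAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def decimal_parts(lexical: str) -> tuple[int, int]:
    """Return (unscaled, scale) with trailing zeros removed."""
    s = lexical.strip(" \t\r\n")
    if not _DECIMAL_LEXICAL.fullmatch(s):
        raise ValueError(f"not a valid xsd:decimal: {lexical!r}")
    # Decimal(str) is exact, so no rounding happens here.
    sign, digits, exponent = Decimal(s).as_tuple()
    unscaled = int("".join(map(str, digits)))
    scale = -exponent
    if unscaled == 0:
        return 0, 0  # "0.00", "-0" and "0" are the same value
    while scale > 0 and unscaled % 10 == 0:
        unscaled //= 10
        scale -= 1
    return (-unscaled if sign else unscaled), scale


def fits_inline(lexical: str) -> bool:
    """True if both parts fit the assumed fixed-width fields."""
    unscaled, scale = decimal_parts(lexical)
    return SCALE_MIN <= scale <= SCALE_MAX and VALUE_MIN <= unscaled <= VALUE_MAX


if __name__ == "__main__":
    print(decimal_parts("1.50"), decimal_parts("+01.5"))  # (15, 1) (15, 1)
    print(fits_inline("3.14"), fits_inline(str(2**47)))  # True False
```

Under these assumptions, `"1.50"` and `"+01.5"` reduce to the same pair and would be stored identically, while a decimal whose unscaled value or scale overflows its field falls back to lexical storage.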