TDB xloader (“x” for external) is a bulk loader for very large datasets. Its goal is stability and reliability for long-running loads on modest hardware. It can be used to load a database on rotating disk or SSD.
xloader is not a replacement for regular TDB1 and TDB2 loaders. It is for very
There are two scripts to load data using the xloader subsystem.
“tdb1.xloader”, which was called “tdbloader2”, has some improvements.
It is not as fast as other TDB loaders on datasets where the general loaders work without encountering progressive slowdown.
The xloaders for TDB1 and TDB2 are not identical. The TDB2 xloader is more capable; it is based on the same design approach, with further refinements to how the node table is built and to reducing the total amount of temporary file space used.
The xloader does not run on MS Windows. It uses an external sort program from
The xloader only builds a fresh database from empty. It cannot be used to load data into an existing database.
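Because the loader only works from empty, a pre-flight check before a long run can save time. The following is a minimal sketch, not part of xloader itself: the helper name `is_fresh_location` is hypothetical, and treating a missing directory as acceptable is an assumption.

```shell
#!/bin/sh
# is_fresh_location DIR (hypothetical helper): succeed if DIR does not exist
# yet, or is a directory with no entries in it.
is_fresh_location() {
    [ ! -e "$1" ] && return 0
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# Example: refuse to start when the target already holds files.
# "DB" is a placeholder directory name.
if is_fresh_location "DB"; then
    echo "DB is empty or absent: safe to load"
else
    echo "DB already has content: choose another location" >&2
fi
```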
tdb2.xloader --loc DIRECTORY FILE…
tdb1.xloader --loc DIRECTORY FILE…
Additionally, there is an argument
--tmpdir to use a different directory for
FILE is any RDF syntax supported by Jena. Syntax is determined by the file
extension and can include an additional “.gz” or “.bz2” for compressed files.
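As an illustration, loading one compressed N-Triples file with a separate temporary directory might look like the sketch below. All paths and the file name are hypothetical; only `--loc` and `--tmpdir` come from the description above. The command line is printed first so it can be inspected, and is run only if `tdb2.xloader` is on the PATH.

```shell
#!/bin/sh
# Hypothetical locations; adjust for the local system.
DB_DIR="/data/tdb2/DB"          # target database directory (no existing database)
TMP_DIR="/scratch/xloader-tmp"  # where temporary files should go
DATA="/data/rdf/dataset.nt.gz"  # ".nt" selects the syntax, ".gz" marks compression

# Print the command line that would be used.
xloader_command() {
    printf 'tdb2.xloader --loc %s --tmpdir %s %s\n' "$DB_DIR" "$TMP_DIR" "$DATA"
}

xloader_command

# Run the load only when the script is actually installed.
if command -v tdb2.xloader >/dev/null 2>&1; then
    tdb2.xloader --loc "$DB_DIR" --tmpdir "$TMP_DIR" "$DATA"
fi
```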
tdb2.xloader also supports the argument --threads to set the number of threads to use with sort(1). The default is 2. The recommended initial setting is the number of cores (not hardware threads) minus 1. This is sensitive to the hardware environment.
is sensitive to the hardware environment. Experimentation may show a different,
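The "cores minus 1" starting point can be computed rather than guessed. This is a sketch under assumptions: `xloader_threads` is a hypothetical helper, and counting physical cores with `lscpu` works only on Linux systems with util-linux installed.

```shell
#!/bin/sh
# xloader_threads CORES (hypothetical helper): print CORES - 1, never less than 1.
xloader_threads() {
    n=$(( $1 - 1 ))
    [ "$n" -lt 1 ] && n=1
    echo "$n"
}

# Count physical cores as unique (core, socket) pairs, so that hardware
# threads on the same core are counted once.
if command -v lscpu >/dev/null 2>&1; then
    CORES=$(lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l)
    echo "suggested starting point: --threads $(xloader_threads "$CORES")"
fi
```

The result is only a first value to try; the right setting still has to be found by experiment on the actual hardware.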
To avoid a load failing due to a syntax or other data error, it is advisable to
run riot --check on the data first. Parsing is faster than loading.
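With several input files, the check can be wrapped so that the load starts only if every file parses. This sketch assumes that riot signals a failed parse with a non-zero exit status; `check_all` and the file names are hypothetical.

```shell
#!/bin/sh
# check_all CHECKER FILE... (hypothetical helper): run CHECKER on each file,
# discard the parser output, and stop at the first file that fails.
check_all() {
    checker=$1
    shift
    for f in "$@"; do
        # $checker is left unquoted on purpose so "riot --check" splits into words.
        if ! $checker "$f" >/dev/null; then
            echo "check failed: $f" >&2
            return 1
        fi
    done
}

# Usage (placeholder file names):
#   check_all "riot --check" part1.nt.gz part2.nt.gz &&
#       tdb2.xloader --loc DB part1.nt.gz part2.nt.gz
```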
The TDB databases will take up a lot of disk space and, in addition, during
loading xloader uses a significant amount of temporary disk space.
If desired, the data can be converted to RDF Thrift at this stage by adding --stream rdf-thrift to the riot checking run. Parsing RDF Thrift is faster than parsing N-Triples, although the bulk of the loading process is not limited by parser speed.
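A combined check-and-convert run could be scripted as below. This is a sketch under assumptions: the input file name is a placeholder, `thrift_name` is a hypothetical helper, and the ".rt" extension for RDF Thrift output is an assumption to verify against the Jena version in use.

```shell
#!/bin/sh
# thrift_name FILE (hypothetical helper): derive an output name by dropping
# any compression suffix and the syntax extension, then adding ".rt".
thrift_name() {
    name=$1
    name=${name%.gz}
    name=${name%.bz2}
    name=${name%.*}      # drop the syntax extension, e.g. ".nt" or ".ttl"
    printf '%s.rt\n' "$name"
}

IN="dataset.nt.gz"       # placeholder input file

# Check the data and write it out as RDF Thrift in the same pass.
# Runs only if riot is installed and the input file exists.
if command -v riot >/dev/null 2>&1 && [ -f "$IN" ]; then
    riot --check --stream rdf-thrift "$IN" > "$(thrift_name "$IN")"
fi
```

The converted file can then be given to the loader in place of the original.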
Do not capture the bulk loader output in a file on the same disk as the database or temporary directory; it slows loading down.
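One way to catch this before starting is to compare the filesystem of the log directory with that of the database and temporary directories. The sketch below uses `df -P` for the comparison; `fs_of`, `same_fs` and all paths are hypothetical. Different filesystems can still share one physical disk, so this is only a partial check.

```shell
#!/bin/sh
# fs_of PATH (hypothetical helper): print the filesystem that holds PATH.
fs_of() {
    df -P "$1" | awk 'NR==2 { print $1 }'
}

# same_fs A B (hypothetical helper): succeed if A and B are on one filesystem.
same_fs() {
    [ "$(fs_of "$1")" = "$(fs_of "$2")" ]
}

LOG_DIR="/var/log/xloader"   # hypothetical log location
DB_DIR="/data/tdb2"          # hypothetical database location

if [ -d "$LOG_DIR" ] && [ -d "$DB_DIR" ] && same_fs "$LOG_DIR" "$DB_DIR"; then
    echo "warning: log and database share a filesystem" >&2
fi

# A load would then send its output to the separate location, for example:
#   tdb2.xloader --loc "$DB_DIR/DB" FILE > "$LOG_DIR/load.log" 2>&1
```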