Performance reporting is an area prone to misinterpretation, and such reports should be liberally decorated with disclaimers. In our case there are an alarming number of variables: the hardware, the operating system, the database engine and its myriad parameters, the data itself, the queries, and planetary alignment.
Given this, here is some basic information, which you may find sufficient.
We suggest that you don’t choose your database based on these figures. Performance is broadly similar across the databases tested, so if you already have a relational database installed, using it is your best option.
SDB supports a range of databases, but the figures here are limited to SQL Server and PostgreSQL. The hardware was identical, but ran Linux for PostgreSQL and Windows for SQL Server.
We use the Lehigh University Benchmark (LUBM) http://swat.cse.lehigh.edu/projects/lubm/ and DBpedia http://dbpedia.org/, together with the example queries that each provides. You can find the queries in SDB/PerfTests.
LUBM generates artificial datasets. To be useful, the data needs reasoning applied to it, and we did this before loading. The queries are quite stressful for SDB because they are not very ground: in many of them neither subjects nor objects are fixed terms, and many produce very large result sets. They are therefore probably atypical of SPARQL queries in general.
Unlike the LUBM queries, the DBpedia queries are quite ground. DBpedia also contains many large literals, whereas LUBM does not.
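To make "ground" concrete: a triple pattern is more ground the more of its subject, predicate, and object positions hold fixed terms rather than variables. The Python sketch below counts bound positions in a pattern. The patterns and prefixes (`ub:`, `dbpedia:`, `dbo:`) are made-up illustrations, not patterns taken from the LUBM or DBpedia query sets, and the helper names are hypothetical.

```python
from typing import Tuple

# A triple pattern as (subject, predicate, object) strings.
Pattern = Tuple[str, str, str]


def is_variable(term: str) -> bool:
    """SPARQL variables are written ?name or $name."""
    return term.startswith(("?", "$"))


def bound_positions(pattern: Pattern) -> int:
    """Number of positions holding a fixed term rather than a variable."""
    return sum(1 for term in pattern if not is_variable(term))


def subject_or_object_bound(pattern: Pattern) -> bool:
    """True if at least one of the subject and object is a fixed term."""
    subject, _, obj = pattern
    return not is_variable(subject) or not is_variable(obj)


# Made-up patterns for illustration only (hypothetical, not from the benchmarks).
lubm_like = ("?student", "ub:takesCourse", "?course")
dbpedia_like = ("dbpedia:Berlin", "dbo:populationTotal", "?population")

for pattern in (lubm_like, dbpedia_like):
    print(pattern, bound_positions(pattern), subject_or_object_bound(pattern))
```

The LUBM-like pattern fixes only its predicate, so it can match every triple that uses that predicate, whereas the DBpedia-like pattern also fixes its subject.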
All operations were performed using SDB’s command-line tools. The data was loaded into a freshly formatted SDB store (PostgreSQL also needs an ANALYSE at this point to avoid poor query planning), and then the additional indexes were added.
| Dataset | Database | Layout | Loading speed (triples/s) | Index time (s) | Size (MB) |
|---|---|---|---|---|---|
| LUBM | PostgreSQL | Hash | 4972 | 199 | 5124 |
| LUBM | PostgreSQL | Index | 8658 | 176 | 3666 |
| LUBM | SQL Server | Hash | 8762 | 121 | 3200 |
| LUBM | SQL Server | Index | 7419 | 68 | 2029 |
| DBpedia | PostgreSQL | Hash | 3029 | 298 | 10193 |
| DBpedia | PostgreSQL | Index | 4293 | 227 | 6251 |
| DBpedia | SQL Server | Hash | 5345 | 162 | 6349 |
| DBpedia | SQL Server | Index | 4749 | 110 | 4930 |
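Because loading speed is measured in triples per second, the table's figures translate directly into load times and layout comparisons. As a small sketch with hypothetical helper names, this Python snippet holds the table's numbers, computes the index-layout store size as a fraction of the hash-layout size, and estimates how long a hypothetical 100 million triples would take at each rate, ignoring index build time.

```python
# Figures from the SDB loading table.
# Key: (dataset, database, layout); value: (loading speed in triples/s,
# index build time in s, store size in MB).
TABLE = {
    ("LUBM", "PostgreSQL", "Hash"): (4972, 199, 5124),
    ("LUBM", "PostgreSQL", "Index"): (8658, 176, 3666),
    ("LUBM", "SQL Server", "Hash"): (8762, 121, 3200),
    ("LUBM", "SQL Server", "Index"): (7419, 68, 2029),
    ("DBpedia", "PostgreSQL", "Hash"): (3029, 298, 10193),
    ("DBpedia", "PostgreSQL", "Index"): (4293, 227, 6251),
    ("DBpedia", "SQL Server", "Hash"): (5345, 162, 6349),
    ("DBpedia", "SQL Server", "Index"): (4749, 110, 4930),
}


def size_ratio(dataset: str, database: str) -> float:
    """Index-layout store size as a fraction of the hash-layout store size."""
    index_mb = TABLE[(dataset, database, "Index")][2]
    hash_mb = TABLE[(dataset, database, "Hash")][2]
    return index_mb / hash_mb


def hours_to_load(triples: int, tps: float) -> float:
    """Load time in hours at a constant rate; excludes index build time."""
    return triples / tps / 3600


HYPOTHETICAL_TRIPLES = 100_000_000  # illustrative dataset size, not from the report

for (dataset, database, layout), (tps, _, _) in TABLE.items():
    print(f"{dataset:8} {database:11} {layout:6} "
          f"{hours_to_load(HYPOTHETICAL_TRIPLES, tps):5.1f} h to load")

for dataset in ("LUBM", "DBpedia"):
    for database in ("PostgreSQL", "SQL Server"):
        print(f"{dataset:8} {database:11} index/hash size = "
              f"{size_ratio(dataset, database):.2f}")
```

Every size ratio comes out below 1, and the index layout also had the shorter index time in every case. Loading speed shows no such consistent pattern: on SQL Server the hash layout loaded faster, while on PostgreSQL the index layout did.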
To illustrate the variability in loading speed, and emphasise the importance of tuning, consider the case of Uniprot http://dev.isb-sib.ch/projects/uniprot-rdf/. Uniprot contains (at the time of writing) around 700 million triples. We loaded these onto the SQL Server setup described above, but with some configuration changes.
In particular, the RDF data, the database data, and the log data were each on a separate disk.
Loading into an index-layout store proceeded at: