Apache Jena - SDB Query performance

The Apache Jena SDB module has been retired and is no longer supported.
The last release of Jena with this module was Apache Jena 3.17.0.

This page compares the query performance of SDB with that of RDB, Jena’s usual database layout. RDB was designed to support fine-grained API calls, with some support for basic graph patterns, so its design goals differ from those of SDB.

RDB uses a denormalised database layout so that statement-level operations do not require additional joins. The SDB layout is normalised: the triple table is narrower and stores integers for RDF nodes, and joins to the node table to get each node's representation. This optimises for longer patterns, not for API operations.
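To make the layout difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables `stmt_denorm`, `nodes` and `triples`, the helper `node_id`, and the `ex:` resources are hypothetical simplifications, not the actual RDB or SDB schemas. The point is only that the denormalised table answers a statement lookup directly, while the normalised one matches on integer columns and then joins back to the node table for the text.

```python
import sqlite3

# Hypothetical, simplified schemas for illustration only;
# these are NOT the real RDB or SDB table definitions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# RDB-style (denormalised): every row carries the full node text,
# so a statement lookup needs no extra join.
cur.execute("CREATE TABLE stmt_denorm (s TEXT, p TEXT, o TEXT)")

# SDB-style (normalised): each node is stored once; triples hold integer ids.
cur.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, lex TEXT UNIQUE)")
cur.execute("CREATE TABLE triples (s INTEGER, p INTEGER, o INTEGER)")

def node_id(lex):
    """Return the integer id for a node, inserting it if it is new (hypothetical helper)."""
    cur.execute("INSERT OR IGNORE INTO nodes (lex) VALUES (?)", (lex,))
    cur.execute("SELECT id FROM nodes WHERE lex = ?", (lex,))
    return cur.fetchone()[0]

def add(s, p, o):
    """Store one statement in both layouts."""
    cur.execute("INSERT INTO stmt_denorm VALUES (?, ?, ?)", (s, p, o))
    cur.execute("INSERT INTO triples VALUES (?, ?, ?)",
                (node_id(s), node_id(p), node_id(o)))

add("ex:alice", "rdf:type", "ub:GraduateStudent")
add("ex:alice", "ub:takesCourse", "ex:GraduateCourse0")
add("ex:bob", "rdf:type", "ub:GraduateStudent")

# Denormalised: the answer is read straight off the statement table.
denorm = cur.execute(
    "SELECT s FROM stmt_denorm "
    "WHERE p = 'rdf:type' AND o = 'ub:GraduateStudent' ORDER BY s"
).fetchall()

# Normalised: match on integer columns, then join to nodes to get the text back.
norm = cur.execute("""
    SELECT n.lex FROM triples t
    JOIN nodes n ON n.id = t.s
    WHERE t.p = (SELECT id FROM nodes WHERE lex = 'rdf:type')
      AND t.o = (SELECT id FROM nodes WHERE lex = 'ub:GraduateStudent')
    ORDER BY n.lex
""").fetchall()

print(denorm)  # [('ex:alice',), ('ex:bob',)]
print(norm)    # [('ex:alice',), ('ex:bob',)]
```

Both layouts return the same answer; the normalised one pays for the join to `nodes`, but its triple table rows are three small integers, which is what makes multi-pattern joins cheaper.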

These figures were taken July 2007.

As with any performance figures, these should be taken merely as a guide. The shape of the data, the hardware details, the choice of database and its configuration (particularly the amount of memory used), as well as the queries themselves, all contribute greatly to the execution costs.

Setup

Database and hardware setup was the same as for the load performance tests.

Data was generated with the LUBM test generator (with N = 15), then expanded by inference on loading to give about 19.5 million triples. This data is larger than the database could completely cache.

The queries are taken from the LUBM suite and rewritten in SPARQL.

LUBM Query 1

 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
 SELECT * WHERE
 {
     ?x rdf:type ub:GraduateStudent .
     ?x ub:takesCourse <http://www.Department0.University0.edu/GraduateCourse0> .
 }

Jena: 24.16s
SDB/index: 0.014s
SDB/hash: 0.04s

LUBM Query 2

 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 PREFIX ub: <http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#>
 SELECT * WHERE
 {
      ?x rdf:type ub:GraduateStudent .
      ?y rdf:type ub:University .
      ?z rdf:type ub:Department .
      ?x ub:memberOf ?z .
      ?z ub:subOrganizationOf ?y .
      ?x ub:undergraduateDegreeFrom ?y .
 }

This query searches for a particular pattern in the data without a specific starting point.
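A pattern like this, where the variables are shared between patterns rather than anchored to a constant subject, becomes a multi-way self-join on the triple table when translated to SQL. The sketch below is a simplified illustration of that translation, not SDB's actual SQL generator: it assumes a hypothetical normalised schema, `nodes(id, lex)` and `triples(s, p, o)`, a hypothetical `compile_bgp` helper, and invented `ex:` data. Each triple pattern gets its own alias of `triples`, each repeated variable becomes an equality join, and each constant is resolved to its integer id. It assumes the pattern contains at least one variable.

```python
import sqlite3

def compile_bgp(patterns):
    """Translate triple patterns into one SQL query with self-joins (hypothetical helper).

    Terms starting with '?' are variables; any other term is a constant
    looked up in the nodes table. Returns (sql, params, variable_names).
    """
    froms, where, params = [], [], []
    first_col = {}  # variable -> the first column that binds it
    for i, triple in enumerate(patterns):
        alias = f"t{i}"
        froms.append(f"triples {alias}")
        for col, term in zip(("s", "p", "o"), triple):
            ref = f"{alias}.{col}"
            if term.startswith("?"):
                if term in first_col:
                    where.append(f"{ref} = {first_col[term]}")  # shared variable: join
                else:
                    first_col[term] = ref
            else:
                where.append(f"{ref} = (SELECT id FROM nodes WHERE lex = ?)")
                params.append(term)
    names = list(first_col)
    select = []
    for j, var in enumerate(names):
        # Join each variable back to the node table to recover its text.
        froms.append(f"nodes n{j}")
        where.append(f"n{j}.id = {first_col[var]}")
        select.append(f"n{j}.lex")
    sql = f"SELECT {', '.join(select)} FROM {', '.join(froms)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params, names

# Tiny in-memory store with the hypothetical normalised layout.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, lex TEXT UNIQUE)")
cur.execute("CREATE TABLE triples (s INTEGER, p INTEGER, o INTEGER)")

def nid(lex):
    cur.execute("INSERT OR IGNORE INTO nodes (lex) VALUES (?)", (lex,))
    return cur.execute("SELECT id FROM nodes WHERE lex = ?", (lex,)).fetchone()[0]

data = [
    ("ex:s1", "rdf:type", "ub:GraduateStudent"),
    ("ex:u1", "rdf:type", "ub:University"),
    ("ex:d1", "rdf:type", "ub:Department"),
    ("ex:s1", "ub:memberOf", "ex:d1"),
    ("ex:d1", "ub:subOrganizationOf", "ex:u1"),
    ("ex:s1", "ub:undergraduateDegreeFrom", "ex:u1"),
]
for s, p, o in data:
    cur.execute("INSERT INTO triples VALUES (?, ?, ?)", (nid(s), nid(p), nid(o)))

# The six patterns of LUBM Query 2, with prefixes left unexpanded.
query2 = [
    ("?x", "rdf:type", "ub:GraduateStudent"),
    ("?y", "rdf:type", "ub:University"),
    ("?z", "rdf:type", "ub:Department"),
    ("?x", "ub:memberOf", "?z"),
    ("?z", "ub:subOrganizationOf", "?y"),
    ("?x", "ub:undergraduateDegreeFrom", "?y"),
]
sql, params, names = compile_bgp(query2)
rows = cur.execute(sql, params).fetchall()
print(names, rows)  # ['?x', '?y', '?z'] [('ex:s1', 'ex:u1', 'ex:d1')]
```

The whole pattern is handed to the database as one statement, so the database's own join planner chooses the evaluation order, rather than the pattern being answered through a series of separate statement-level lookups.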

Jena: 232.1s (153s with an additional index on OP)
SDB/index: 12.7s
SDB/hash: 3.7s
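"OP" here presumably means a composite index on the object and predicate columns of the statement table. The sketch below illustrates what such an index changes, using SQLite's `EXPLAIN QUERY PLAN` on a hypothetical `stmt` table (not Jena's real RDB schema): a lookup by object and predicate goes from a full table scan to an index search. The index name `idx_op` is an assumption, and the exact plan text varies between SQLite versions.

```python
import sqlite3

# Hypothetical denormalised statement table; NOT Jena's actual RDB schema.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE stmt (s TEXT, p TEXT, o TEXT)")
cur.executemany("INSERT INTO stmt VALUES (?, ?, ?)", [
    ("ex:s1", "rdf:type", "ub:GraduateStudent"),
    ("ex:s2", "rdf:type", "ub:GraduateStudent"),
    ("ex:u1", "rdf:type", "ub:University"),
])

query = "SELECT s FROM stmt WHERE o = ? AND p = ?"
args = ("ub:GraduateStudent", "rdf:type")

def plan():
    """Return the detail column of SQLite's query plan for `query`."""
    return [row[-1] for row in cur.execute("EXPLAIN QUERY PLAN " + query, args)]

before = plan()  # expected: a full scan of stmt
cur.execute("CREATE INDEX idx_op ON stmt (o, p)")  # the "OP" index (assumed meaning)
after = plan()   # expected: a search using idx_op

print(before)
print(after)
print(sorted(cur.execute(query, args).fetchall()))  # [('ex:s1',), ('ex:s2',)]
```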

Note: removing the rdf:type patterns actually slows the query down.

Summary

SPARQL queries often contain graph patterns complex enough that the SDB design tradeoff gives a significant advantage in query performance.