Explaining ARQ queries

This page applies to ARQ version 2.8.6 and later. In that version, query logging was consolidated and made uniform across ARQ, SDB and TDB. TDB logging uses this logging and explanation framework from TDB version 0.8.8 onwards.

Optimization in ARQ proceeds on two levels. After the query is parsed, the SPARQL algebra for the query is generated as described in the SPARQL specification. High-level optimization occurs by rewriting the algebra into new, equivalent algebra forms and introducing specialized algebra operators. During query execution, low-level, storage-specific optimization occurs, such as choosing the order of triple patterns within basic graph patterns.

The effect of high-level optimizations can be seen using arq.qparse, and the low-level runtime optimizations can be seen through execution logging.

Algebra Transformations

The preparation of a query for execution can be investigated with the command arq.qparse --explain --query QueryFile.rq. Different storage systems may perform different optimizations, usually chosen from the standard set. qparse shows the action of the memory-storage optimizer, which applies all optimizations.

Other useful arguments are:

qparse arguments

Argument           Effect
--print=query      Print the parsed query.
--print=op         Print the SPARQL algebra for the query. This is exactly the algebra specified by the SPARQL standard.
--print=opt        Print the optimized algebra for the query.
--print=quad       Print the quad-form algebra for the query.
--print=optquad    Print the quad-form optimized algebra for the query.

The argument --explain is equivalent to --print=query --print=opt

Examples:

arq.qparse --explain --query Q.rq

arq.qparse --explain 'SELECT * { ?s ?p ?o }'
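
The same before-and-after view of the algebra can also be obtained from Java. The following is a minimal sketch, assuming a Jena release that uses the org.apache.jena package names is on the classpath. The class name ExplainAlgebra and its helper methods are hypothetical, and the mapping to the qparse arguments in the comments is approximate, not a statement of how qparse itself is implemented.

```java
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.sparql.algebra.Algebra;
import org.apache.jena.sparql.algebra.Op;

// Hypothetical helper class, for illustration only.
public class ExplainAlgebra {

    // Parse the query and generate the algebra described by the SPARQL
    // specification (roughly what --print=op displays).
    public static Op standardAlgebra(String queryString) {
        Query query = QueryFactory.create(queryString);
        return Algebra.compile(query);
    }

    // Apply the high-level algebra rewrites of the standard optimizer
    // (roughly what --print=opt displays).
    public static Op optimizedAlgebra(String queryString) {
        return Algebra.optimize(standardAlgebra(queryString));
    }

    // Convert the unoptimized algebra to quad form
    // (roughly what --print=quad displays).
    public static Op quadForm(String queryString) {
        return Algebra.toQuadForm(standardAlgebra(queryString));
    }

    public static void main(String[] args) {
        String q = "SELECT * { ?s ?p ?o FILTER(?o = 1) }";
        // An Op prints as an algebra expression, so the three forms
        // can be compared by eye.
        System.out.println(standardAlgebra(q));
        System.out.println(optimizedAlgebra(q));
        System.out.println(quadForm(q));
    }
}
```

Comparing the first two printed expressions shows which rewrites the optimizer applied to this particular query. A storage layer such as TDB may apply a different selection of rewrites.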

Execution Logging

ARQ can log query and update execution details globally or for individual operations. This adds another level of control on top of the logger level controls.

From the command line:

arq.sparql --explain --data ... --query ...

Explanatory messages are controlled by the Explain.InfoLevel level in the execution context.

Execution logging at level ALL can significantly slow query execution, but the order of operations logged will be correct.

The logger used is called org.apache.jena.arq.exec. Messages are sent at level "info". So for log4j, the following can be set in the log4j.properties file:

log4j.logger.org.apache.jena.arq.exec=INFO

The context setting uses the key (Java constant) ARQ.symLogExec. To set it globally:

ARQ.setExecutionLogging(Explain.InfoLevel.ALL) ;

It may also be set on an individual query execution using its local context:

 try(QueryExecution qExec = QueryExecutionFactory.create(...)) {
     // Enable full execution logging for this query execution only
     qExec.getContext().set(ARQ.symLogExec, Explain.InfoLevel.ALL) ;
     ResultSet rs = qExec.execSelect() ;
     ...
 }
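
For a version that can be run end to end, the per-execution setting can be combined with a small in-memory model. This is a minimal sketch, assuming a Jena release that uses the org.apache.jena package names; the class name ExplainExecution, the method runWithExplain and the http://example/ URIs are hypothetical. Whether any messages actually appear depends on the logger configuration for org.apache.jena.arq.exec.

```java
import org.apache.jena.query.ARQ;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.sparql.mgt.Explain;

// Hypothetical example class, for illustration only.
public class ExplainExecution {

    // Run a SELECT query with execution logging enabled for this execution
    // only, and return the number of result rows.
    public static int runWithExplain(Model model, String queryString) {
        try (QueryExecution qExec = QueryExecutionFactory.create(queryString, model)) {
            // Local context setting: other query executions are unaffected.
            qExec.getContext().set(ARQ.symLogExec, Explain.InfoLevel.ALL);
            ResultSet rs = qExec.execSelect();
            // Results must be consumed inside the try block, because closing
            // the execution also closes the result set.
            return ResultSetFormatter.consume(rs);
        }
    }

    public static void main(String[] args) {
        // Build a one-triple in-memory model to query.
        Model model = ModelFactory.createDefaultModel();
        model.createResource("http://example/s")
             .addProperty(model.createProperty("http://example/p"), "o");

        int rows = runWithExplain(model, "SELECT * { ?s ?p ?o }");
        System.out.println("rows = " + rows);
    }
}
```

Setting the level on the local context rather than calling ARQ.setExecutionLogging keeps the slowdown associated with level ALL confined to the one query under investigation.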

On the command line:

 arq.query --explain --data datafile --query=queryfile

The command tdbquery takes the same --explain argument.

Logging information levels: see the logging page