Lateral joins using the keyword LATERAL
were introduced in Apache Jena 4.7.0.
A LATERAL
join is like a foreach loop, looping on the results from the
left-hand side (LHS), the pattern before the LATERAL
keyword, and executing
the right-hand side (RHS) query pattern once for each row, with the variables
from the input LHS in-scope during each RHS evaluation.
A regular join only executes the RHS once, and the variables from the LHS are used for the join condition after evaluation of the left and right sub-patterns.
Another way to think of a lateral join is as a flatmap
.
Examples:
## Get exactly one label for each subject with type `:T`
SELECT * {
?s rdf:type :T
LATERAL {
SELECT * { ?s rdfs:label ?label } LIMIT 1
}
}
## Get zero or one labels for each subject.
SELECT * {
?s ?p ?o
LATERAL { OPTIONAL { SELECT * ?s rdfs:label ?label } LIMIT 1}
}
}
The LATERAL
keyword which takes the graph pattern so far in the group, from
the {
starting of the current block, and a { }
block afterwards.
Substituting variables from the LHS into the RHS (with the same restrictions), then executing the pattern, gives the evaluation of LATERAL
.
There needs to be a new syntax restriction: there can no variable introduced by
AS
(BIND
, or sub-query) or VALUES
in-scope at the top level of the
LATERAL
RHS, that is the same name as any
in-scope variable from
the LHS.
Such a variable assignment would conflict with the variable being set in variables of the row being joined.
## ** Illegal **
SELECT * {
?s ?p ?o
LATERAL { BIND( 123 AS ?o) }
}
In ARQ, LET would work.
LET
for a variable that is bound acts like a filter.
In looping on the input, a lateral join makes the bindings of variables in the current row available to the right hand side pattern, setting their value from the top down.
In SPARQL, it is possible to have variables of the same name which are not exposed within a sub-select. These are not lateral-joined to a variable of the same name from the LHS.
This is not speciifc to lateral joins. In
SELECT * {
?s rdf:type :T
{ SELECT ?label { ?s rdfs:label ?label } }
the inner ?s
can be replaced by ?z
without changing the results because the
inner ?s
is not joined to the outer ?s
but instead is hidden by the SELECT ?label
.
SELECT * {
?s rdf:type :T
{ SELECT ?label { ?z rdfs:label ?label } }
The same rule applies to lateral joins.
SELECT * {
?s rdf:type :T
LATERAL {
SELECT ?label { ?s rdfs:label ?label } LIMIT 1
}
}
The inner ?s
in the SELECT ?label
is not the outer ?s
because the SELECT ?label
does not pass out ?s
. As a sub-query the ?s
could be any name except
?label
for the same results.
There is a similarity to filter NOT EXISTS
/EXISTS
expressed as the non-legal
FILTER ( ASK { pattern } )
where the variables of the row being filtered are
available to “pattern”. This is similar to an SQL
correlated subquery.
LATERAL
is added to the SPARQL grammar at rule [[56] GraphPatternNotTriples](https://www.w3.org/TR/sparql11-query/#rGraphPatternNotTriples)
. As a syntax form, it is similar to OPTIONAL
.
[56] GraphPatternNotTriples ::= GroupOrUnionGraphPattern | OptionalGraphPattern | LateralGraphPattern | ...
[57] OptionalGraphPattern ::= 'OPTIONAL' GroupGraphPattern
[ ] LateralGraphPattern ::= 'LATERAL' GroupGraphPattern
The new algebra operator is lateral
which takes two expressions
SELECT * {
?s ?p ?o
LATERAL
{ ?a ?b ?c }
}
is translated to:
(lateral
(bgp (triple ?s ?p ?o))
(bgp (triple ?a ?b ?c)))
To evaluate lateral
:
Outline:
Definition: Lateral
Let Ω be a multiset of solution mappings. We define:
Lateral(Ω, P) = { μ | union of Ω1 where
foreach μ1 in Ω:
pattern2 = inject(pattern, μ1)
Ω1 = eval(D(G), pattern2)
result Ω1
}
where inject
is the corrected substitute
operation.
An alternative style is to define Lateral more like “evaluate P such that μ is
in-scope” in some way, rather than rely on inject
which is a mechanism.
Definition: Evaluation of Lateral
eval(D(G), Lateral(P1, P2) = Lateral(eval(D(G), P1), P2)