ARQ supports writing custom SPARQL functions in JavaScript. These functions can be used in FILTERs and for calculating values to assign with AS in BIND and SELECT expressions.
XSD datatypes for strings, numbers and booleans are converted to the
native JavaScript datatypes. RDFterms that do not fit easily into
JavaScript datatypes are handled with a object class NV
.
Applications should be aware that there are risks in exposing a script engine with full computational capabilities through SPARQL. Script functions are only as secure as the script engine environment they run in.
Requirements
ARQ requires a javascript engine such as GraalVM to be added to the classpath.
<properties>
<ver.graalvm>....</ver.graalvm>
...
<dependency>
<groupId>org.graalvm.js</groupId>
<artifactId>js</artifactId>
<version>${ver.graalvm}</version>
</dependency>
<dependency>
<groupId>org.graalvm.js</groupId>
<artifactId>js-scriptengine</artifactId>
<version>${ver.graalvm}</version>
</dependency>
Enabling and Loading JavaScript functions
JavaScript is loaded from an external file using the context setting
“http://jena.apache.org/ARQ#js-library". This can be written as
arq:js-library
for commands and Fuseki configuration files.
Access to the script engine must be enabled at runtime.
The Java system property to do this is jena:scripting
.
Example:
export JVM_ARGS=-Djena:scripting=true
sparql --set arq:js-library=SomeFile.js --data ... --query ...
and for MS Windows:
set JVM_ARGS=-Djena:scripting=true
sparql --set arq:js-library=SomeFile.js --data ... --query ...
will execute on the data with the JavaScript functions from file “SomeFile.js” available.
JavaScript functions can also be set from a string directly from within Java using constant
ARQ.symJavaScriptFunctions
(“http://jena.apache.org/ARQ#js-functions").
WARNING: Enabling this feature exposes the majority of the underlying scripting engine directly to SPARQL queries so may provide a vector for arbitrary code execution. Therefore it is recommended that this feature remain disabled for any publicly accessible deployment that utilises the ARQ query engine.
Identifying callable functions
The context setting ““http://jena.apache.org/ARQ#scriptAllowList" is used to provide a comma-separated list of function names, which are the local part of the URI, that are allowed to be called as custom script functions.
This can be written as arq:scriptAllowList
for commands and Fuseki configuration files.
It is the java constant ARQ.symCustomFunctionScriptAllowList
sparql --set arq:js-library=SomeFile.js \
--set arq:scriptAllowList=toCamelCase,anotherFunction
--data ... --query ...
and a query of:
PREFIX js: <http://jena.apache.org/ARQ/jsFunction#>
SELECT ?input (js:toCamelCase(?input) AS ?X)
{
VALUES ?input { "some woRDs to PROCESS" }
}
Using JavaScript functions
SPARQL functions implemented in JavaScript are automatically called when a URI starting “http://jena.apache.org/ARQ/jsFunction#" used.
This can conveniently be abbreviated by:
PREFIX js: <http://jena.apache.org/ARQ/jsFunction#>
Arguments and Function Results
xsd:string
(a string with no language tag), any XSD numbers (integer,
decimal, float, double and all the derived types) and xsd:boolean
are
converted to JavaScript string, number and boolean respectively.
SPARQL functions must return a value. When a function returns a value,
it can be one of these JavaScript native datatypes, in which case the
reverse conversion is applied back to XSD datatypes. For numbers, the
conversion is back to xsd:integer
(if it has no fractional part) or
xsd:double
.
The JavaScript function can also create NodeValue
(or NV
) objects
for other datatypes by calling Java from within the JavaScript script
engine of the Java runtime.
URIs are passed as NV
object and are available in JavaScript as a string.
The class NV
is used for all other RDF terms.
Returning JavaScript null
is the error indicator and a SPARQL
expression error (ExprEvalException
) is raised, like any other
expression error in SPARQL. That, in turn, will cause the whole
expression the function is part of to evaluate to an error (unless
a special form like COALESCE
is used). In a FILTER
that typically
makes the filter evaluate to “false”.
Example
Suppose “functions.js” contains code to camel case words in a string. For example, “some words to process " becomes “someWordsToProcess”.
// CamelCase a string
// Words to be combined are separated by a space in the string.
function toCamelCase(str) {
return str
.split(' ')
.map(cc)
.join('');
}
function ucFirst(word) {
return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
}
function lcFirst(word) {
return word.toLowerCase();
}
function cc(word,index) {
return (index == 0) ? lcFirst(word) : ucFirst(word);
}
and the query Q.rq
PREFIX js: <http://jena.apache.org/ARQ/jsFunction#>
SELECT ?input (js:toCamelCase(?input) AS ?X)
{
VALUES ?input { "some woRDs to PROCESS" }
}
which results in:
--------------------------------------------------
| input | X |
==================================================
| "some woRDs to PROCESS" | "someWordsToProcess" |
--------------------------------------------------
Use with Fuseki
The context setting can be provided on the command line starting the server, for example:
export JVM_ARGS=-Djena:scripting=true
fuseki --set arq:js-library=functions.js \
--set arq:scriptAllowList=toCamelCase \
--mem /ds
or it can be specified in the server configuration file config.ttl
:
PREFIX : <#>
PREFIX fuseki: <http://jena.apache.org/fuseki#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ja: <http://jena.hpl.hp.com/2005/11/Assembler#>
[] rdf:type fuseki:Server ;
# Set the server-wide context
ja:context [
ja:cxtName "arq:js-library" ;
ja:cxtValue "/filepath/functions.js"
] ;
ja:context [
ja:cxtName "arq:scriptAllowList" ;
ja:cxtValue "toCamelCase"
] ;
.
<#service> rdf:type fuseki:Service;
rdfs:label "Dataset";
fuseki:name "ds";
fuseki:serviceQuery "sparql";
fuseki:dataset <#dataset> ;
.
<#dataset> rdf:type ja:DatasetTxnMem;
ja:data <file:D.trig>;
.
and used as:
export JVM_ARGS=-Djena:scripting=true
fuseki --conf config.ttl