Legacy Documentation: not up-to-date
The original ARP parser will be removed from Jena.
ARP can be used either as a Jena subsystem or as a standalone RDF/XML parser. This document gives a quick guide to using ARP standalone.
Overview
To load an RDF file:
- Create an ARP instance.
- Set parse options, particularly error detection control, using getOptions or setOptionsWith.
- Set its handlers by calling the getHandlers or setHandlersWith methods, and then:
  - set the statement handler;
  - optionally set the other handlers.
- Call a load method.
Xerces is used to parse the XML. The SAX events generated by Xerces are then analysed as RDF by ARP. It is possible to use a different source of SAX events.
Errors may occur in either the XML or the RDF part.
Sample Code
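// Package names as in Jena 3.x/4.x; adjust them for your Jena version.
import java.io.IOException;
import java.io.StringReader;

import org.apache.jena.rdfxml.xmlinput.ALiteral;
import org.apache.jena.rdfxml.xmlinput.ARP;
import org.apache.jena.rdfxml.xmlinput.AResource;
import org.apache.jena.rdfxml.xmlinput.StatementHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ARPSample {
    public static void main(String[] args) {
        ARP arp = new ARP();

        // Initialisation: uses the ARPConfig interface only.
        arp.getOptions().setLaxErrorMode();
        arp.getHandlers().setErrorHandler(new ErrorHandler() {
            @Override
            public void fatalError(SAXParseException e) {
                // TODO: handle fatal errors
            }
            @Override
            public void error(SAXParseException e) {
                // TODO: handle errors
            }
            @Override
            public void warning(SAXParseException e) {
                // TODO: handle warnings
            }
        });
        arp.getHandlers().setStatementHandler(new StatementHandler() {
            @Override
            public void statement(AResource subj, AResource pred, ALiteral lit) {
                // TODO: handle a triple whose object is a literal
            }
            @Override
            public void statement(AResource subj, AResource pred, AResource obj) {
                // TODO: handle a triple whose object is a resource
            }
        });

        // Parsing.
        try {
            // Loading fixed input ...
            arp.load(new StringReader(
                "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>\n"
                + "<rdf:Description><rdf:value rdf:parseType='Literal'>"
                + "<b>hello</b></rdf:value>\n"
                + "</rdf:Description></rdf:RDF>"));
        } catch (IOException ioe) {
            // Something unexpected went wrong.
        } catch (SAXParseException s) {
            // This error will already have been reported to the error handler.
        } catch (SAXException ss) {
            // This error will not have been reported.
        }
    }
}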
ARP Event Handling
ARP reports events concerning:
- Triples found in the input.
- Errors in the input.
- Namespace declarations.
- Scope of blank nodes.
To respond to the events of interest, write user code that implements the relevant interfaces: StatementHandler, org.xml.sax.ErrorHandler, NamespaceHandler, and ExtendedHandler.
To set an individual handler, call the getHandlers method on the ARP instance. This returns an object encapsulating all the handlers in use; set a specific handler by calling the appropriate set…Handler method on that object, e.g. setStatementHandler.
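As an illustration of setting one handler through getHandlers, here is a minimal sketch that records the namespace declarations ARP reports. It assumes Jena 3.x/4.x, where ARP and its handler interfaces live in org.apache.jena.rdfxml.xmlinput (check the package for your Jena version); PrefixLogger and prefixesIn are hypothetical names, not part of Jena.

```java
// Sketch: assumes Jena 3.x/4.x, where ARP lives in org.apache.jena.rdfxml.xmlinput.
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.jena.rdfxml.xmlinput.ARP;
import org.apache.jena.rdfxml.xmlinput.NamespaceHandler;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of Jena.
class PrefixLogger {
    // Parses an RDF/XML string and returns "prefix=uri" for each
    // namespace declaration that ARP passes to the NamespaceHandler.
    static List<String> prefixesIn(String rdfXml) throws SAXException, IOException {
        List<String> seen = new ArrayList<>();
        ARP arp = new ARP();
        // Only the namespace handler is replaced; the other handlers keep their defaults.
        arp.getHandlers().setNamespaceHandler(new NamespaceHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                seen.add(prefix + "=" + uri);
            }

            @Override
            public void endPrefixMapping(String prefix) {
                // The declaration has gone out of scope; nothing to record.
            }
        });
        arp.load(new StringReader(rdfXml));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String doc = "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
            + " xmlns:ex='http://example.org/'>"
            + "<rdf:Description rdf:about='http://example.org/a'>"
            + "<ex:p>x</ex:p></rdf:Description></rdf:RDF>";
        System.out.println(prefixesIn(doc));
    }
}
```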
All the handlers can be copied from one ARP instance to another by using the setHandlersWith method:
ARP from, to;
// initialize from and to
// ...
to.setHandlersWith(from.getHandlers());
The error handler reports both XML and RDF errors, the former detected by Xerces. See ARPHandlers.setErrorHandler for details of how to distinguish between them.
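A minimal sketch of an error handler that collects problems with their positions instead of discarding them. It uses only the standard org.xml.sax.ErrorHandler interface, so it compiles without Jena; CollectingErrorHandler is a hypothetical name, and it makes no attempt to tell XML errors from RDF errors.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical collector: records every reported problem as
// "line L, column C: message", grouped by severity.
class CollectingErrorHandler implements ErrorHandler {
    final List<String> warnings = new ArrayList<>();
    final List<String> errors = new ArrayList<>();
    final List<String> fatalErrors = new ArrayList<>();

    private static String describe(SAXParseException e) {
        return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
            + ": " + e.getMessage();
    }

    @Override
    public void warning(SAXParseException e) {
        warnings.add(describe(e));
    }

    @Override
    public void error(SAXParseException e) {
        errors.add(describe(e));
    }

    @Override
    public void fatalError(SAXParseException e) {
        fatalErrors.add(describe(e));
    }

    // True if anything more serious than a warning was reported.
    boolean hasProblems() {
        return !errors.isEmpty() || !fatalErrors.isEmpty();
    }
}
```

Register an instance with arp.getHandlers().setErrorHandler(handler) and inspect the lists once load returns or throws.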
Configuring ARP
ARP can be configured to treat most error conditions as warnings or to ignore them, and to treat some non-error conditions as warnings or errors.
In addition, the behaviour in response to input that does not have an <rdf:RDF> root element is configurable: either to treat the whole file as RDF anyway, or to scan the file looking for embedded <rdf:RDF> elements.
As with the handlers, there is an options object that encapsulates these settings. It can be accessed using getOptions, and individual settings can then be made using the methods in ARPOptions.
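A sketch of adjusting the options object before parsing, here switching on the embedded-RDF behaviour so that ARP scans a non-RDF document for <rdf:RDF> elements. It assumes Jena 3.x/4.x package names and that ARPOptions.setEmbedding(boolean) is available in your version; EmbeddedRdf and countEmbeddedTriples are hypothetical.

```java
// Sketch: assumes Jena 3.x/4.x, where ARP lives in org.apache.jena.rdfxml.xmlinput.
import java.io.IOException;
import java.io.StringReader;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.jena.rdfxml.xmlinput.ALiteral;
import org.apache.jena.rdfxml.xmlinput.ARP;
import org.apache.jena.rdfxml.xmlinput.AResource;
import org.apache.jena.rdfxml.xmlinput.StatementHandler;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of Jena.
class EmbeddedRdf {
    // Counts the triples found in <rdf:RDF> elements embedded in other XML.
    static int countEmbeddedTriples(String xml) throws SAXException, IOException {
        AtomicInteger count = new AtomicInteger();
        ARP arp = new ARP();
        // Scan for embedded rdf:RDF elements instead of treating the
        // whole file as RDF.
        arp.getOptions().setEmbedding(true);
        arp.getHandlers().setStatementHandler(new StatementHandler() {
            @Override
            public void statement(AResource s, AResource p, AResource o) {
                count.incrementAndGet();
            }

            @Override
            public void statement(AResource s, AResource p, ALiteral o) {
                count.incrementAndGet();
            }
        });
        arp.load(new StringReader(xml));
        return count.get();
    }
}
```

Besides setLaxErrorMode, ARPOptions also offers setStrictErrorMode and setDefaultErrorMode for switching the overall error-detection policy.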
It is also possible to copy all the option settings from one ARP instance to another:
ARP from, to;
// initialize from and to ...
to.setOptionsWith(from.getOptions());
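Copying handlers and options together is one way to configure several parsers identically. The hypothetical ArpFactory below (not part of Jena) is a sketch assuming Jena 3.x/4.x package names: it configures one template ARP and hands out fresh instances via setHandlersWith and setOptionsWith.

```java
// Sketch: assumes Jena 3.x/4.x, where ARP lives in org.apache.jena.rdfxml.xmlinput.
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.jena.rdfxml.xmlinput.ALiteral;
import org.apache.jena.rdfxml.xmlinput.ARP;
import org.apache.jena.rdfxml.xmlinput.AResource;
import org.apache.jena.rdfxml.xmlinput.StatementHandler;

// Hypothetical factory: configure one template ARP, then create
// fresh parsers that reuse its handlers and option settings.
class ArpFactory {
    private final ARP template = new ARP();
    // Triples reported by any parser created by this factory.
    final AtomicInteger triples = new AtomicInteger();

    ArpFactory() {
        template.getOptions().setLaxErrorMode();
        template.getHandlers().setStatementHandler(new StatementHandler() {
            @Override
            public void statement(AResource s, AResource p, AResource o) {
                triples.incrementAndGet();
            }

            @Override
            public void statement(AResource s, AResource p, ALiteral o) {
                triples.incrementAndGet();
            }
        });
    }

    ARP newParser() {
        ARP arp = new ARP();
        arp.setHandlersWith(template.getHandlers()); // copy the template's handlers
        arp.setOptionsWith(template.getOptions());   // copy the template's options
        return arp;
    }
}
```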
The I/O how-to gives some more detail about the options settings, although it assumes the use of the Jena RDFReader interface.
Interrupting ARP
It is possible to interrupt an ARP thread. See the I/O how-to for details.
Using Other SAX Sources
It is possible to use ARP with other SAX input sources, e.g. from a non-Xerces parser, or from an in-memory XML source, such as a DOM tree.
Instead of an ARP instance, you create an instance of SAX2RDF using the newInstance method. It can be configured just like an ARP instance, following the initialization section of the sample code, and is then used like a SAX2Model instance, as described elsewhere.
Memory usage
For very large files, ARP does not use any additional memory except when either ExtendedHandler.discardNodesWithNodeID returns false or the AResource.setUserData method has been used. In these cases ARP needs to remember the rdf:nodeID usage for the whole of the file.
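A sketch of an ExtendedHandler that returns true from discardNodesWithNodeID, so ARP is not asked to keep rdf:nodeID usage for the whole file; it also avoids AResource.setUserData. It assumes Jena 3.x/4.x package names and that ExtendedHandler declares exactly endBNodeScope, startRDF, endRDF and discardNodesWithNodeID; LowMemoryParse and closedBNodeScopes are hypothetical.

```java
// Sketch: assumes Jena 3.x/4.x, where ARP lives in org.apache.jena.rdfxml.xmlinput.
import java.io.IOException;
import java.io.StringReader;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.jena.rdfxml.xmlinput.ARP;
import org.apache.jena.rdfxml.xmlinput.AResource;
import org.apache.jena.rdfxml.xmlinput.ExtendedHandler;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of Jena.
class LowMemoryParse {
    // Parses rdfXml and returns how many blank-node scopes ARP reported as ended.
    static int closedBNodeScopes(String rdfXml) throws SAXException, IOException {
        AtomicInteger closed = new AtomicInteger();
        ARP arp = new ARP();
        arp.getHandlers().setExtendedHandler(new ExtendedHandler() {
            @Override
            public void endBNodeScope(AResource bnode) {
                // ARP is finished with this blank node; per-node state
                // kept by user code could be released here.
                closed.incrementAndGet();
            }

            @Override
            public void startRDF() { }

            @Override
            public void endRDF() { }

            @Override
            public boolean discardNodesWithNodeID() {
                // Returning true avoids the extra memory needed to
                // remember rdf:nodeID usage across the whole file.
                return true;
            }
        });
        arp.load(new StringReader(rdfXml));
        return closed.get();
    }
}
```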