Jena schemagen HOWTO

The schemagen tool provided with Jena is used to convert an OWL or RDFS vocabulary into a Java class file that contains static constants for the terms in the vocabulary. This document outlines the use of schemagen, and the various options and templates that may be used to control the output.

Schemagen is typically invoked from the command line or from a build script (such as Ant). Synopsis of the command:

java jena.schemagen -i <input> [-a <namespaceURI>] [-o <output file>] [-c <config uri>] [-e <encoding>] ...

Schemagen is highly configurable, either with command line options or by RDF information read from a configuration file. Many other options are defined, and these are described in detail below. Note that the CLASSPATH environment variable must be set to include the Jena .jar libraries.

Summary of configuration options

For quick reference, here is a list of all of the schemagen options (both command line and configuration file). The use of these options is explained in detail below.

Table 1: schemagen options

Command line option RDF config file property Meaning
-a <uri> sgen:namespace The namespace URI for the vocabulary. Names with this URI as prefix are automatically included in the generated vocabulary. If not specified, the base URI of the ontology is used as a default (but note that some ontology documents don’t define a base URI).
-c <filename> or -c <url> Specify an alternative config file.
--classdec <string> sgen:classdec Additional decoration for class header (such as implements)
--classnamesuffix <string> sgen:classnamesuffix Option for adding a suffix to the generated class name, e.g. “Vocab”.
--classSection <string> sgen:classSection Section declaration comment for class section.
--classTemplate <string> sgen:classTemplate Template for writing out declarations of class resources.
--datatypesSection <string> sgen:datatypesSection Section declaration comment for datatypes section.
--datatypeTemplate <string> sgen:datatypeTemplate Template for writing out declarations of datatypes.
--declarations <string> sgen:declarations Additional declarations to add at the top of the class.
--dos sgen:dos Use MSDOS-style line endings (i.e. \r\n). Default is Unix-style line endings.
-e <string> sgen:encoding The surface syntax of the input file (e.g. RDF/XML, N3). Defaults to RDF/XML.
--footer <string> sgen:footer Template for standard text to add to the end of the file.
--header <string> sgen:header Template for the file header, including the class comment.
-i <filename> or -i <url> sgen:input Specify the input document to load.
--include <uri> sgen:include Option for including non-local URI’s in vocabulary
--individualsSection <string> sgen:individualsSection Section declaration comment for individuals section.
--individualTemplate <string> sgen:individualTemplate Template for writing out declarations of individuals.
--inference sgen:inference Causes the model that loads the document prior to being processed to apply inference rules appropriate to the language. E.g. OWL inference rules will be used on a .owl file.
--marker <string> sgen:marker Specify the marker string for substitutions, default is ‘%’
-n <string> sgen:classname The name of the generated class. The default is to synthesise a name based on input document name.
--noclasses sgen:noclasses Option to suppress classes in the generated vocabulary file
--nocomments sgen:noComments Turn off all comment output in the generated vocabulary
--nodatatypes sgen:nodatatypes Option to suppress datatypes in the generated vocabulary file.
--noheader sgen:noHeader Prevent the output of a file header, with class comment etc.
--noindividuals sgen:noindividuals Option to suppress individuals in the generated vocabulary file.
--noproperties sgen:noproperties Option to suppress properties in the generated vocabulary file.
-o <filename> or -o <dir> sgen:output Specify the destination for the output. If the given value evaluates to a directory, the generated class will be placed in that directory with a file name formed from the generated (or given) class name with “.java” appended.
--nostrict sgen:noStrict Option to turn off strict checking for ontology classes and properties (prevents ConversionExceptions).
--ontology sgen:ontology The generated vocabulary will use the ontology API terms, in preference to RDF model API terms.
--owl sgen:owl Specify that the language of the source is OWL (the default). Note that RDFS is a subset of OWL, so this setting also suffices for RDFS.
--package <string> sgen:package Specify the Java package name and directory.
--propSection <string> sgen:propSection Section declaration comment for properties section.
--propTemplate <string> sgen:propTemplate Template for writing out declarations of property resources.
-r <uri> Specify the uri of the root node in the RDF configuration model.
--rdfs sgen:rdfs Specify that the language of the source ontology is RDFS.
--strictIndividuals sgen:strictIndividuals When selecting the individuals to include in the output class, schemagen will normally include those individuals whose rdf:type is in the included namespaces for the vocabulary. However, if strictIndividuals is turned on, then all individuals in the output class must themselves have a URI in the included namespaces.
--uppercase sgen:uppercase Option for mapping constant names to uppercase (like Java constants). Default is to leave the case of names unchanged.
--includeSource sgen:includeSource Serializes the source of the vocabulary and includes it in the generated class file. At class load time, the generated class creates a Model containing the definitions from the source.

What does schemagen do?

RDFS and OWL provide a very convenient means to define a controlled vocabulary or ontology. For general ontology processing, Jena provides various API’s to allow the source files to be read in and manipulated. However, when developing an application, it is frequently convenient to refer to the controlled vocabulary terms directly from Java code. This leads typically to the declaration of constants, such as:

    public static final Resource A_CLASS = new ResourceImpl( "http://example.org/schemas#a-class" );

When these constants are defined manually, it is tedious and error-prone to maintain them in sync with the source ontology file. Schemagen automates the production of Java constants that correspond to terms in an ontology document. By automating the step from source vocabulary to Java constants, a source of error and inconsistency is removed.

Example

Perhaps the easiest way to explain the detail of what schemagen does is to show an example. Consider the following mini-RDF vocabulary:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
            xmlns="http://example.org/eg#"
         xml:base="http://example.org/eg">
  <rdfs:Class rdf:ID="Dog">
      <rdfs:comment>A class of canine companions</rdfs:comment>
  </rdfs:Class>
  <rdf:Property rdf:ID="petName">
      <rdfs:comment>The name that everyone calls a dog</rdfs:comment>
      <rdfs:domain rdf:resource="http://example.org/eg#Dog" />
  </rdf:Property>
  <rdf:Property rdf:ID="kennelName">
      <rdfs:comment>Posh dogs have a formal name on their KC certificate</rdfs:comment>
  </rdf:Property>
  <Dog rdf:ID="deputy">
      <rdfs:comment>Deputy is a particular Dog</rdfs:comment>
      <kennelName>Deputy Dawg of Chilcompton</kennelName>
  </Dog>
</rdf:RDF>

We process this document with a command something like: java jena.schemagen -i deputy.rdf -a http://example.org/eg# to produce the following generated class:

/* CVS $Id: schemagen.html,v 1.16 2010-06-11 00:08:23 ian_dickinson Exp $ */

import org.apache.jena.rdf.model.*;

/**
 * Vocabulary definitions from deputy.rdf
 * @author Auto-generated by schemagen on 01 May 2003 21:49
 */
public class Deputy {
    /** <p>The RDF model that holds the vocabulary terms</p> */
    private static Model m_model = ModelFactory.createDefaultModel();
    
    /** <p>The namespace of the vocabulary as a string {@value}</p> */
    public static final String NS = "http://example.org/eg#";
    
    /** <p>The namespace of the vocabulary as a resource {@value}</p> */
    public static final Resource NAMESPACE = m_model.createResource( "http://example.org/eg#" );
    
    /** <p>The name that everyone calls a dog</p> */
    public static final Property petName = m_model.createProperty( "http://example.org/eg#petName" );
    
    /** <p>Posh dogs have a formal name on their KC certificate</p> */
    public static final Property kennelName = m_model.createProperty( "http://example.org/eg#kennelName" );
    
    /** <p>A class of canine companions</p> */
    public static final Resource Dog = m_model.createResource( "http://example.org/eg#Dog" );
    
    /** <p>Deputy is a particular Dog</p> */
    public static final Resource deputy = m_model.createResource( "http://example.org/eg#deputy" );
    
}

Some things to note in this example. All of the named classes, properties and individuals from the source document are translated to Java constants (below we show how to be more selective than this). The properties of the named resources are not translated: schemagen is for giving access to the names in the vocabulary or schema, not to perform a general translation of RDF to Java. The RDFS comments from the source code are translated to Javadoc comments. Finally, we no longer directly call new ResourceImpl: this idiom is no longer recommended by the Jena team.

We noted earlier that schemagen is highly configurable. One additional argument generates a vocabulary file that uses Jena’s ontology API, rather than the RDF model API. We change rdfs:Class to owl:Class, and invoke java jena.schemagen -i deputy.rdf -a http://example.org/eg# --ontology to get:

/* CVS $Id: schemagen.html,v 1.16 2010-06-11 00:08:23 ian_dickinson Exp $ */

import org.apache.jena.rdf.model.*;
import org.apache.jena.ontology.*;
/**
 * Vocabulary definitions from deputy.rdf
 * @author Auto-generated by schemagen on 01 May 2003 22:03
 */
public class Deputy {
    /** <p>The ontology model that holds the vocabulary terms</p> */
    private static OntModel m_model = ModelFactory.createOntologyModel( ProfileRegistry.OWL_LANG );
    
    /** <p>The namespace of the vocabulary as a string {@value}</p> */
    public static final String NS = "http://example.org/eg#";
    
    /** <p>The namespace of the vocabulary as a resource {@value}</p> */
    public static final Resource NAMESPACE = m_model.createResource( "http://example.org/eg#" );
    
    /** <p>The name that everyone calls a dog</p> */
    public static final Property petName = m_model.createProperty( "http://example.org/eg#petName" );
    
    /** <p>Posh dogs have a formal name on their KC certificate</p> */
    public static final Property kennelName = m_model.createProperty( "http://example.org/eg#kennelName" );
    
    /** <p>A class of canine companions</p> */
    public static final OntClass Dog = m_model.createClass( "http://example.org/eg#Dog" );
    
    /** <p>Deputy is a particular Dog</p> */
    public static final Individual deputy = m_model.createIndividual( Dog, "http://example.org/eg#deputy" );
    
}

General principles

In essence, schemagen will load a single vocabulary file, and generate a Java class that contains static constants for the named classes, properties and instances of the vocabulary. Most of the generated components of the output Java file can be controlled by option flags, and formatted with a template. Default templates are provided for all elements, so the minimum amount of necessary information is actually very small.

Options can be specified on the command line (when invoking schemagen), or may be preset in an RDF file. Any mixture of command line and RDF option specification is permitted. Where a given option is specified both in an RDF file and on the command line, the command line setting takes precedence. Thus the options in the RDF file can be seen as defaults.
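The precedence rule can be pictured as a simple map merge. The following is a minimal sketch, not schemagen's actual option handling: the class name OptionPrecedence, the method effectiveOptions, and the plain string keys are all hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: models option precedence, not schemagen's real code. */
final class OptionPrecedence {

    /**
     * Values read from the RDF configuration act as defaults. Any option that
     * is also given on the command line replaces the configured value.
     */
    static Map<String, String> effectiveOptions(Map<String, String> fromConfigFile,
                                                Map<String, String> fromCommandLine) {
        Map<String, String> merged = new LinkedHashMap<>(fromConfigFile);
        merged.putAll(fromCommandLine); // the command line wins on conflict
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical option values, keyed by option name.
        Map<String, String> config = Map.of("classname", "TradingVocab",
                                            "package", "org.example.vocab");
        Map<String, String> commandLine = Map.of("classname", "Trading");

        Map<String, String> effective = effectiveOptions(config, commandLine);
        System.out.println("classname = " + effective.get("classname"));
        System.out.println("package   = " + effective.get("package"));
    }
}
```

Here the class name comes from the command line, while the package name falls back to the value in the configuration file.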

Specifying command line options

To specify a command line option, add its name (and optional value) to the command line when invoking the schemagen tool. For example: java jena.schemagen -i myvocab.owl --ontology --uppercase

Specifying options in an RDF file

To specify an option in an RDF file, create a resource of type sgen:Config, with properties corresponding to the option names listed in Table 1. The following fragment shows a small options file. A complete example configuration file is shown in appendix A.

By default, schemagen will look for a configuration file named schemagen.rdf in the current directory. To specify another configuration, use the -c option with a URL to reference the configuration. Multiple configurations (i.e. multiple sgen:Config nodes) can be placed in one RDF document. In this case, each configuration node must be named, and the URI of the one to use specified with the -r command line option. If there is no -r option, schemagen will look for a node whose rdf:type is sgen:Config. If there are multiple such nodes in the model, it is indeterminate which one will be used.

Using templates

We have referred several times to a template being used to construct part of the generated file. What is a template? Simply put, it is a fragment of the output file. Some templates are used at most once (for example, the file header template); others are used many times (such as the template used to generate a class constant). To make templates adaptable to the job they are doing, keyword substitution is performed on a template before it is written out. This looks for certain keywords delimited by a pair of special characters (% by default), and replaces them with the current binding for that keyword. Some keyword bindings stay the same throughout the processing of the file, and some depend on the language element being processed. The substitutions are:

Table 2: Substitutable keywords in templates

Keyword Meaning Typical value
classname The name of the Java class being generated Automatically defined from the document name, or given with the -n option
date The date and time the class was generated
imports The Java imports for this class
nl The newline character for the current platform
package The Java package name As specified by an option. The option just gives the package name, schemagen turns the name into a legal Java statement.
sourceURI The source of the document being processed As given by the -i option or in the config file.
valclass The Java class of the value being defined E.g. Property for vocabulary properties, Resource for classes in RDFS, or OntClass for classes using the ontology API
valcreator The method used to generate an instance of the Java representation E.g. createResource or createClass
valname The name of the Java constant being generated This is generated from the name of the resource in the source file, adjusted to be a legal Java identifier. By default, this will preserve the case of the RDF constant, but setting --uppercase will map all constants to upper-case names (a common convention in Java code).
valtype The rdf:type for an individual The class name or URI used when creating an individual in the ontology API
valuri The full URI of the value being defined From the RDF, without adjustment.
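The substitution step described above can be sketched in a few lines of Java. This is an illustration rather than schemagen's implementation: the class TemplateSketch and the method substitute are hypothetical, and the template string is a guess modelled on the generated output shown earlier, not necessarily schemagen's default property template.

```java
import java.util.Map;

/** Illustrative only: a naive keyword substitution over a template string. */
final class TemplateSketch {

    /**
     * Replaces every occurrence of marker + keyword + marker in the template
     * with the value bound to that keyword. Keywords without a binding are
     * left untouched.
     */
    static String substitute(String template, Map<String, String> bindings, String marker) {
        String result = template;
        for (Map.Entry<String, String> binding : bindings.entrySet()) {
            String keyword = marker + binding.getKey() + marker;
            result = result.replace(keyword, binding.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed template text, written to reproduce the petName declaration.
        String template =
            "public static final %valclass% %valname% = m_model.%valcreator%( \"%valuri%\" );";

        Map<String, String> bindings = Map.of(
            "valclass", "Property",
            "valname", "petName",
            "valcreator", "createProperty",
            "valuri", "http://example.org/eg#petName");

        // Prints the petName declaration from the first generated class above.
        System.out.println(substitute(template, bindings, "%"));
    }
}
```

Because the sketch applies the bindings one after another, a bound value that itself contained the marker character could be substituted again; a real implementation would need to guard against that.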

Details of schemagen options

We now go through each of the configuration options in detail.

Note: for brevity, we assume a standard prefix sgen is defined for resource URI’s in the schemagen namespace. The expansion for sgen is: http://jena.hpl.hp.com/2003/04/schemagen#, thus:

xmlns:sgen="http://jena.hpl.hp.com/2003/04/schemagen#"

Schemagen will attempt to ensure that all generated code will compile as legal Java. Occasionally, this means that identifiers from input documents, which are legal components of RDF URI identifiers, have to be modified to be legal Java identifiers. Specifically, any character in an identifier name that is not a legal Java identifier character will be replaced with the character ‘_’ (underscore). Thus the name ‘trading-price’ might become ‘trading_price’. In addition, Java requires that identifiers be distinct. If a name clash is detected (for example, trading-price and trading+price both map to the same Java identifier), schemagen will add disambiguators to the second and subsequent uses. These will be based on the role of the identifier; for example property names are disambiguated by appending _PROPn for increasing values of n. In a well-written ontology, identifiers are typically made distinct for clarity and ease-of-use by the ontology users, so the use of the disambiguation tactic is rare. Indeed, it may be taken as a hint that refactoring the ontology itself is desirable.
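The two rules in this note, character replacement and disambiguation, can be sketched as follows. This is not schemagen's own code: the class IdentifierSketch and its methods are hypothetical, the sketch does not handle names that start with a character that cannot begin a Java identifier (such as a digit), and starting the _PROPn counter at 1 is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: maps RDF local names to distinct Java identifiers. */
final class IdentifierSketch {

    /** Identifiers already handed out by this instance. */
    private final Set<String> used = new HashSet<>();

    /** Replaces every character that cannot appear in a Java identifier with '_'. */
    static String toJavaIdentifier(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            sb.append(Character.isJavaIdentifierPart(c) ? c : '_');
        }
        return sb.toString();
    }

    /**
     * Returns a Java identifier for a property name. The first use of an
     * identifier is returned unchanged; later clashes get a _PROPn suffix
     * (numbering from 1 is an assumption of this sketch).
     */
    String uniquePropertyName(String name) {
        String candidate = toJavaIdentifier(name);
        String unique = candidate;
        int n = 0;
        while (!used.add(unique)) {
            n++;
            unique = candidate + "_PROP" + n;
        }
        return unique;
    }

    public static void main(String[] args) {
        IdentifierSketch names = new IdentifierSketch();
        System.out.println(names.uniquePropertyName("trading-price")); // trading_price
        System.out.println(names.uniquePropertyName("trading+price")); // trading_price_PROP1
    }
}
```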

Specifying the configuration file

Command line -c <*config-file-path*>
-c <*config-file-URL*>
Config file n/a

The default configuration file name is schemagen.rdf in the current directory. To specify a different configuration file, either as a file name on the local file system, or as a URL (e.g. an http: address), the config file location is passed with the -c option. If no -c option is given, and there is no configuration file in the current directory, schemagen will continue and use default values (plus the other command line options) to configure the tool. If a file name or URL is given with -c, and that file cannot be located, schemagen will stop with an error.

Schemagen will assume the language encoding of the configuration file is implied by the filename/URL suffix: “.n3” means N3, “.nt” means NTRIPLES, “.rdf” and “.owl” mean “RDF/XML”. By default it assumes RDF/XML.
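The suffix rule can be expressed as a small lookup. The sketch below is illustrative only: the class SyntaxGuess and the enum Syntax are hypothetical names, the enum constants are labels rather than the syntax strings Jena accepts, and comparing suffixes case-sensitively is an assumption.

```java
/** Illustrative only: guesses an RDF syntax from a file name or URL suffix. */
final class SyntaxGuess {

    /** Labels for the syntaxes mentioned in the text, not Jena's own names. */
    enum Syntax { RDF_XML, N3, N_TRIPLES }

    static Syntax fromLocation(String fileNameOrUrl) {
        if (fileNameOrUrl.endsWith(".n3")) {
            return Syntax.N3;
        }
        if (fileNameOrUrl.endsWith(".nt")) {
            return Syntax.N_TRIPLES;
        }
        // ".rdf", ".owl" and any unrecognised suffix fall back to RDF/XML.
        return Syntax.RDF_XML;
    }

    public static void main(String[] args) {
        System.out.println(fromLocation("config/localconf.rdf")); // RDF_XML
        System.out.println(fromLocation("config/localconf.n3"));  // N3
    }
}
```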

Specifying the configuration root in the configuration file

Command line -r <*config-root-URI*>
Config file n/a

It is possible to have more than one set of configuration options in one configuration file. If there is only one set of configuration options, schemagen will locate the root by searching for a resource of rdf:type sgen:Config. If there is more than one, and no root is specified on the command line, it is not specified which set of configuration options will be used. The root URI given as a command line option must match exactly with the URI given in the configuration file. For example:

java jena.schemagen -c config/localconf.rdf -r http://example.org/sg#project1

matches:

...
 <sgen:Config rdf:about="http://example.org/sg#project1">
   ....
 </sgen:Config>

Specifying the input document

Command line -i <*input-file-path*>
-i <*input-URL*>
Config file <sgen:input rdf:resource="*inputURL*" />

The only mandatory argument to schemagen is the input document to process. This can be specified in the configuration file, though this does, of course, mean that the same configuration cannot be applied to multiple different input files for consistency. However, by specifying the input document in the default configuration file, schemagen can easily be invoked with the minimum of command line typing. For other means of automating schemagen, see using schemagen with Ant.

Specifying the output location

Command line -o <*output-file-path*>
-o <*output-dir*>
Config file <sgen:output rdf:datatype="&xsd;string">*output-path-or-dir*</sgen:output>

Schemagen must know where to write the generated Java file. By default, the output is written to the standard output. Various options exist to change this. The output location can be specified either on the command line, or in the configuration file. If specified in the configuration file, the value must be a string literal, denoting the file path. If the path given resolves to an existing directory, then it is assumed that the output file name will be based on the name of the generated class (i.e. it will be the class name with .java appended). Otherwise, the path is assumed to point to a file. Any existing file that has the given path name will be overwritten.
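The directory-or-file decision described above can be sketched with java.nio.file. This is a hedged illustration, not schemagen's code: the class OutputLocationSketch and the method resolveOutput are hypothetical, and writing to standard output when no location is given is left out.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: decides which file an output option refers to. */
final class OutputLocationSketch {

    /**
     * If the option names an existing directory, the file is the class name
     * with ".java" appended inside that directory. Otherwise the option is
     * taken to be the path of the output file itself.
     */
    static Path resolveOutput(String outputOption, String className) {
        Path path = Paths.get(outputOption);
        if (Files.isDirectory(path)) {
            return path.resolve(className + ".java");
        }
        return path;
    }

    public static void main(String[] args) {
        // The system temporary directory stands in for an existing output directory.
        String existingDir = System.getProperty("java.io.tmpdir");
        System.out.println(resolveOutput(existingDir, "Deputy"));

        // A path that is not an existing directory is used as given.
        System.out.println(resolveOutput("generated/DeputyVocab.java", "Deputy"));
    }
}
```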

By default, schemagen will create files that have the Unix convention for line-endings (i.e. ‘\n’). To switch to DOS-style line endings, use --dos.

Command line --dos
Config file <sgen:dos rdf:datatype="&xsd;boolean">true</sgen:dos>

Specifying the class name

Command line -n <*class-name*>
Config file <sgen:classname rdf:datatype="&xsd;string">*classname*</sgen:classname>

By default, the name of the class will be based on the name of the input file. Specifically, the last component of the input document’s path name, with the file extension removed, becomes the class name. By default, the initial letter is adjusted to a capital to conform to standard Java usage. Thus file:vocabs/trading.owl becomes Trading.java. To override this default algorithm, a class name specified by -n or in the config file is used exactly as given.

Sometimes it is convenient to have all vocabulary files distinguished by a common suffix, for example xyzSchema.java or xyzVocabs.java. This can be achieved by the classname-suffix option:

Command line --classnamesuffix <*suffix*>
Config file <sgen:classnamesuffix rdf:datatype="&xsd;string">*suffix*</sgen:classnamesuffix>

See also the note on legal Java identifiers, which applies to generated class names.
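The default naming rule, together with the optional suffix, can be sketched as below. The class ClassNameSketch and the method defaultClassName are hypothetical, and the sketch skips the legal-identifier adjustment mentioned in the note; it is one plausible reading of the rule rather than schemagen's actual algorithm.

```java
/** Illustrative only: derives a default class name from an input location. */
final class ClassNameSketch {

    /**
     * Takes the last path component of the input location, removes the file
     * extension, capitalises the first letter and appends the suffix
     * (pass an empty string for no suffix).
     */
    static String defaultClassName(String inputLocation, String suffix) {
        // The last component starts after the final '/' or ':' (as in file:vocabs/...).
        int cut = Math.max(inputLocation.lastIndexOf('/'), inputLocation.lastIndexOf(':'));
        String last = inputLocation.substring(cut + 1);

        int dot = last.lastIndexOf('.');
        String base = dot > 0 ? last.substring(0, dot) : last;
        if (base.isEmpty()) {
            throw new IllegalArgumentException("No file name in: " + inputLocation);
        }
        return Character.toUpperCase(base.charAt(0)) + base.substring(1) + suffix;
    }

    public static void main(String[] args) {
        System.out.println(defaultClassName("file:vocabs/trading.owl", ""));      // Trading
        System.out.println(defaultClassName("file:vocabs/trading.owl", "Vocab")); // TradingVocab
    }
}
```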

Specifying the vocabulary namespace

Command line -a <*namespace-URI*>
Config file <sgen:namespace rdf:datatype="&xsd;string">*namespace*</sgen:namespace>

Since ontology files are often modularised, it is not the case that all of the resource names appearing in a given document are being defined by that ontology. They may appear simply as part of the definitions of other terms. Schemagen assumes that there is one primary namespace for each document, and it is names from that namespace that will appear in the generated Java file.

In an OWL ontology, this namespace is computed by finding the owl:Ontology element, and using its namespace as the primary namespace of the ontology. This may not be available (it is not, for example, a part of RDFS) or correct, so the namespace may be specified directly with the -a option or in the configuration file.

Schemagen does not, in the present version, permit more than one primary namespace per generated Java class. However, constants from namespaces other than the primary namespace may be included in the generated Java class by the include option:

Command line --include <*namespace-URI*>
Config file <sgen:include rdf:datatype="&xsd;string">*namespace*</sgen:include>

The include option may be repeated multiple times to include a variety of constants from other namespaces in the output class.

Since OWL and RDFS ontologies may include individuals that are named instances of declared classes, schemagen will include individuals among the constants that it generates in Java. By default, an individual will be included if its class has a URI that is in one of the permitted namespaces for the vocabulary, even if the individual itself is not in that namespace. If the option strictIndividuals is set, individuals are only included if they have a URI that is in the permitted namespaces for the vocabulary.

Command line --strictIndividuals
Config file <sgen:strictIndividuals />
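The selection rule for individuals can be sketched as a pair of prefix tests. The class IndividualFilterSketch and its methods are hypothetical. The text does not say whether the type test still applies when strictIndividuals is set, so the sketch assumes that strict mode checks only the individual's own URI.

```java
import java.util.List;

/** Illustrative only: decides whether an individual becomes a constant. */
final class IndividualFilterSketch {

    /** True if the URI starts with one of the permitted namespace URIs. */
    static boolean inNamespaces(String uri, List<String> namespaces) {
        for (String ns : namespaces) {
            if (uri.startsWith(ns)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Default mode: include the individual when its class is in a permitted
     * namespace. Strict mode (assumed reading): include it only when the
     * individual's own URI is in a permitted namespace.
     */
    static boolean includeIndividual(String individualUri, String typeUri,
                                     List<String> namespaces, boolean strictIndividuals) {
        if (strictIndividuals) {
            return inNamespaces(individualUri, namespaces);
        }
        return inNamespaces(typeUri, namespaces);
    }

    public static void main(String[] args) {
        List<String> namespaces = List.of("http://example.org/eg#");
        String dog = "http://example.org/eg#Dog";
        String rex = "http://example.com/pets#rex"; // hypothetical individual outside the namespace

        System.out.println(includeIndividual(rex, dog, namespaces, false)); // true
        System.out.println(includeIndividual(rex, dog, namespaces, true));  // false
    }
}
```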

Specifying the syntax (encoding) of the input document

Command line -e <*encoding*>
Config file <sgen:encoding rdf:datatype="&xsd;string">*encoding*</sgen:encoding>

Jena can parse a number of different presentation syntaxes for RDF documents, including RDF/XML, N3 and NTRIPLE. By default, the encoding will be derived from the name of the input document (e.g. a document xyz.n3 will be parsed in N3 format), or, if the extension is non-obvious the default is RDF/XML. The encoding, and hence the parser, to use on the input document may be specified by the encoding configuration option.

Choosing the style of the generated class: ontology or plain RDF

Command line --ontology
Config file <sgen:ontology rdf:datatype="&xsd;boolean">*true or false*</sgen:ontology>

By default, the Java class generated by schemagen will generate constants that are plain RDF Resource, Property or Literal constants. When working with OWL or RDFS ontologies, it may be more convenient to have constants that are OntClass, ObjectProperty, DatatypeProperty and Individual Java objects. To generate these ontology constants, rather than plain RDF constants, set the ontology configuration option.

Furthermore, since Jena can handle input ontologies in OWL (the default), and RDFS, it is necessary to be able to specify which language is being processed. This will affect both the parsing of the input documents, and the language profile selected for the constants in the generated Java class.

Command line --owl
Config file <sgen:owl rdf:datatype="&xsd;boolean">true</sgen:owl>
Command line --rdfs
Config file <sgen:rdfs rdf:datatype="&xsd;boolean">true</sgen:rdfs>

Prior to Jena 2.2, schemagen used a Jena model to load the input document that also applied some rules of inference to the input data. So, for example, a resource that is mentioned as the rdfs:range of a property can be inferred to be rdf:type owl:Class, and hence listed in the class constants in the generated Java class, even if that fact is not directly asserted in the input model. From Jena 2.2 onwards, inference is off by default. If correct handling of an input document by schemagen requires the use of inference rules, this must be specified by the inference option.

Command line --inference
Config file <sgen:inference rdf:datatype="&xsd;boolean">true</sgen:inference>

Specifying the Java package

Command line --package <*package-name*>
Config file <sgen:package rdf:datatype="&xsd;string">*package-name*</sgen:package>

By default, the Java class generated by schemagen will not be in a Java package. Set the package configuration option to specify the Java package name. Change from Jena 2.6.4-SNAPSHOT onwards: Setting the package name will affect the directory into which the generated class will be written: directories will be appended to the output directory to match the Java package.
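The effect of the package name on the output directory can be sketched as follows. This is an illustration of the rule as described, not schemagen's code: the class PackageDirSketch and the method outputFile are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: maps a package name onto directories under the output dir. */
final class PackageDirSketch {

    /**
     * Appends one directory per package component to the output directory,
     * then adds the class file name. An empty package name leaves the
     * output directory unchanged.
     */
    static Path outputFile(String outputDir, String packageName, String className) {
        Path dir = Paths.get(outputDir);
        for (String part : packageName.split("\\.")) {
            dir = dir.resolve(part);
        }
        return dir.resolve(className + ".java");
    }

    public static void main(String[] args) {
        // Hypothetical values for the output directory, package and class name.
        System.out.println(outputFile("target/generated-sources/java", "org.example.ont", "ExampleOnt"));
    }
}
```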

Additional decorations on the main class declaration

Command line --classdec <*class-declaration*>
Config file <sgen:classdec rdf:datatype="&xsd;string">*class-declaration*</sgen:classdec>

In some applications, it may be convenient to add additional information to the declaration of the Java class, for example that the class implements a given interface (such as java.lang.Serializable). Any string given as the value of the class-declaration option will be written immediately after “public class ClassName”.

Adding general declarations within the generated class

Command line --declarations <*declarations*>
Config file <sgen:declarations rdf:datatype="&xsd;string">*declarations*</sgen:declarations>

Some more complex vocabularies may require access to static constants, or other Java objects or factories to fully declare the constants defined by the given templates. Any text given by the declarations option will be included in the generated class after the class declaration but before the body of the declared constants. The value of the option should be fully legal Java code (though the template substitutions will be performed on the code). Although this option can be declared as a command line option, it is typically easier to specify as a value in a configuration options file.

Omitting sections of the generated vocabulary

Command line --noclasses
--nodatatypes
--noproperties
--noindividuals
Config file <sgen:noclasses rdf:datatype="&xsd;boolean">true</sgen:noclasses>
<sgen:nodatatypes rdf:datatype="&xsd;boolean">true</sgen:nodatatypes>
<sgen:noproperties rdf:datatype="&xsd;boolean">true</sgen:noproperties>
<sgen:noindividuals rdf:datatype="&xsd;boolean">true</sgen:noindividuals>

By default, the vocabulary class generated from a given ontology will include constants for each of the included classes, datatypes, properties and individuals in the ontology. To omit any of these groups, use the corresponding noXYZ configuration option. For example, specifying --noproperties means that the generated class will not contain any constants corresponding to predicate names from the ontology, irrespective of what is in the input document.

Section header comments

Command line --classSection *<section heading>*
--datatypesSection *<section heading>*
--propSection *<section heading>*
--individualsSection *<section heading>*
--header *<file header section>*
--footer *<file footer section>*
Config file <sgen:classSection rdf:datatype="&xsd;string">*section heading*</sgen:classSection>
<sgen:datatypesSection rdf:datatype="&xsd;string">*section heading*</sgen:datatypesSection>
<sgen:propSection rdf:datatype="&xsd;string">*section heading*</sgen:propSection>
<sgen:individualsSection rdf:datatype="&xsd;string">*section heading*</sgen:individualsSection>
<sgen:header rdf:datatype="&xsd;string">*file header*</sgen:header>
<sgen:footer rdf:datatype="&xsd;string">*file footer*</sgen:footer>

Some coding styles use block comments to delineate different sections of a class. These options allow the introduction of arbitrary Java code, though typically this will be a comment block, at the head of the sections of class constant declarations, datatype constant declarations, property constant declarations, and individual constant declarations.

Include vocabulary source code

Command line --includeSource
Config file <sgen:includeSource rdf:datatype="&xsd;boolean">true</sgen:includeSource>

Schemagen’s primary role is to provide Java constants corresponding to the names in a vocabulary. Sometimes, however, we may need more information from the vocabulary source file to be available, for example to know the domain and range of the properties in the vocabulary. If you set the configuration parameter --includeSource, schemagen will:

  • convert the input vocabulary into string form and include that string form in the generated Java class
  • create a Jena model when the Java vocabulary class is first loaded, and load the string-ified vocabulary into that model
  • attach the generated constants to that model, so that, for example, you can look up the declared domain and range of a property or the declared super-classes of a class.

Note that Java compilers typically impose some limit on the size of a Java source file (or, more specifically, on the size of the .class file they will generate). Loading a particularly large vocabulary with --includeSource may risk breaching that limit.

Using schemagen with Maven

Apache Maven is a build automation tool typically used for Java. You can use exec-maven-plugin and build-helper-maven-plugin to run schemagen as part of the generate-sources phase of your project. The following example shows one way of performing this task. Customize the command-line options, or use a configuration file instead, as needed.

  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <executions>
          <execution>
            <phase>generate-sources</phase>
            <goals>
              <goal>java</goal>
            </goals>
            <configuration>
              <mainClass>jena.schemagen</mainClass>
              <commandlineArgs>
                --inference \
                -i ${basedir}/src/main/resources/example.ttl \
                -e TTL \
                --package org.example.ont \
                -o ${project.build.directory}/generated-sources/java \
                -n ExampleOnt
              </commandlineArgs>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>build-helper-maven-plugin</artifactId>
        <executions>
          <execution>
            <id>add-source</id>
            <goals>
              <goal>add-source</goal>
            </goals>
            <configuration>
              <sources>
                <source>${project.build.directory}/generated-sources/java</source>
              </sources>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

At this point you can run mvn generate-sources in your project to have schemagen create your Java source. Because generate-sources is a lifecycle phase that also runs as part of mvn compile and mvn install, there is no need to run it manually unless you only want to generate the source. The source file is placed under Maven's conventional target/generated-sources/java directory, which build-helper-maven-plugin adds to the project's source directories so that it is compiled with the rest of the code.
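For readers who want to see what the exec-maven-plugin configuration amounts to, the following is a minimal Java sketch that passes the same arguments to the jena.schemagen main class. The class name RunSchemagen, the helper schemagenArgs and the directory values "." and "target" are assumptions made for illustration. Reflection is used only so that the sketch compiles without Jena; actually running it requires the Jena libraries on the classpath.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not part of Jena. */
final class RunSchemagen {

    /**
     * Builds the argument list used in the Maven example above.
     * baseDir stands in for ${basedir}, buildDir for ${project.build.directory}.
     */
    static List<String> schemagenArgs(String baseDir, String buildDir) {
        return Arrays.asList(
                "--inference",
                "-i", baseDir + "/src/main/resources/example.ttl",
                "-e", "TTL",
                "--package", "org.example.ont",
                "-o", buildDir + "/generated-sources/java",
                "-n", "ExampleOnt");
    }

    public static void main(String[] args) throws Exception {
        // Assumed locations: project root "." and Maven's default "target" directory.
        List<String> sgArgs = schemagenArgs(".", "target");

        // Look up jena.schemagen at run time; this throws ClassNotFoundException
        // if the Jena libraries are not on the classpath.
        Class<?> tool = Class.forName("jena.schemagen");
        Method toolMain = tool.getMethod("main", String[].class);
        toolMain.invoke(null, (Object) sgArgs.toArray(new String[0]));
    }
}
```

In a real project the Maven configuration above is usually the better choice, since it keeps generation inside the build lifecycle.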

Using schemagen with Ant

Apache Ant is a tool for automating build steps in Java (and other) projects. For example, it is the tool used to compile the Jena sources to the jena.jar file, and to prepare the Jena distribution prior to download. Although it would be quite possible to create an Ant taskdef to automate the production of Java classes from input vocabularies, we have not yet done this. Nevertheless, it is straightforward to use schemagen from an Ant build script by making use of Ant's built-in java task, which can execute an arbitrary Java program.

The following example shows a complete Ant target definition for generating ExampleVocab.java from example.owl. It ensures that the generation step is only performed when example.owl has been updated more recently than ExampleVocab.java (e.g. if the definitions in the OWL file have recently been changed).

  <!-- properties -->
  <property name="vocab.dir"       value="src/org/example/vocabulary" />
  <property name="vocab.template"  value="${rdf.dir}/exvocab.rdf" />
  <property name="vocab.tool"      value="jena.schemagen" />

  <!-- Section: vocabulary generation -->
  <target name="vocabularies" depends="exVocab" />

  <target name="exVocab.check">
    <uptodate
       property="exVocab.nobuild"
       srcFile="${rdf.dir}/example.owl"
       targetFile="${vocab.dir}/ExampleVocab.java" />
  </target>

  <target name="exVocab" depends="exVocab.check" unless="exVocab.nobuild">
    <java classname="${vocab.tool}" classpathref="classpath" fork="yes">
      <arg value="-i" />
      <arg value="file:${rdf.dir}/example.owl" />
      <arg value="-c" />
      <arg value="${vocab.template}" />
      <arg value="--classnamesuffix" />
      <arg value="Vocab" />
      <arg value="--include" />
      <arg value="http://example.org/2004/01/services#" />
      <arg value="--ontology" />
    </java>
  </target>
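
The timestamp rule that the exVocab.check target applies can be sketched in plain Java as follows. This is an illustration of the idea, not Ant's implementation: the class name VocabFreshness and its methods are hypothetical, and the sketch ignores details such as file-system timestamp granularity.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative sketch only; mirrors the exVocab.check / exVocab pair of targets. */
final class VocabFreshness {

    /** Regenerate when the target is missing or strictly older than the source. */
    static boolean needsRebuild(long sourceMillis, long targetMillis, boolean targetExists) {
        return !targetExists || sourceMillis > targetMillis;
    }

    /** File-based wrapper around the timestamp rule above. */
    static boolean needsRebuild(Path source, Path target) {
        try {
            if (!Files.exists(target)) {
                return true;
            }
            long src = Files.getLastModifiedTime(source).toMillis();
            long tgt = Files.getLastModifiedTime(target).toMillis();
            return needsRebuild(src, tgt, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical default paths, standing in for ${rdf.dir}/example.owl
        // and ${vocab.dir}/ExampleVocab.java in the Ant example.
        Path source = Paths.get(args.length > 0 ? args[0] : "example.owl");
        Path target = Paths.get(args.length > 1 ? args[1] : "ExampleVocab.java");
        if (!Files.exists(source)) {
            System.out.println("Source vocabulary not found: " + source);
            return;
        }
        System.out.println(needsRebuild(source, target)
                ? "Generated class is stale: run schemagen"
                : "Generated class is up to date: skip schemagen");
    }
}
```

In the Ant version, the same decision is carried by the exVocab.nobuild property: uptodate sets it when the generated file is current, and the unless attribute then skips the exVocab target.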

It is up to each developer to find the appropriate balance between options specified on the command line and those specified in the configuration file (exvocab.rdf in the above example). This is not the only way, nor necessarily the "right" way, to use schemagen from Ant, but if it points readers in the appropriate direction to produce a custom target for their own application it will have served its purpose.

Appendix A: Complete example configuration file

The source of this example is provided in the Jena download as etc/schemagen.rdf.

<?xml version='1.0'?>

<!DOCTYPE rdf:RDF [
    <!ENTITY jena    'http://jena.hpl.hp.com/'>

    <!ENTITY rdf     'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
    <!ENTITY rdfs    'http://www.w3.org/2000/01/rdf-schema#'>
    <!ENTITY owl     'http://www.w3.org/2002/07/owl#'>
    <!ENTITY xsd     'http://www.w3.org/2001/XMLSchema#'>
    <!ENTITY base    '&jena;2003/04/schemagen'>
    <!ENTITY sgen    '&base;#'>
]>

<rdf:RDF
  xmlns:rdf   ="&rdf;"
  xmlns:rdfs  ="&rdfs;"
  xmlns:owl   ="&owl;"
  xmlns:sgen  ="&sgen;"
  xmlns       ="&sgen;"
  xml:base    ="&base;"
>

<!--
    Example schemagen configuration for use with jena.schemagen
    Not all possible options are used in this example, see Javadoc and Howto for full details.

    Author: Ian Dickinson, mailto:ian.dickinson@hp.com
    CVS:    $Id: schemagen.html,v 1.16 2010-06-11 00:08:23 ian_dickinson Exp $
-->

<sgen:Config>
    <!-- specifies that the source document uses OWL -->
    <sgen:owl rdf:datatype="&xsd;boolean">true</sgen:owl>

    <!-- specifies that we want the generated vocab to use OntClass, OntProperty, etc, not Resource and Property -->
    <sgen:ontology rdf:datatype="&xsd;boolean">true</sgen:ontology>

    <!-- specifies that we want names mapped to uppercase (as standard Java constants) -->
    <sgen:uppercase rdf:datatype="&xsd;boolean">true</sgen:uppercase>

    <!-- append Vocab to class name, so input beer.owl becomes BeerVocab.java -->
    <sgen:classnamesuffix rdf:datatype="&xsd;string">Vocab</sgen:classnamesuffix>

    <!-- the Java package that the vocabulary is in -->
    <sgen:package rdf:datatype="&xsd;string">com.example.vocabulary</sgen:package>

    <!-- the directory or file to write the results out to -->
    <sgen:output rdf:datatype="&xsd;string">src/com/example/vocabulary</sgen:output>

    <!-- the template for the file header -->
<sgen:header rdf:datatype="&xsd;string">/*****************************************************************************
 * Source code information
 * -----------------------
 * Original author    Jane Smart, example.com
 * Author email       jane.smart@example.com
 * Package            @package@
 * Web site           @website@
 * Created            %date%
 * Filename           $RCSfile: schemagen.html,v $
 * Revision           $Revision: 1.16 $
 * Release status     @releaseStatus@ $State: Exp $
 *
 * Last modified on   $Date: 2010-06-11 00:08:23 $
 *               by   $Author: ian_dickinson $
 *
 * @copyright@
 *****************************************************************************/


// Package
///////////////////////////////////////
%package%


// Imports
///////////////////////////////////////
%imports%



/**
 * Vocabulary definitions from %sourceURI%
 * @author Auto-generated by schemagen on %date%
 */</sgen:header>

<!-- the template for the file footer (note @footer@ is an Ant-ism, and will not be processed by schemagen) -->
<sgen:footer rdf:datatype="&xsd;string">
/*
@footer@
*/
</sgen:footer>

<!-- template for extra declarations at the top of the class file -->
<sgen:declarations rdf:datatype="&xsd;string">
    /** Factory for generating symbols */
    private static KsValueFactory s_vf = new DefaultValueFactory();
</sgen:declarations>

<!-- template for introducing the properties in the vocabulary -->
<sgen:propSection rdf:datatype="&xsd;string">
    // Vocabulary properties
    ///////////////////////////
</sgen:propSection>

<!-- template for introducing the classes in the vocabulary -->
<sgen:classSection rdf:datatype="&xsd;string">
    // Vocabulary classes
    ///////////////////////////
</sgen:classSection>

<!-- template for introducing the datatypes in the vocabulary -->
<sgen:datatypesSection rdf:datatype="&xsd;string">
    // Vocabulary datatypes
    ///////////////////////////
</sgen:datatypesSection>

<!-- template for introducing the individuals in the vocabulary -->
<sgen:individualsSection rdf:datatype="&xsd;string">
    // Vocabulary individuals
    ///////////////////////////
</sgen:individualsSection>

<!-- template for doing fancy declarations of individuals -->
<sgen:individualTemplate rdf:datatype="&xsd;string">public static final KsSymbol %valname% = s_vf.newSymbol( "%valuri%" );

    /** Ontology individual corresponding to {@link #%valname%} */
    public static final %valclass% _%valname% = m_model.%valcreator%( %valtype%, "%valuri%" );
</sgen:individualTemplate>

</sgen:Config>

</rdf:RDF>
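
The templates in this configuration contain %name% placeholders, such as %valname% and %valuri% in sgen:individualTemplate. To make the idea concrete, here is a minimal Java sketch of this style of substitution. It is not schemagen's actual implementation: the class TemplateSketch, the expand method and the sample values are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; not schemagen's own template code. */
final class TemplateSketch {

    // Matches placeholders of the form %name%.
    private static final Pattern PLACEHOLDER = Pattern.compile("%(\\w+)%");

    /**
     * Replaces each %name% that has an entry in values. Unknown placeholders
     * and Ant-style @name@ tokens are left untouched.
     */
    static String expand(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = values.getOrDefault(m.group(1), m.group());
            // quoteReplacement stops '$' and '\' in values being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical values for a single individual in a vocabulary.
        Map<String, String> values = new LinkedHashMap<>();
        values.put("valname", "PALE_ALE");
        values.put("valuri", "http://example.org/beer#PaleAle");

        String template =
                "public static final KsSymbol %valname% = s_vf.newSymbol( \"%valuri%\" );";
        System.out.println(expand(template, values));
    }
}
```

Leaving unknown tokens in place matches the behaviour noted for @footer@ above, where a later tool (Ant in that case) is expected to fill in the value.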