org.apache.jena.query.text.assembler.Params

public class Params extends Object

Parses assembler parameter definitions for GenericAnalyzer, GenericFilter, and GenericTokenizer.

The parameters may be of the following types:

     text:TypeString        String
     text:TypeSet           org.apache.lucene.analysis.util.CharArraySet
     text:TypeFile          java.io.FileReader
     text:TypeInt           int
     text:TypeBoolean       boolean
     text:TypeAnalyzer      org.apache.lucene.analysis.Analyzer
     text:TypeTokenStream   org.apache.lucene.analysis.TokenStream

Although the list of types is not exhaustive, it is straightforward to create a wrapper Analyzer, Filter, or Tokenizer that reads a file containing whatever information is needed to initialize parameters of other types. The provided types cover the vast majority of cases.

For example, org.apache.lucene.analysis.ja.JapaneseAnalyzer has a constructor with 4 parameters: a UserDictionary, a JapaneseTokenizer.Mode, a CharArraySet, and a Set<String>. A simple wrapper can therefore extract the values for the parameters whose types are not available in this extension, construct the required instances, and instantiate the JapaneseAnalyzer.

Adding a custom Analyzer, such as the wrapper analyzer described above, only requires adding the Analyzer class and any associated filters and tokenizers to Jena's classpath, usually in a jar. All of the Analyzers, Filters, and Tokenizers included in the Lucene distribution bundled with Jena are also available as generics.
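
As a hedged illustration of how such a wrapper might be configured, the fragment below declares a hypothetical class com.example.JapaneseAnalyzerWrapper (not part of Jena or Lucene) as a text:GenericAnalyzer. It assumes the wrapper has a constructor taking a single java.io.FileReader, which is the Java type listed above for text:TypeFile, and that the file is named by a string path. The path and the parameter name are placeholders.

    text:analyzer [
        a text:GenericAnalyzer ;
        # Hypothetical wrapper class, assumed to be on Jena's classpath
        text:class "com.example.JapaneseAnalyzerWrapper" ;
        text:params (
             # Assumption: the string names a file that the wrapper's
             # constructor receives as a java.io.FileReader
             [ text:paramName "config" ;
               text:paramType text:TypeFile ;
               text:paramValue "/path/to/japanese-analyzer.conf" ]
             )
    ]

The wrapper itself would read the file and build the UserDictionary, Mode, and other values that cannot be expressed directly as parameter types.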

Each parameter object is specified with:

- an optional text:paramName that may be used to document which parameter is represented
- a text:paramType, which is one of text:TypeString, text:TypeSet, text:TypeFile, text:TypeInt, text:TypeBoolean, text:TypeAnalyzer, or text:TypeTokenStream
- a text:paramValue, which is an xsd:string, xsd:boolean, xsd:int, or a resource

A parameter of type text:TypeSet must have a text:paramValue that is a list of zero or more strings.

A parameter of type text:TypeString, text:TypeFile, text:TypeBoolean, text:TypeInt or text:TypeAnalyzer must have a single text:paramValue of the appropriate type.
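
The fragment below is a minimal sketch of single-valued parameters. It assumes that org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper and its (Analyzer, int) constructor are available in the Lucene version bundled with Jena. The text:TypeAnalyzer value is a nested analyzer resource and the text:TypeInt value is an integer literal.

    text:analyzer [
        a text:GenericAnalyzer ;
        text:class "org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper" ;
        text:params (
             # A resource describing the analyzer to wrap
             [ text:paramName "defaultAnalyzer" ;
               text:paramType text:TypeAnalyzer ;
               text:paramValue [ a text:SimpleAnalyzer ] ]
             # A single integer value
             [ text:paramName "maxShingleSize" ;
               text:paramType text:TypeInt ;
               text:paramValue 3 ]
             )
    ]

The parameters are listed in the same order as the constructor arguments they are meant to supply.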

A parameter of type text:TypeTokenStream does not have a text:paramValue. It is used to mark the occurrence of the TokenStream parameter for a Filter.

Examples:

    text:map (
         [ text:field "text" ; 
           text:predicate rdfs:label;
           text:analyzer [
               a text:GenericAnalyzer ;
               text:class "org.apache.lucene.analysis.en.EnglishAnalyzer" ;
               text:params (
                    [ text:paramName "stopwords" ;
                      text:paramType text:TypeSet ;
                      text:paramValue ("the" "a" "an") ]
                    [ text:paramName "stemExclusionSet" ;
                      text:paramType text:TypeSet ;
                      text:paramValue ("ing" "ed") ]
                    )
           ]
         ]
    ) .

    [] a text:TextIndexLucene ;
       text:defineFilters (
           # <#foldingFilter> is an illustrative name for the defined filter
           [ text:defineFilter <#foldingFilter> ;
             text:filter [
                 a text:GenericFilter ;
                 text:class "fi.finto.FoldingFilter" ;
                 text:params (
                      [ text:paramName "source" ;
                        text:paramType text:TypeTokenStream ]
                      [ text:paramName "whitelisted" ;
                        text:paramType text:TypeSet ;
                        text:paramValue ("ç") ]
                      )
             ]
           ]
        ) .

Field Summary

     Modifier and Type      Field
     static final String    TYPE_ANALYZER
     static final String    TYPE_BOOL
     static final String    TYPE_FILE
     static final String    TYPE_INT
     static final String    TYPE_SET
     static final String    TYPE_STRING
     static final String    TYPE_TOKENSTREAM

Constructor Summary

     Constructor
     Params()

Method Summary

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- TYPE_ANALYZER

  public static final String TYPE_ANALYZER
  See Also: Constant Field Values
- TYPE_BOOL

  public static final String TYPE_BOOL
  See Also: Constant Field Values
- TYPE_FILE

  public static final String TYPE_FILE
  See Also: Constant Field Values
- TYPE_INT

  public static final String TYPE_INT
  See Also: Constant Field Values
- TYPE_SET

  public static final String TYPE_SET
  See Also: Constant Field Values
- TYPE_STRING

  public static final String TYPE_STRING
  See Also: Constant Field Values
- TYPE_TOKENSTREAM

  public static final String TYPE_TOKENSTREAM
  See Also: Constant Field Values

Constructor Details
- Params

  public Params()
