GenericAnalyzer, GenericFilter, and GenericTokenizer.
The parameters may be of the following types:
    text:TypeString        String
    text:TypeSet           org.apache.lucene.analysis.util.CharArraySet
    text:TypeFile          java.io.FileReader
    text:TypeInt           int
    text:TypeBoolean       boolean
    text:TypeAnalyzer      org.apache.lucene.analysis.Analyzer
    text:TypeTokenStream   org.apache.lucene.analysis.TokenStream

Although the list of types is not exhaustive it is a simple matter to create a wrapper Analyzer, Filter, Tokenizer that reads a file with information that can be used to initialize any sort of parameters that may be needed. The provided types cover the vast majority of cases.
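The type table above can be read as a mapping from each text:paramType to a Java constructor argument type. The following is a minimal sketch of how such a mapping might be used to pick and call a constructor by reflection. It is an illustration only, not Jena's actual implementation: the class ParamTypeSketch, the JAVA_TYPES map, and the instantiate method are hypothetical names, and only the JDK-backed types are included so the sketch runs without Lucene on the classpath.

```java
import java.lang.reflect.Constructor;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParamTypeSketch {
    // Hypothetical lookup table covering only the types that need nothing
    // beyond the JDK. text:TypeSet, text:TypeAnalyzer and text:TypeTokenStream
    // would map to Lucene classes and are left out of this sketch.
    static final Map<String, Class<?>> JAVA_TYPES = new LinkedHashMap<>();
    static {
        JAVA_TYPES.put("text:TypeString", String.class);
        JAVA_TYPES.put("text:TypeFile", java.io.FileReader.class);
        JAVA_TYPES.put("text:TypeInt", int.class);
        JAVA_TYPES.put("text:TypeBoolean", boolean.class);
    }

    // Builds a constructor signature from the declared parameter types,
    // looks up the matching public constructor, and invokes it.
    static Object instantiate(String className, List<String> paramTypes, List<Object> values)
            throws ReflectiveOperationException {
        Class<?>[] signature = new Class<?>[paramTypes.size()];
        for (int i = 0; i < signature.length; i++) {
            Class<?> javaType = JAVA_TYPES.get(paramTypes.get(i));
            if (javaType == null) {
                throw new IllegalArgumentException("Unsupported parameter type: " + paramTypes.get(i));
            }
            signature[i] = javaType;
        }
        Constructor<?> ctor = Class.forName(className).getConstructor(signature);
        return ctor.newInstance(values.toArray());
    }

    public static void main(String[] args) throws Exception {
        // JDK classes stand in for an Analyzer, Filter or Tokenizer class here.
        Object sb = instantiate("java.lang.StringBuilder",
                List.of("text:TypeString"), List.of("abc"));
        System.out.println(sb);
        Object list = instantiate("java.util.ArrayList",
                List.of("text:TypeInt"), List.of(16));
        System.out.println(list.getClass().getName());
    }
}
```

The order of the declared types matters in this sketch because it determines which constructor is selected, which is one reason a parameter list rather than an unordered set of properties is a natural fit.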
For example, org.apache.lucene.analysis.ja.JapaneseAnalyzer has a constructor with 4 parameters: a UserDict, a CharArraySet, a JapaneseTokenizer.Mode, and a Set<String>. So a simple wrapper can extract the values needed for the various parameters with types not available in this extension, construct the required instances, and instantiate the JapaneseAnalyzer.
Adding custom Analyzers and the like, such as the wrapper analyzer above, is a simple matter of adding the Analyzer class and any associated filters and tokenizers to the classpath for Jena, usually in a jar. Of course, all of the Analyzers, Filters, and Tokenizers that are included in the Lucene distribution bundled with Jena are available as generics as well.
Each parameter object is specified with:
- an optional text:paramName that may be used to document which parameter is represented
- a text:paramType which is one of: text:TypeString, text:TypeSet, text:TypeFile, text:TypeInt, text:TypeBoolean, text:TypeAnalyzer, or text:TypeTokenStream
- a text:paramValue which is an xsd:string, xsd:boolean, xsd:int, or a resource
A parameter of type text:TypeSet must have a list of zero or more Strings.
A parameter of type text:TypeString, text:TypeFile, text:TypeBoolean, text:TypeInt or text:TypeAnalyzer must have a single text:paramValue of the appropriate type.
A parameter of type text:TypeTokenStream does not have a text:paramValue. It is used to mark the occurrence of the TokenStream parameter for a Filter.
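The three rules above (a list for text:TypeSet, exactly one value for the scalar types, no value for text:TypeTokenStream) can be expressed as a small check. This is a hedged sketch rather than the validation Jena performs: ParamRuleSketch and isValid are hypothetical names, parsed values are assumed to arrive as plain Java objects, and only the int and boolean cases are type-checked beyond cardinality.

```java
import java.util.List;

public class ParamRuleSketch {
    // Returns true when the values supplied for a parameter satisfy the
    // cardinality rule for its declared text:paramType.
    static boolean isValid(String paramType, List<?> values) {
        switch (paramType) {
            case "text:TypeSet":
                // Zero or more entries, all of which must be strings.
                for (Object v : values) {
                    if (!(v instanceof String)) {
                        return false;
                    }
                }
                return true;
            case "text:TypeTokenStream":
                // Only a marker for the TokenStream argument: no value allowed.
                return values.isEmpty();
            case "text:TypeInt":
                return values.size() == 1 && values.get(0) instanceof Integer;
            case "text:TypeBoolean":
                return values.size() == 1 && values.get(0) instanceof Boolean;
            case "text:TypeString":
            case "text:TypeFile":
            case "text:TypeAnalyzer":
                // Exactly one value; its concrete type is not checked here.
                return values.size() == 1;
            default:
                return false; // Not one of the documented parameter types.
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("text:TypeSet", List.of("the", "a", "an")));
        System.out.println(isValid("text:TypeTokenStream", List.of()));
        System.out.println(isValid("text:TypeInt", List.of(1, 2)));
    }
}
```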
Examples:
    text:map (
         [ text:field "text" ;
           text:predicate rdfs:label;
           text:analyzer [
               a text:GenericAnalyzer ;
               text:class "org.apache.lucene.analysis.en.EnglishAnalyzer" ;
               text:params (
                    [ text:paramName "stopwords" ;
                      text:paramType text:TypeSet ;
                      text:paramValue ("the" "a" "an") ]
                    [ text:paramName "stemExclusionSet" ;
                      text:paramType text:TypeSet ;
                      text:paramValue ("ing" "ed") ]
                    )
           ] .

    [] a text:TextIndexLucene ;
       text:defineFilters (
           text:filter [
               a text:GenericFilter ;
               text:class "fi.finto.FoldingFilter" ;
               text:params (
                    [ text:paramName "source" ;
                      text:paramType text:TypeTokenStream ]
                    [ text:paramName "whitelisted" ;
                      text:paramType text:TypeSet ;
                      text:paramValue ("ç") ]
                    )
           ]
       )
Field Details
- TYPE_ANALYZER
- TYPE_BOOL
- TYPE_FILE
- TYPE_INT
- TYPE_SET
- TYPE_STRING
- TYPE_TOKENSTREAM
Constructor Details
- Params
  public Params()