Class GenericAnalyzerAssembler
- All Implemented Interfaces: Assembler
The constructor parameters supplied via text:params may be of the following types:
text:TypeString    String
text:TypeSet       org.apache.lucene.analysis.util.CharArraySet
text:TypeFile      java.io.FileReader
text:TypeInt       int
text:TypeBoolean   boolean
text:TypeAnalyzer  org.apache.lucene.analysis.Analyzer

Although the list of types is not exhaustive, it is a simple matter to create a wrapper Analyzer that reads a file with information that can be used to initialize any sort of parameters that may be needed for a given Analyzer. The provided types cover the vast majority of cases.
For example, org.apache.lucene.analysis.ja.JapaneseAnalyzer has a constructor with 4 parameters: a UserDict, a CharArraySet, a JapaneseTokenizer.Mode, and a Set<String>. So a simple wrapper can extract the values needed for the various parameters with types not available in this extension, construct the required instances, and instantiate the JapaneseAnalyzer.
Adding a custom Analyzer, such as the wrapper described above, only requires placing the Analyzer class and any associated filters and tokenizers on the classpath for Jena, usually in a jar. All of the Analyzers included in the Lucene distribution bundled with Jena are also available as generic Analyzers.
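As a rough sketch of how such a wrapper might be referenced once its jar is on the classpath: the class name org.example.lucene.JapaneseWrapperAnalyzer, its single FileReader constructor parameter, the file path, and the :entMap subject are all hypothetical, and giving the text:TypeFile value as a string path is an assumption.

```turtle
@prefix text: <http://jena.apache.org/text#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/config#> .

# Hypothetical wrapper class, assumed to declare a constructor taking
# one java.io.FileReader from which it reads its own settings.
:entMap a text:EntityMap ;
    text:entityField  "uri" ;
    text:defaultField "text" ;
    text:map (
        [ text:field "text" ;
          text:predicate rdfs:label ;
          text:analyzer [
              a text:GenericAnalyzer ;
              text:class "org.example.lucene.JapaneseWrapperAnalyzer" ;
              text:params (
                  # Assumed: the file is named by a plain string path.
                  [ text:paramName  "settings" ;
                    text:paramType  text:TypeFile ;
                    text:paramValue "conf/japanese-analyzer.properties" ]
              )
          ] ]
    ) .
```

The wrapper itself would then build the user dictionary, stop set, mode and stop tags from that file before constructing the JapaneseAnalyzer.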
Each parameter object is specified with:
- an optional text:paramName that may be used to document which parameter is represented
- a text:paramType which is one of: text:TypeString, text:TypeSet, text:TypeFile, text:TypeInt, text:TypeBoolean, text:TypeAnalyzer
- a text:paramValue which is an xsd:string, xsd:boolean, xsd:int or resource
A parameter of type text:TypeSet must have a list of zero or more Strings.
A parameter of type text:TypeString, text:TypeFile, text:TypeBoolean, text:TypeInt or text:TypeAnalyzer must have a single text:paramValue of the appropriate type.
Examples:
    text:map (
         [ text:field "text" ;
           text:predicate rdfs:label ;
           text:analyzer [
               a text:GenericAnalyzer ;
               text:class "org.apache.lucene.analysis.en.EnglishAnalyzer" ;
               text:params (
                    [ text:paramName "stopwords" ;
                      text:paramType text:TypeSet ;
                      text:paramValue ("the" "a" "an") ]
                    [ text:paramName "stemExclusionSet" ;
                      text:paramType text:TypeSet ;
                      text:paramValue ("ing" "ed") ]
                    )
           ] ]
         ) .
    text:map (
         [ text:field "text" ;
           text:predicate rdfs:label ;
           text:analyzer [
               a text:GenericAnalyzer ;
               text:class "org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper" ;
               text:params (
                    [ text:paramName "defaultAnalyzer" ;
                      text:paramType text:TypeAnalyzer ;
                      text:paramValue [ a text:SimpleAnalyzer ] ]
                    [ text:paramName "maxShingleSize" ;
                      text:paramType text:TypeInt ;
                      text:paramValue 3 ]
                    )
           ] ]
         ) .
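The examples above use only text:TypeSet, text:TypeAnalyzer and text:TypeInt. The following sketch mixes in text:TypeString and text:TypeBoolean as well. It assumes the Lucene version bundled with Jena provides the seven-argument ShingleAnalyzerWrapper constructor (Analyzer, int, int, String, boolean, boolean, String); the text:paramName values are documentation only, and the :entMap subject is illustrative.

```turtle
@prefix text: <http://jena.apache.org/text#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/config#> .

:entMap a text:EntityMap ;
    text:entityField  "uri" ;
    text:defaultField "text" ;
    text:map (
        [ text:field "text" ;
          text:predicate rdfs:label ;
          text:analyzer [
              a text:GenericAnalyzer ;
              text:class "org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper" ;
              # Parameters are listed in constructor order (assumed signature).
              text:params (
                  [ text:paramName "delegate" ;
                    text:paramType text:TypeAnalyzer ;
                    text:paramValue [ a text:SimpleAnalyzer ] ]
                  [ text:paramName "minShingleSize" ;
                    text:paramType text:TypeInt ;
                    text:paramValue 2 ]
                  [ text:paramName "maxShingleSize" ;
                    text:paramType text:TypeInt ;
                    text:paramValue 3 ]
                  [ text:paramName "tokenSeparator" ;
                    text:paramType text:TypeString ;
                    text:paramValue " " ]
                  [ text:paramName "outputUnigrams" ;
                    text:paramType text:TypeBoolean ;
                    text:paramValue true ]
                  [ text:paramName "outputUnigramsIfNoShingles" ;
                    text:paramType text:TypeBoolean ;
                    text:paramValue false ]
                  [ text:paramName "fillerToken" ;
                    text:paramType text:TypeString ;
                    text:paramValue "_" ]
              )
          ] ]
    ) .
```

Each scalar parameter carries exactly one text:paramValue, and the order of the list is what lines the values up with the constructor's arguments.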
Field Summary
Fields inherited from interface org.apache.jena.assembler.Assembler
content, defaultModel, documentManager, fileManager, general, infModel, locationMapper, memoryModel, modelSource, ontModel, ontModelSpec, prefixMapping, reasonerFactory, ruleSet, unionModel
Constructor Summary
Constructors
Method Summary
Methods inherited from class org.apache.jena.assembler.assemblers.AssemblerBase
getOptionalClassName, getRequiredResource, open, open, openModel, openModel
Constructor Details
GenericAnalyzerAssembler
public GenericAnalyzerAssembler()
Method Details
open
- Specified by: open in interface Assembler
- Specified by: open in class AssemblerBase