Class GenericTokenizerAssembler
All Implemented Interfaces:
org.apache.jena.assembler.Assembler
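A generic tokenizer is created from the fully qualified class name given by text:class and a list of constructor parameters given by text:params. The parameters may be of the following types:
Parameter type     Java type
text:TypeString    String
text:TypeSet       org.apache.lucene.analysis.util.CharArraySet
text:TypeFile      java.io.FileReader
text:TypeInt       int
text:TypeBoolean   boolean
text:TypeAnalyzer  org.apache.lucene.analysis.Analyzer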
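As a rough illustration of how a type list like this can drive reflective construction, the sketch below maps some of the type names to Java classes and looks up a public constructor with exactly that signature. It is a hypothetical sketch, not Jena's implementation: the names ParamTypes and DemoGramTokenizer are made up, only the JDK-backed types are mapped (TypeSet and TypeAnalyzer would need Lucene), and DemoGramTokenizer merely stands in for a class such as NGramTokenizer.

```java
import java.io.FileReader;
import java.lang.reflect.Constructor;
import java.util.List;

// Hypothetical sketch (not Jena code): resolve text:paramType names to the Java
// types used for constructor lookup, then instantiate the class reflectively.
final class ParamTypes {
    static Class<?> javaType(String paramType) {
        switch (paramType) {
            case "text:TypeString":  return String.class;
            case "text:TypeFile":    return FileReader.class;
            case "text:TypeInt":     return int.class;      // primitive, not Integer
            case "text:TypeBoolean": return boolean.class;  // primitive, not Boolean
            default:
                // TypeSet and TypeAnalyzer need Lucene classes; omitted here.
                throw new IllegalArgumentException("not covered by this sketch: " + paramType);
        }
    }

    static Object instantiate(Class<?> target, List<String> paramTypes, Object... values) {
        Class<?>[] signature = new Class<?>[paramTypes.size()];
        for (int i = 0; i < signature.length; i++) {
            signature[i] = javaType(paramTypes.get(i));
        }
        try {
            // Finds a public constructor whose parameter types match exactly.
            Constructor<?> ctor = target.getConstructor(signature);
            // Boxed arguments (e.g. Integer) are unboxed to int by reflection.
            return ctor.newInstance(values);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot construct " + target.getName(), e);
        }
    }
}

// Stand-in for a tokenizer with a constructor like NGramTokenizer(int, int).
final class DemoGramTokenizer {
    final int minGram;
    final int maxGram;

    public DemoGramTokenizer(int minGram, int maxGram) {
        this.minGram = minGram;
        this.maxGram = maxGram;
    }
}
```

For instance, ParamTypes.instantiate(DemoGramTokenizer.class, List.of("text:TypeInt", "text:TypeInt"), 3, 7) builds a DemoGramTokenizer with minGram 3 and maxGram 7. Because the lookup matches types exactly, a constructor declared with Integer rather than int would not be found for text:TypeInt in this sketch.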
Although the list of types is not exhaustive, it is straightforward to write a wrapper Analyzer that reads a file containing whatever information is needed to initialize the parameters of a given Analyzer. The provided types cover the vast majority of cases.
For example, org.apache.lucene.analysis.ja.JapaneseAnalyzer
has a constructor with four parameters: a UserDictionary,
a JapaneseTokenizer.Mode, a CharArraySet, and a
Set<String>. A simple wrapper can read the values needed
for the parameters whose types this extension does not support,
construct the required instances, and then instantiate the
JapaneseAnalyzer.
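The file-reading half of such a wrapper might look like the following sketch. Everything in it is hypothetical: the WrapperSettings class, the property keys mode, stopwords and stoptags, the comma-separated list format, and the "SEARCH" default are assumptions for illustration. It deliberately stops short of the Lucene calls; a real wrapper would still convert the mode to a JapaneseTokenizer.Mode and the stop words to a CharArraySet before calling the JapaneseAnalyzer constructor.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical settings for a JapaneseAnalyzer wrapper, read from a
// key=value file. The keys and list format are assumptions, not a Jena format.
final class WrapperSettings {
    final String mode;            // would be converted to JapaneseTokenizer.Mode
    final Set<String> stopwords;  // would be copied into a CharArraySet
    final Set<String> stoptags;   // passed on as a Set<String>

    private WrapperSettings(String mode, Set<String> stopwords, Set<String> stoptags) {
        this.mode = mode;
        this.stopwords = stopwords;
        this.stoptags = stoptags;
    }

    static WrapperSettings load(Reader in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new WrapperSettings(
                props.getProperty("mode", "SEARCH"),  // assumed default
                splitList(props.getProperty("stopwords", "")),
                splitList(props.getProperty("stoptags", "")));
    }

    // Splits a comma-separated value, trimming blanks and dropping empty items.
    private static Set<String> splitList(String value) {
        Set<String> result = new LinkedHashSet<>();
        for (String item : value.split(",")) {
            String trimmed = item.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return Collections.unmodifiableSet(result);
    }
}
```

Keeping the parsing separate from the Lucene construction makes the settings format easy to test on its own; only the final step needs the Lucene classes on the classpath.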
Adding a custom Analyzer, such as the wrapper described above, requires only that the Analyzer class, together with any filters, tokenizers and other classes it depends on, be added to Jena's classpath, usually as a jar. All of the Analyzers included in the Lucene distribution bundled with Jena are also available as generic Analyzers.
Each parameter object is specified with:
- an optional text:paramName that may be used to document which parameter is represented;
- a text:paramType, which is one of text:TypeString, text:TypeSet, text:TypeFile, text:TypeInt, text:TypeBoolean, text:TypeAnalyzer;
- a text:paramValue, which is an xsd:string, xsd:boolean, xsd:int, or a resource.
A parameter of type text:TypeSet must have a list of zero or
more Strings.
A parameter of type text:TypeString, text:TypeFile,
text:TypeBoolean, text:TypeInt or text:TypeAnalyzer
must have a single text:paramValue of the appropriate type.
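The cardinality rules above can be expressed as a small validation routine. This is a hypothetical model for illustration, not Jena's code: ParamSpec is made up, values are plain Java objects rather than RDF nodes, and treating a text:TypeFile value as a path string and a text:TypeAnalyzer value as any non-null object are assumptions.

```java
import java.util.List;

// Hypothetical model of one entry in text:params (not a Jena class).
final class ParamSpec {
    final String name;      // optional text:paramName; may be null
    final String type;      // text:paramType, e.g. "text:TypeInt"
    final List<?> values;   // the text:paramValue(s), already converted to Java

    ParamSpec(String name, String type, List<?> values) {
        this.name = name;
        this.type = type;
        this.values = values;
    }

    void validate() {
        switch (type) {
            case "text:TypeSet":
                // A set may hold zero or more values, all of them strings.
                for (Object v : values) {
                    if (!(v instanceof String)) {
                        throw new IllegalArgumentException(label() + ": set members must be strings");
                    }
                }
                return;
            case "text:TypeString":
            case "text:TypeFile":      // assumed: the value is a file path string
                requireSingle(String.class);
                return;
            case "text:TypeInt":
                requireSingle(Integer.class);
                return;
            case "text:TypeBoolean":
                requireSingle(Boolean.class);
                return;
            case "text:TypeAnalyzer":  // assumed: any non-null analyzer description
                requireSingle(Object.class);
                return;
            default:
                throw new IllegalArgumentException(label() + ": unknown paramType " + type);
        }
    }

    // Every non-set type takes exactly one value of the expected Java class.
    private void requireSingle(Class<?> expected) {
        if (values.size() != 1 || !expected.isInstance(values.get(0))) {
            throw new IllegalArgumentException(label() + ": expected one " + expected.getSimpleName());
        }
    }

    private String label() {
        return name != null ? name : "(unnamed parameter)";
    }
}
```

Using the optional text:paramName in error messages is one practical benefit of supplying it even though it is not required.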
Example:
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    text:entityMap <#entMap> ;
    text:defineAnalyzers (
        [ text:addLang "sa-x-iast" ;
          text:analyzer [ . . . ] ]
        [ text:defineAnalyzer <#foo> ;
          text:analyzer [ . . . ] ]
        [ text:defineTokenizer <#bar> ;
          text:tokenizer [
              a text:GenericTokenizer ;
              text:class "org.apache.lucene.analysis.ngram.NGramTokenizer" ;
              text:params (
                  [ text:paramName "minGram" ;
                    text:paramType text:TypeInt ;
                    text:paramValue 3 ]
                  [ text:paramName "maxGram" ;
                    text:paramType text:TypeInt ;
                    text:paramValue 7 ]
              )
          ]
        ]
    ) .
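To make the minGram and maxGram values in the example concrete, the following simplified sketch lists every substring whose length lies between the two bounds. It is not Lucene code: the NGrams class is hypothetical, it counts UTF-16 chars rather than code points, and it ignores the offsets, positions and character handling that the real NGramTokenizer manages, so its output order should not be taken as Lucene's.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified n-gram generator (illustration only, not Lucene's NGramTokenizer).
final class NGrams {
    static List<String> ngrams(String text, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("need 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        // For each start offset, emit grams of increasing length that fit in the text.
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                grams.add(text.substring(start, start + len));
            }
        }
        return grams;
    }
}
```

With the bounds from the example, NGrams.ngrams("abcd", 3, 7) returns [abc, abcd, bcd], and an eight-character input such as "sanskrit" yields 20 grams; inputs shorter than minGram yield none.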
Nested Class Summary
Nested Classes
GenericTokenizerAssembler.TokenizerSpec
Field Summary
Fields inherited from interface org.apache.jena.assembler.Assembler
content, defaultModel, documentManager, general, infModel, memoryModel, ontModel, ontModelSpec, prefixMapping, reasonerFactory, ruleSet, unionModel
Constructor Summary
Constructors
GenericTokenizerAssembler()
Method Summary
GenericTokenizerAssembler.TokenizerSpec open(org.apache.jena.assembler.Assembler a, org.apache.jena.rdf.model.Resource root, org.apache.jena.assembler.Mode mode)
Methods inherited from class org.apache.jena.assembler.assemblers.AssemblerBase
getOptionalClassName, getRequiredResource, open, open, openModel, openModel
Constructor Details
GenericTokenizerAssembler
public GenericTokenizerAssembler()
Method Details
open
public GenericTokenizerAssembler.TokenizerSpec open(org.apache.jena.assembler.Assembler a, org.apache.jena.rdf.model.Resource root, org.apache.jena.assembler.Mode mode)
Specified by:
open in interface org.apache.jena.assembler.Assembler
Specified by:
open in class org.apache.jena.assembler.assemblers.AssemblerBase