Package org.carrot2.language
Class GlobDictionary
java.lang.Object
  org.carrot2.language.GlobDictionary

All Implemented Interfaces:
Predicate<CharSequence>
public class GlobDictionary extends Object implements Predicate<CharSequence>
This dictionary implementation is a middle ground between the complexity of regular expressions and the sheer speed of plain text matching. It offers case-sensitive and case-insensitive matching, as well as globs (wildcards matching any token sequence).

The following wildcards are available:
  * - matches zero or more tokens (possessive match),
  *? - matches zero or more tokens (reluctant match),
  + - matches one or more tokens (possessive match),
  +? - matches one or more tokens (reluctant match),
  ? - matches exactly one token (possessive).
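To make the token-level wildcards concrete, here is a minimal, self-contained sketch of a toy matcher. It is not the library's implementation: the class name `TokenGlobSketch` is hypothetical, it assumes that a pattern must cover the whole token sequence, it assumes tokens are already normalized, and it uses plain backtracking, so it does not model the possessive/reluctant distinction at all.

```java
final class TokenGlobSketch {
  /** Returns true if the pattern tokens cover the entire token sequence. */
  static boolean matches(String[] pattern, String[] tokens) {
    return matches(pattern, 0, tokens, 0);
  }

  private static boolean matches(String[] p, int pi, String[] t, int ti) {
    if (pi == p.length) {
      // Pattern exhausted: succeed only if all tokens were consumed.
      return ti == t.length;
    }
    String symbol = p[pi];
    switch (symbol) {
      case "?":
        // Exactly one token, whatever it is.
        return ti < t.length && matches(p, pi + 1, t, ti + 1);
      case "*":
      case "+": {
        // "*" may consume zero tokens, "+" must consume at least one.
        int min = symbol.equals("+") ? 1 : 0;
        for (int n = min; ti + n <= t.length; n++) {
          if (matches(p, pi + 1, t, ti + n)) {
            return true;
          }
        }
        return false;
      }
      default:
        // A literal token must be equal to the current input token.
        return ti < t.length && symbol.equals(t[ti]) && matches(p, pi + 1, t, ti + 1);
    }
  }
}
```

In this sketch, the pattern `data *` accepts both `data` and `data mining tools`, while `data +` rejects the single token `data`.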
In addition, token type matching is provided in the form:
  {name} - matches a token with flags named name.
Token flags are an int bitfield.
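As an illustration of what an int bitfield of token flags looks like, the sketch below defines a few flags and an "any of" check. The flag names and the class `TokenFlagsSketch` are hypothetical, and the "at least one bit in common" test is an assumption suggested by the name `ANY_OF_TYPE`, not something this page confirms.

```java
final class TokenFlagsSketch {
  // Hypothetical flag names: each flag occupies one bit of the int.
  static final int NUMERIC = 1 << 0;
  static final int PUNCTUATION = 1 << 1;
  static final int STOP_WORD = 1 << 2;

  /** True if the token carries at least one of the flags set in mask. */
  static boolean hasAnyOf(int tokenFlags, int mask) {
    return (tokenFlags & mask) != 0;
  }
}
```

A token can carry several flags at once by OR-ing them together, for example `NUMERIC | STOP_WORD`.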
-
-
Nested Class Summary
Nested Classes
Modifier and Type   Class
static class        GlobDictionary.MatchType
static class        GlobDictionary.PatternParser
static class        GlobDictionary.Token
static class        GlobDictionary.WordPattern
-
Constructor Summary
Constructors
GlobDictionary(Stream<GlobDictionary.WordPattern> patterns)
GlobDictionary(Stream<GlobDictionary.WordPattern> patterns, Function<String,String> tokenNormalization, Function<CharSequence,String[]> termSplitter)
-
Method Summary
Methods (Modifier and Type, Method, Description)
static GlobDictionary compilePatterns(Stream<String> entries)
static Function<CharSequence,String[]> defaultTermSplitter()
static Function<String,String> defaultTokenNormalization()
boolean find(String[] inputTerms, String[] normalizedTerms, int[] types, Predicate<GlobDictionary.WordPattern> earlyAbort)
    Find all matching patterns, optionally aborting prematurely.
String[] normalize(String[] tokens)
String[] split(CharSequence input)
boolean test(CharSequence input)
String toString()
-
-
-
Constructor Detail
-
GlobDictionary
public GlobDictionary(Stream<GlobDictionary.WordPattern> patterns, Function<String,String> tokenNormalization, Function<CharSequence,String[]> termSplitter)
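The two function parameters have plain `java.util.function.Function` types, so any lambda of the right shape can be supplied. The sketch below shows what such functions might look like; the whitespace splitter and lower-casing normalizer are assumptions for illustration only and are not claimed to be what `defaultTermSplitter()` or `defaultTokenNormalization()` return.

```java
import java.util.Locale;
import java.util.function.Function;

final class SplitNormalizeSketch {
  // Hypothetical splitter: breaks the input on runs of whitespace.
  static final Function<CharSequence, String[]> WHITESPACE_SPLITTER =
      input -> {
        String trimmed = input.toString().trim();
        // Avoid the single empty token that split() returns for empty input.
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
      };

  // Hypothetical normalizer: locale-independent lower-casing.
  static final Function<String, String> LOWERCASE =
      token -> token.toLowerCase(Locale.ROOT);

  /** Applies the normalizer to every token, returning a new array. */
  static String[] normalizeAll(String[] tokens) {
    String[] normalized = new String[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      normalized[i] = LOWERCASE.apply(tokens[i]);
    }
    return normalized;
  }
}
```

Keeping the verbatim and normalized token arrays side by side matters because `find` takes both, and the normalized terms must come from the same normalizer as the dictionary.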
-
GlobDictionary
public GlobDictionary(Stream<GlobDictionary.WordPattern> patterns)
-
-
Method Detail
-
defaultTermSplitter
public static Function<CharSequence,String[]> defaultTermSplitter()
-
test
public boolean test(CharSequence input)
Specified by:
test in interface Predicate<CharSequence>
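Because the class implements `Predicate<CharSequence>`, an instance can be passed to any API that accepts a predicate, such as `Stream.filter`. The sketch below uses a stand-in lambda in place of a real dictionary so that it runs without the library; `PredicateFilterSketch` and `removeMatching` are hypothetical names.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class PredicateFilterSketch {
  /** Keeps only the labels that the dictionary predicate does NOT match. */
  static List<String> removeMatching(List<String> labels, Predicate<CharSequence> dictionary) {
    return labels.stream()
        .filter(dictionary.negate()) // drop every label the predicate accepts
        .collect(Collectors.toList());
  }
}
```

With a real dictionary, the second argument would be the `GlobDictionary` instance itself.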
-
find
public boolean find(String[] inputTerms, String[] normalizedTerms, int[] types, Predicate<GlobDictionary.WordPattern> earlyAbort)
Find all matching patterns, optionally aborting prematurely.
Parameters:
  inputTerms - Input terms (verbatim).
  normalizedTerms - Normalized terms (must use the same normalizer as the dictionary).
  types - Token types (bitfield) used in GlobDictionary.MatchType.ANY_OF_TYPE.
  earlyAbort - A predicate that indicates early abort condition.
Returns:
  true if at least one match was found, false otherwise.
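The early-abort parameter follows a common callback pattern, sketched below in isolation. The sketch assumes that the predicate is invoked once per match and that returning true stops the search; this page does not spell out either detail, and `EarlyAbortSketch` and `findAll` are hypothetical names.

```java
import java.util.List;
import java.util.function.Predicate;

final class EarlyAbortSketch {
  /**
   * Visits candidates in order and reports each match to earlyAbort.
   * Returns true if at least one candidate matched.
   */
  static <T> boolean findAll(List<T> candidates, Predicate<T> isMatch, Predicate<T> earlyAbort) {
    boolean found = false;
    for (T candidate : candidates) {
      if (isMatch.test(candidate)) {
        found = true;
        if (earlyAbort.test(candidate)) {
          break; // the caller asked to stop after this match
        }
      }
    }
    return found;
  }
}
```

A predicate that records each match and returns false collects every match, while one that always returns true stops at the first match, which is enough for a simple yes/no check.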
-
split
public String[] split(CharSequence input)
-
compilePatterns
public static GlobDictionary compilePatterns(Stream<String> entries)
-
-