Package org.carrot2.language
Class GlobDictionary
java.lang.Object
  org.carrot2.language.GlobDictionary
All Implemented Interfaces: Predicate<CharSequence>
public class GlobDictionary extends Object implements Predicate<CharSequence>
This dictionary implementation is a middle ground between the complexity of regular expressions and the sheer speed of plain text matching. It offers case-sensitive and case-insensitive matching, as well as globs (wildcards matching any token sequence).
The following wildcards are available:
* - matches zero or more tokens (possessive match),
*? - matches zero or more tokens (reluctant match),
+ - matches one or more tokens (possessive match),
+? - matches one or more tokens (reluctant match),
? - matches exactly one token (possessive).
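The sketch below illustrates what "wildcards over tokens" means: each wildcard stands for whole tokens, not characters. It is a simplified, hypothetical matcher and not the Carrot2 implementation. The class name TokenGlobSketch and its methods are invented for illustration. It assumes the whole token sequence must match the whole pattern, compares literal tokens case-insensitively, and uses plain backtracking, so it does not reproduce the possessive or reluctant behaviour of the real wildcards.

```java
/** Simplified, hypothetical token-level glob matcher (not the Carrot2 implementation). */
final class TokenGlobSketch {
  /** Returns true if the whole token sequence matches the whole pattern. */
  static boolean matches(String[] pattern, String[] tokens) {
    return matches(pattern, 0, tokens, 0);
  }

  private static boolean matches(String[] p, int pi, String[] t, int ti) {
    if (pi == p.length) {
      // Pattern exhausted: succeed only if the input is exhausted too.
      return ti == t.length;
    }
    switch (p[pi]) {
      case "*": // zero or more tokens
        for (int next = ti; next <= t.length; next++) {
          if (matches(p, pi + 1, t, next)) return true;
        }
        return false;
      case "+": // one or more tokens
        for (int next = ti + 1; next <= t.length; next++) {
          if (matches(p, pi + 1, t, next)) return true;
        }
        return false;
      case "?": // exactly one token
        return ti < t.length && matches(p, pi + 1, t, ti + 1);
      default: // literal token, compared case-insensitively (an assumption of this sketch)
        return ti < t.length
            && p[pi].equalsIgnoreCase(t[ti])
            && matches(p, pi + 1, t, ti + 1);
    }
  }

  public static void main(String[] args) {
    String[] input = "Big Data Analytics".split(" ");
    System.out.println(matches(new String[] {"big", "*"}, input));
    System.out.println(matches(new String[] {"big", "?"}, input));
    System.out.println(matches(new String[] {"+", "analytics"}, input));
  }
}
```

Here the pattern "big ?" fails against a three-token input because ? stands for exactly one token, while "big *" accepts any number of trailing tokens.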
In addition, token type matching is provided in the form of:
{name} - matches a token with flags named name.
Token flags are an int bitfield.
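As a minimal sketch of what an int bitfield of token flags can look like, the example below gives each flag its own bit and tests a token's flags against a mask. The flag names TERM, NUMERIC and PUNCTUATION and the class TokenFlagsSketch are hypothetical. They are not taken from the Carrot2 API and only illustrate the general technique.

```java
/** Hypothetical token flags stored in an int bitfield (names are illustrative only). */
final class TokenFlagsSketch {
  // Each flag occupies its own bit, so several flags can be combined in one int.
  static final int TERM = 1 << 0;
  static final int NUMERIC = 1 << 1;
  static final int PUNCTUATION = 1 << 2;

  /** True if the token carries at least one of the flags in the mask. */
  static boolean hasAnyOf(int tokenFlags, int mask) {
    return (tokenFlags & mask) != 0;
  }

  /** True if the token carries every flag in the mask. */
  static boolean hasAllOf(int tokenFlags, int mask) {
    return (tokenFlags & mask) == mask;
  }

  public static void main(String[] args) {
    int token = TERM | NUMERIC; // a token that is both a term and numeric
    System.out.println(hasAnyOf(token, NUMERIC | PUNCTUATION));
    System.out.println(hasAllOf(token, NUMERIC | PUNCTUATION));
  }
}
```

An "any of" test like hasAnyOf is the kind of check that a pattern element such as {name} could plausibly perform against a token's flags, though the exact semantics are defined by the library.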

Nested Class Summary
Modifier and Type   Class
static class        GlobDictionary.MatchType
static class        GlobDictionary.PatternParser
static class        GlobDictionary.Token
static class        GlobDictionary.WordPattern

Constructor Summary
Constructor
GlobDictionary(Stream<GlobDictionary.WordPattern> patterns)
GlobDictionary(Stream<GlobDictionary.WordPattern> patterns, Function<String,String> tokenNormalization, Function<CharSequence,String[]> termSplitter)

Method Summary
Modifier and Type                        Method
static GlobDictionary                    compilePatterns(Stream<String> entries)
static Function<CharSequence,String[]>   defaultTermSplitter()
static Function<String,String>           defaultTokenNormalization()
boolean                                  find(String[] inputTerms, String[] normalizedTerms, int[] types, Predicate<GlobDictionary.WordPattern> earlyAbort)
                                         Find all matching patterns, optionally aborting prematurely.
String[]                                 normalize(String[] tokens)
String[]                                 split(CharSequence input)
boolean                                  test(CharSequence input)
String                                   toString()

Constructor Detail
-
GlobDictionary
public GlobDictionary(Stream<GlobDictionary.WordPattern> patterns, Function<String,String> tokenNormalization, Function<CharSequence,String[]> termSplitter)
-
GlobDictionary
public GlobDictionary(Stream<GlobDictionary.WordPattern> patterns)
-
-
Method Detail
-
defaultTermSplitter
public static Function<CharSequence,String[]> defaultTermSplitter()
-
test
public boolean test(CharSequence input)
- Specified by: test in interface Predicate<CharSequence>
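Because the class implements Predicate<CharSequence>, an instance can be passed anywhere a predicate over strings is accepted, for example Stream.filter. The sketch below shows that wiring with a stand-in predicate, since it does not depend on the Carrot2 library. PredicateUsageSketch, keepMatching, dropMatching and standInDictionary are hypothetical names. With the real library one would presumably pass a GlobDictionary instance in place of the stand-in.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Shows how any Predicate<CharSequence> plugs into stream filtering. */
final class PredicateUsageSketch {
  /** Keeps only the labels accepted by the given predicate. */
  static List<String> keepMatching(List<String> labels, Predicate<CharSequence> dictionary) {
    // Stream<String>.filter accepts Predicate<? super String>, so this compiles as-is.
    return labels.stream().filter(dictionary).collect(Collectors.toList());
  }

  /** Drops the labels accepted by the given predicate (stop-list style filtering). */
  static List<String> dropMatching(List<String> labels, Predicate<CharSequence> dictionary) {
    return labels.stream().filter(dictionary.negate()).collect(Collectors.toList());
  }

  /** Stand-in for a dictionary: accepts inputs that start with the token "big". */
  static Predicate<CharSequence> standInDictionary() {
    return s -> s.toString().toLowerCase(Locale.ROOT).startsWith("big ");
  }

  public static void main(String[] args) {
    List<String> labels = Arrays.asList("Big Data", "Small Data", "big picture");
    System.out.println(keepMatching(labels, standInDictionary()));
    System.out.println(dropMatching(labels, standInDictionary()));
  }
}
```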
-
find
public boolean find(String[] inputTerms, String[] normalizedTerms, int[] types, Predicate<GlobDictionary.WordPattern> earlyAbort)
Find all matching patterns, optionally aborting prematurely.
- Parameters:
inputTerms - Input terms (verbatim).
normalizedTerms - Normalized terms (must use the same normalizer as the dictionary).
types - Token types (bitfield) used in GlobDictionary.MatchType.ANY_OF_TYPE.
earlyAbort - A predicate that indicates an early abort condition.
- Returns:
true if at least one match was found, false otherwise.
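The early-abort contract described above can be sketched generically. The example assumes that the predicate is invoked once per matching pattern and that returning true stops the search. This is an interpretation of the parameter description, not a statement of how the library implements it. EarlyAbortSketch and findAll are hypothetical names, and plain strings stand in for patterns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical illustration of a "visit matches, optionally abort early" loop. */
final class EarlyAbortSketch {
  /**
   * Visits every candidate accepted by isMatch and hands it to earlyAbort.
   * Stops as soon as earlyAbort returns true.
   *
   * @return true if at least one candidate matched, false otherwise
   */
  static <P> boolean findAll(
      List<P> candidates, Predicate<? super P> isMatch, Predicate<? super P> earlyAbort) {
    boolean found = false;
    for (P candidate : candidates) {
      if (isMatch.test(candidate)) {
        found = true;
        if (earlyAbort.test(candidate)) {
          break; // the caller asked to stop after this match
        }
      }
    }
    return found;
  }

  public static void main(String[] args) {
    List<String> collected = new ArrayList<>();
    boolean found = findAll(
        Arrays.asList("apple", "avocado", "banana", "apricot"),
        s -> s.startsWith("a"),
        s -> {
          collected.add(s);
          return collected.size() >= 2; // abort once two matches were collected
        });
    System.out.println(found + " " + collected);
  }
}
```

A predicate that always returns false collects every match, while one that always returns true turns the call into a plain "does anything match" check.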
-
split
public String[] split(CharSequence input)
-
compilePatterns
public static GlobDictionary compilePatterns(Stream<String> entries)
-
-