Package org.carrot2.language
Lexical component interfaces and implementations.

Interface Summary

Interface                 Description
LabelFilter               A cluster label candidate filter.
LabelFilterDictionary
LanguageComponentsProvider
SingleLanguageComponentsProviderImpl.SupplierLoader<T>
Stemmer                   Simple lemmatization engine transforming an inflected form of a word to its base form or some other unique token.
StopwordFilter            A stop word filter.
StopwordFilterDictionary  A parameter supplying a StopwordFilter.
Tokenizer                 Splits input characters into tokens representing e.g.
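
To make the roles of the interfaces above more concrete, here is a minimal, self-contained sketch of how tokenization, stop word filtering and stemming might be chained. The class LexicalPipelineSketch, the fields IS_STOPWORD and STEM, and the method process are hypothetical names invented for this illustration; they are not the org.carrot2.language interfaces and do not reflect their actual signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Illustrative only: none of these types are the org.carrot2.language interfaces.
class LexicalPipelineSketch {
  // Hypothetical stand-in for a stop word filter: true for words to ignore.
  static final Predicate<String> IS_STOPWORD = Set.of("the", "a", "of", "and")::contains;

  // Hypothetical stand-in for a stemmer: strips a trailing "s" from longer words.
  static final UnaryOperator<String> STEM =
      w -> (w.length() > 3 && w.endsWith("s")) ? w.substring(0, w.length() - 1) : w;

  // Splits on whitespace, drops stop words, then reduces each word to a base form.
  static List<String> process(String text) {
    List<String> out = new ArrayList<>();
    for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
      if (token.isEmpty() || IS_STOPWORD.test(token)) {
        continue;
      }
      out.add(STEM.apply(token));
    }
    return out;
  }

  public static void main(String[] args) {
    // prints [cluster, document, label]
    System.out.println(process("The clusters of documents and labels"));
  }
}
```

A real stemmer handles far more than plural suffixes; the point of the sketch is only the order of the steps.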

Class Summary

Class                        Description
ConvertLegacyResources       Converts legacy *.utf8 plain-text resources into corresponding JSON dictionaries.
DefaultDictionaryImpl        Default implementation of the StopwordFilterDictionary and LabelFilterDictionary interfaces.
DefaultLabelFormatterProvider
DefaultLexicalDataProvider
DefaultStemmersProvider
DefaultTokenizersProvider
EphemeralDictionaries        Ephemeral per-request overrides for the default LanguageComponents passed to the algorithm.
ExtendedWhitespaceTokenizer  A tokenizer separating input characters on whitespace, but capable of extracting more complex tokens, such as URLs, e-mail addresses and sentence delimiters.
GlobDictionary               This dictionary implementation is a middle ground between the complexity of regular expressions and sheer speed of plain text matching.
GlobDictionary.PatternParser
GlobDictionary.Token
GlobDictionary.WordPattern
LanguageComponents           A set of language-specific components.
LanguageComponentsLoader
LoadedLanguages
SingleLanguageComponentsProviderImpl
SnowballStemmerAdapter       An adapter converting Snowball programs into the Stemmer interface.
TokenTypeUtils               Utility methods for working with Tokenizer attributes.
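
The idea of separating on whitespace while still recognizing URLs, e-mail addresses and sentence delimiters can be sketched roughly as follows. This is an assumption-laden illustration, not the ExtendedWhitespaceTokenizer implementation: the class WhitespaceTokenizerSketch, its two regular expressions and the "TYPE:text" output format are all invented here for demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: not the ExtendedWhitespaceTokenizer implementation.
class WhitespaceTokenizerSketch {
  // Deliberately simplified patterns; real URL and e-mail grammars are broader.
  private static final Pattern URL_PATTERN =
      Pattern.compile("(?i)(https?://|www\\.)\\S+");
  private static final Pattern EMAIL_PATTERN =
      Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

  private static String classify(String body) {
    if (URL_PATTERN.matcher(body).matches()) return "URL";
    if (EMAIL_PATTERN.matcher(body).matches()) return "EMAIL";
    return "TERM";
  }

  // Returns tokens as "TYPE:text" strings (a hypothetical format for this sketch).
  static List<String> tokenize(String input) {
    List<String> tokens = new ArrayList<>();
    for (String chunk : input.trim().split("\\s+")) {
      if (chunk.isEmpty()) {
        continue;
      }
      String body = chunk;
      String delimiter = "";
      char last = chunk.charAt(chunk.length() - 1);
      // Detach a trailing sentence delimiter so the rest can be classified alone.
      if (last == '.' || last == '!' || last == '?') {
        body = chunk.substring(0, chunk.length() - 1);
        delimiter = String.valueOf(last);
      }
      if (!body.isEmpty()) {
        tokens.add(classify(body) + ":" + body);
      }
      if (!delimiter.isEmpty()) {
        tokens.add("SENTENCE_DELIMITER:" + delimiter);
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(tokenize("Mail jane@example.org or see https://example.com/docs."));
  }
}
```

Splitting first and classifying afterwards keeps the sketch short; a production tokenizer would more likely scan the character stream once and emit typed tokens directly.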

Enum Summary

Enum                      Description
GlobDictionary.MatchType