Package org.carrot2.language
Lexical component interfaces and implementations.
Interface Summary

| Interface | Description |
|---|---|
| LabelFilter | A cluster label candidate filter. |
| LabelFilterDictionary | |
| LanguageComponentsProvider | |
| SingleLanguageComponentsProviderImpl.SupplierLoader<T> | |
| Stemmer | Simple lemmatization engine transforming an inflected form of a word to its base form or some other unique token. |
| StopwordFilter | A stop word filter. |
| StopwordFilterDictionary | A parameter supplying a StopwordFilter. |
| Tokenizer | Splits input characters into tokens representing e.g. |
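The Stemmer and StopwordFilter entries describe two complementary roles: dropping words that carry no meaning for cluster labels, and collapsing inflected forms to a common token. The following is a minimal sketch of how those roles might be combined in a preprocessing step. It uses hypothetical stand-in interfaces (`SimpleStemmer`, `SimpleStopwordFilter`) and a toy suffix-stripping stemmer; these are illustrative assumptions, not the actual Carrot2 signatures or its Snowball-based stemmers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class LexicalPipelineSketch {
  // Hypothetical stand-in for the Stemmer role: maps an inflected word to a base form.
  interface SimpleStemmer {
    CharSequence stem(CharSequence word);
  }

  // Hypothetical stand-in for the StopwordFilter role: true means "keep this word".
  interface SimpleStopwordFilter {
    boolean test(CharSequence word);
  }

  // Toy suffix-stripping stemmer; real stemmers (e.g. Snowball) are far more elaborate.
  static final SimpleStemmer SUFFIX_STEMMER = word -> {
    String w = word.toString();
    for (String suffix : new String[] {"ing", "es", "s"}) {
      // Only strip when a reasonably long stem remains.
      if (w.length() > suffix.length() + 2 && w.endsWith(suffix)) {
        return w.substring(0, w.length() - suffix.length());
      }
    }
    return w;
  };

  // Builds a case-insensitive stop word filter from a set of lowercase words.
  static SimpleStopwordFilter fromSet(Set<String> stopwords) {
    return word -> !stopwords.contains(word.toString().toLowerCase(Locale.ROOT));
  }

  // Splits on whitespace, drops stop words, and stems whatever remains.
  static List<String> process(String text, SimpleStopwordFilter filter, SimpleStemmer stemmer) {
    List<String> out = new ArrayList<>();
    for (String token : text.split("\\s+")) {
      if (token.isEmpty() || !filter.test(token)) {
        continue;
      }
      out.add(stemmer.stem(token.toLowerCase(Locale.ROOT)).toString());
    }
    return out;
  }

  public static void main(String[] args) {
    SimpleStopwordFilter filter = fromSet(Set.of("the", "of", "and"));
    // prints [cluster, document]
    System.out.println(process("The clustering of documents", filter, SUFFIX_STEMMER));
  }
}
```

Keeping the filter and the stemmer behind separate interfaces mirrors the separation in the table above: either component can be swapped per language without touching the other.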
Class Summary

| Class | Description |
|---|---|
| ConvertLegacyResources | Converts legacy *.utf8 plain-text resources into corresponding JSON dictionaries. |
| DefaultDictionaryImpl | Default implementation of the StopwordFilterDictionary and LabelFilterDictionary interfaces. |
| DefaultLabelFormatterProvider | |
| DefaultLexicalDataProvider | |
| DefaultStemmersProvider | |
| DefaultTokenizersProvider | |
| EphemeralDictionaries | Ephemeral per-request overrides for the default LanguageComponents passed to the algorithm. |
| ExtendedWhitespaceTokenizer | A tokenizer separating input characters on whitespace, but capable of extracting more complex tokens, such as URLs, e-mail addresses and sentence delimiters. |
| GlobDictionary | This dictionary implementation is a middle ground between the complexity of regular expressions and the sheer speed of plain text matching. |
| GlobDictionary.PatternParser | |
| GlobDictionary.Token | |
| GlobDictionary.WordPattern | |
| LanguageComponents | A set of language-specific components. |
| LanguageComponentsLoader | |
| LoadedLanguages | |
| SingleLanguageComponentsProviderImpl | |
| SnowballStemmerAdapter | An adapter converting Snowball programs into the Stemmer interface. |
| TokenTypeUtils | Utility methods for working with Tokenizer attributes. |
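The GlobDictionary entry positions glob patterns between regular expressions and plain string comparison. The following is a minimal sketch of what token-level glob matching can look like, assuming a hypothetical syntax in which `*` matches zero or more whole words and every other pattern word must equal the input word case-insensitively. The `TokenGlobSketch` class and this wildcard syntax are illustrative assumptions, not the documented GlobDictionary syntax or API.

```java
import java.util.Locale;

public class TokenGlobSketch {
  // Returns true if the input's word sequence matches the pattern. In this
  // hypothetical syntax "*" matches zero or more whole words; any other
  // pattern word must equal the corresponding input word (case-insensitively).
  static boolean matches(String pattern, String input) {
    String[] p = pattern.trim().toLowerCase(Locale.ROOT).split("\\s+");
    String[] t = input.trim().toLowerCase(Locale.ROOT).split("\\s+");
    return match(p, 0, t, 0);
  }

  private static boolean match(String[] p, int pi, String[] t, int ti) {
    if (pi == p.length) {
      // The pattern is exhausted: succeed only if the input is exhausted too.
      return ti == t.length;
    }
    if (p[pi].equals("*")) {
      // Let the wildcard consume 0, 1, 2, ... words and try the rest of the pattern.
      for (int k = ti; k <= t.length; k++) {
        if (match(p, pi + 1, t, k)) {
          return true;
        }
      }
      return false;
    }
    // A literal pattern word must match the current input word exactly.
    return ti < t.length && p[pi].equals(t[ti]) && match(p, pi + 1, t, ti + 1);
  }

  public static void main(String[] args) {
    System.out.println(matches("more information *", "More information about clustering")); // true
    System.out.println(matches("* information", "information")); // true
    System.out.println(matches("information", "more information")); // false
  }
}
```

Comparing whole words instead of compiling regular expressions keeps each check to simple string equality; the backtracking over `*` shown here is the simplest correct approach, and a production implementation would likely index patterns for speed.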
Enum Summary

| Enum | Description |
|---|---|
| GlobDictionary.MatchType | |