Package org.carrot2.language
Lexical component interfaces and implementations.

Interface Summary

LanguageComponentsProvider
LexicalData
    Provides additional word and label filtering information for a given language.
SingleLanguageComponentsProviderImpl.SupplierLoader<T>
Stemmer
    Simple lemmatization engine transforming an inflected form of a word to its base form or some other unique token.
Tokenizer
    Splits input characters into tokens.

Class Summary

DefaultLabelFormatterProvider
DefaultLexicalDataProvider
DefaultStemmersProvider
DefaultTokenizersProvider
ExtendedWhitespaceTokenizer
    A tokenizer separating input characters on whitespace, but capable of extracting more complex tokens, such as URLs, e-mail addresses and sentence delimiters.
LanguageComponents
    A set of language-specific components.
LanguageComponentsLoader
LexicalDataImpl
    LexicalData implemented on top of a hash set (stopwords) and a regular expression pattern (stoplabels).
LoadedLanguages
SingleLanguageComponentsProviderImpl
SnowballStemmerAdapter
    An adapter converting Snowball programs into the Stemmer interface.
TokenTypeUtils
    Utility methods for working with Tokenizer attributes.
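
The LexicalDataImpl entry describes word and label filtering backed by a hash set of stopwords and a regular expression for stoplabels. The sketch below illustrates that general idea only. The class LexicalDataSketch, its method names, the case-insensitive lookup and the sample stopwords and pattern are all assumptions made for illustration; it is not the Carrot2 implementation and does not use the Carrot2 API.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical stand-in for illustration; not the Carrot2 LexicalDataImpl class. */
public class LexicalDataSketch {
    private final Set<String> stopwords;   // words to ignore, stored in lower case
    private final Pattern stoplabels;      // labels to ignore, matched as a whole

    public LexicalDataSketch(Set<String> stopwords, Pattern stoplabels) {
        this.stopwords = stopwords;
        this.stoplabels = stoplabels;
    }

    /** Returns true if the word is in the stopword set (lower-cased first, by assumption). */
    public boolean ignoreWord(CharSequence word) {
        return stopwords.contains(word.toString().toLowerCase(Locale.ROOT));
    }

    /** Returns true if the entire label matches the stoplabel pattern. */
    public boolean ignoreLabel(CharSequence label) {
        return stoplabels.matcher(label).matches();
    }

    public static void main(String[] args) {
        // Sample data invented for this sketch.
        LexicalDataSketch data = new LexicalDataSketch(
            Set.of("the", "of", "and"),
            Pattern.compile("(?i)(click here|read more).*"));

        System.out.println(data.ignoreWord("The"));
        System.out.println(data.ignoreLabel("Read more about clustering"));
        System.out.println(data.ignoreLabel("Text clustering"));
    }
}
```

A set lookup keeps the per-word check cheap, while a single compiled pattern lets whole-label rules (prefixes, alternatives) be expressed without enumerating every label.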