Class PhraseExtractor

java.lang.Object
org.carrot2.text.preprocessing.PhraseExtractor

public class PhraseExtractor
extends Object
Extracts frequent phrases from the provided documents. A frequent phrase is a sequence of words that appears in the documents more than once. This phrase extractor aggregates different inflection variants of phrase words into one phrase and returns the most frequent variant. For example, if the phrase "computing science" appears 2 times and "computer sciences" appears 4 times, the latter is returned with an aggregated frequency of 6.
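
The aggregation rule described above can be illustrated with a minimal, standalone sketch. It is not the Carrot2 implementation and does not use the Carrot2 API: the class PhraseAggregationSketch, its nested types, and the hand-supplied stemmed keys are all hypothetical, chosen only to show how variants sharing a stemmed form could be merged.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of org.carrot2.
public class PhraseAggregationSketch {

    /** One surface variant of a phrase, its stemmed key and its frequency. */
    public static final class Variant {
        public final String phrase;
        public final String stemmedKey; // assumed to come from some stemmer
        public final int tf;

        public Variant(String phrase, String stemmedKey, int tf) {
            this.phrase = phrase;
            this.stemmedKey = stemmedKey;
            this.tf = tf;
        }
    }

    /** The most frequent variant together with the summed frequency. */
    public static final class Aggregated {
        public final String phrase;
        public final int tf;

        public Aggregated(String phrase, int tf) {
            this.phrase = phrase;
            this.tf = tf;
        }
    }

    /** Groups variants by stemmed key, sums frequencies, keeps the top variant. */
    public static Map<String, Aggregated> aggregate(List<Variant> variants) {
        Map<String, Aggregated> result = new LinkedHashMap<>();
        Map<String, Integer> bestTf = new HashMap<>(); // highest single-variant tf per key
        for (Variant v : variants) {
            Aggregated current = result.get(v.stemmedKey);
            int total = (current == null ? 0 : current.tf) + v.tf;
            int best = bestTf.getOrDefault(v.stemmedKey, 0);
            String label = (current == null || v.tf > best) ? v.phrase : current.phrase;
            bestTf.put(v.stemmedKey, Math.max(best, v.tf));
            result.put(v.stemmedKey, new Aggregated(label, total));
        }
        return result;
    }

    public static void main(String[] args) {
        // Stemmed keys are supplied by hand here; a real stemmer would produce them.
        List<Variant> input = Arrays.asList(
            new Variant("computing science", "comput scienc", 2),
            new Variant("computer sciences", "comput scienc", 4));
        Aggregated a = aggregate(input).get("comput scienc");
        System.out.println(a.phrase + " (" + a.tf + ")"); // prints "computer sciences (6)"
    }
}
```

In this sketch the grouping key is the stemmed phrase, which mirrors the reason stemming has to happen before phrase extraction: without stems there is nothing to identify two variants as the same phrase.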

This class saves the following results to the PreprocessingContext:

This class requires that Tokenizer, CaseNormalizer and LanguageModelStemmer be invoked first.