Package org.carrot2.text.preprocessing
Class PhraseExtractor
- java.lang.Object
  - org.carrot2.text.preprocessing.PhraseExtractor
public class PhraseExtractor extends Object
Extracts frequent phrases from the provided documents. A frequent phrase is a sequence of words that appears in the documents more than once. This phrase extractor aggregates different inflection variants of phrase words into one phrase, returning the most frequent variant. For example, if the phrase "computing science" appears 2 times and "computer sciences" appears 4 times, the latter will be returned with an aggregated frequency of 6.
This class saves the following results to the PreprocessingContext:
- PreprocessingContext.AllPhrases.wordIndices
- PreprocessingContext.AllPhrases.tf
- PreprocessingContext.AllPhrases.tfByDocument
- PreprocessingContext.AllTokens.suffixOrder
- PreprocessingContext.AllTokens.lcp
This class requires that Tokenizer, CaseNormalizer and LanguageModelStemmer be invoked first.
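The aggregation of inflection variants described above can be illustrated with a small standalone sketch. It does not use the Carrot2 API and is not how PhraseExtractor is implemented internally. The class PhraseVariantAggregation, its methods, and the toy stem() lookup are hypothetical names introduced only for this illustration. It groups phrase occurrences by the stems of their words, sums the counts, and reports the most frequent surface variant.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PhraseVariantAggregation {

    /** Hypothetical stand-in for a real stemmer: maps a few inflected words to a shared stem. */
    static String stem(String word) {
        switch (word) {
            case "computing":
            case "computer":
                return "comput";
            case "science":
            case "sciences":
                return "scienc";
            default:
                return word;
        }
    }

    /** Builds the grouping key for a phrase by stemming each of its words. */
    static String stemKey(String phrase) {
        StringBuilder key = new StringBuilder();
        for (String word : phrase.toLowerCase().split("\\s+")) {
            if (key.length() > 0) {
                key.append(' ');
            }
            key.append(stem(word));
        }
        return key.toString();
    }

    /**
     * Groups phrase occurrences by stem key. For each group that occurs more than
     * once, maps the most frequent surface variant to the group's summed frequency.
     */
    static Map<String, Integer> aggregate(List<String> occurrences) {
        // stem key -> (surface variant -> count)
        Map<String, Map<String, Integer>> groups = new LinkedHashMap<>();
        for (String phrase : occurrences) {
            groups.computeIfAbsent(stemKey(phrase), k -> new LinkedHashMap<>())
                  .merge(phrase, 1, Integer::sum);
        }

        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map<String, Integer> variants : groups.values()) {
            String best = null;
            int bestCount = 0;
            int total = 0;
            for (Map.Entry<String, Integer> e : variants.entrySet()) {
                total += e.getValue();
                if (e.getValue() > bestCount) {
                    best = e.getKey();
                    bestCount = e.getValue();
                }
            }
            // Keep only phrases that appear more than once across all variants.
            if (total > 1) {
                result.put(best, total);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> occurrences = new ArrayList<>();
        for (int i = 0; i < 2; i++) occurrences.add("computing science");
        for (int i = 0; i < 4; i++) occurrences.add("computer sciences");
        occurrences.add("data mining"); // appears once, so it is not a frequent phrase

        System.out.println(aggregate(occurrences)); // prints {computer sciences=6}
    }
}
```

The two variants share the stem key "comput scienc", so their counts (2 and 4) are summed to 6 and the more frequent variant, "computer sciences", is reported.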
Method Summary
Modifier and Type: void
Method: extractPhrases(PreprocessingContext context)
Description: Performs phrase extraction and saves the results to the provided context.
Method Detail
extractPhrases
public void extractPhrases(PreprocessingContext context)
Performs phrase extraction and saves the results to the provided context.