Package org.carrot2.text.preprocessing
Class PhraseExtractor
java.lang.Object
    org.carrot2.text.preprocessing.PhraseExtractor
public class PhraseExtractor extends Object
Extracts frequent phrases from the provided documents. A frequent phrase is a sequence of words that appears in the documents more than once. This phrase extractor aggregates different inflection variants of phrase words into one phrase, returning the most frequent variant. For example, if the phrase "computing science" appears 2 times and "computer sciences" appears 4 times, the latter will be returned with an aggregated frequency of 6.

This class saves the following results to the PreprocessingContext:
- PreprocessingContext.AllPhrases.wordIndices
- PreprocessingContext.AllPhrases.tf
- PreprocessingContext.AllPhrases.tfByDocument
- PreprocessingContext.AllTokens.suffixOrder
- PreprocessingContext.AllTokens.lcp
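
The aggregation of inflection variants described above can be illustrated with a small, self-contained sketch. It does not use the Carrot2 API and is not how PhraseExtractor is implemented. The class PhraseAggregationSketch, its methods, and the toy suffix-stripping stemmer are hypothetical names introduced only for illustration. The sketch assumes that two phrases are variants of each other when their stemmed words match.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PhraseAggregationSketch {

    // Hypothetical toy stemmer: strips one of a few English suffixes.
    // A real stemmer is far more careful; this only serves the example.
    static String toyStem(String word) {
        String w = word.toLowerCase();
        for (String suffix : new String[] {"ing", "er", "s"}) {
            if (w.endsWith(suffix) && w.length() > suffix.length() + 2) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Builds a grouping key by stemming every word of the phrase.
    static String stemKey(String phrase) {
        StringBuilder key = new StringBuilder();
        for (String word : phrase.trim().split("\\s+")) {
            if (key.length() > 0) {
                key.append(' ');
            }
            key.append(toyStem(word));
        }
        return key.toString();
    }

    // Maps each group's most frequent variant to the summed frequency of
    // all variants in the group. Groups seen only once are dropped, since
    // a frequent phrase must appear more than once.
    static Map<String, Integer> aggregate(Map<String, Integer> variantCounts) {
        Map<String, Integer> total = new LinkedHashMap<>();
        Map<String, String> best = new LinkedHashMap<>();
        Map<String, Integer> bestCount = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : variantCounts.entrySet()) {
            String key = stemKey(e.getKey());
            total.merge(key, e.getValue(), Integer::sum);
            if (e.getValue() > bestCount.getOrDefault(key, 0)) {
                bestCount.put(key, e.getValue());
                best.put(key, e.getKey());
            }
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : best.entrySet()) {
            int tf = total.get(e.getKey());
            if (tf > 1) {
                result.put(e.getValue(), tf);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("computing science", 2);
        counts.put("computer sciences", 4);
        System.out.println(aggregate(counts)); // prints {computer sciences=6}
    }
}
```

Both variants reduce to the same stemmed key, so their counts are summed and the variant with the higher individual count is kept as the representative.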
This class requires that Tokenizer, CaseNormalizer and LanguageModelStemmer be invoked first.
Method Summary
Modifier and Type    Method                                          Description
void                 extractPhrases(PreprocessingContext context)    Performs phrase extraction and saves the results to the provided context.
Method Detail
extractPhrases
public void extractPhrases(PreprocessingContext context)
Performs phrase extraction and saves the results to the provided context.