Package org.carrot2.text.preprocessing
Class PhraseExtractor
java.lang.Object
org.carrot2.text.preprocessing.PhraseExtractor
public class PhraseExtractor extends Object
Extracts frequent phrases from the provided documents. A frequent phrase is a sequence of words
that appears in the documents more than once. This phrase extractor aggregates different
inflection variants of a phrase's words into one phrase and returns the most frequent variant. For
example, if the phrase "computing science" appears 2 times and "computer sciences" appears
4 times, the latter is returned with an aggregated frequency of 6.
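The aggregation rule in the paragraph above can be illustrated with a small, self-contained sketch. It is not Carrot2's implementation. It assumes each phrase variant has already been mapped to a key shared by all of its inflected forms; the `stemKey` strings such as "comput scienc" are hypothetical stand-ins for real stemmer output, and the `Variant`/`Phrase` records are illustrative names, not library types. Variants are grouped by that key, their counts are summed, the most frequent surface form is kept, and only groups whose total exceeds 1 are reported.

```java
import java.util.*;

public class PhraseAggregationSketch {
    // Hypothetical input: one surface variant, the key shared by all of its
    // inflected forms (e.g. stems joined by spaces), and its occurrence count.
    record Variant(String surface, String stemKey, int count) {}

    // Aggregated result: the most frequent surface variant and the summed frequency.
    record Phrase(String surface, int tf) {}

    static List<Phrase> aggregate(List<Variant> variants) {
        // Group variants that share a stem key, preserving first-seen order.
        Map<String, List<Variant>> byStem = new LinkedHashMap<>();
        for (Variant v : variants) {
            byStem.computeIfAbsent(v.stemKey(), k -> new ArrayList<>()).add(v);
        }
        List<Phrase> result = new ArrayList<>();
        for (List<Variant> group : byStem.values()) {
            int tf = 0;
            Variant best = group.get(0);
            for (Variant v : group) {
                tf += v.count();
                // Strictly greater: on a tie, the first-seen variant wins.
                if (v.count() > best.count()) {
                    best = v;
                }
            }
            // A frequent phrase must occur more than once in total.
            if (tf > 1) {
                result.add(new Phrase(best.surface(), tf));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Phrase> phrases = aggregate(List.of(
                new Variant("computing science", "comput scienc", 2),
                new Variant("computer sciences", "comput scienc", 4),
                new Variant("data mining", "data mine", 1)));
        System.out.println(phrases); // [Phrase[surface=computer sciences, tf=6]]
    }
}
```

Summing counts across the group while remembering only the single best variant keeps the pass linear in the number of variants. The tie-breaking rule (first seen wins) is an arbitrary choice made for this sketch.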
This class saves the following results to the PreprocessingContext:
PreprocessingContext.AllPhrases.wordIndices
PreprocessingContext.AllPhrases.tf
PreprocessingContext.AllPhrases.tfByDocument
PreprocessingContext.AllTokens.suffixOrder
PreprocessingContext.AllTokens.lcp
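The suffixOrder and lcp outputs listed above suggest that repeated word sequences are found with a suffix array and a longest-common-prefix (LCP) array. The following is a naive sketch of that general technique under stated assumptions, not Carrot2's code. Tokens are hypothetical integer word ids, and a negative id is a hypothetical document-boundary marker that never matches, so phrases cannot span documents. The suffix array is built by plain sorting, which is acceptable for illustration but too slow for large inputs. Suffixes that share a prefix of length `len` are contiguous in suffix order, so every run of adjacent LCP values `>= len` is one repeated phrase, and the run length is its frequency.

```java
import java.util.*;

public class RepeatedPhraseSketch {
    // Naive suffix array: sort suffix start positions lexicographically.
    static Integer[] suffixOrder(int[] t) {
        Integer[] sa = new Integer[t.length];
        for (int i = 0; i < t.length; i++) sa[i] = i;
        Arrays.sort(sa, (a, b) -> {
            int x = a, y = b;
            while (x < t.length && y < t.length) {
                if (t[x] != t[y]) return Integer.compare(t[x], t[y]);
                x++;
                y++;
            }
            // One suffix is a prefix of the other: the shorter one sorts first.
            return Integer.compare(t.length - a, t.length - b);
        });
        return sa;
    }

    // lcp[i] = number of leading words shared by suffixes sa[i - 1] and sa[i];
    // negative (boundary) ids never count as a match.
    static int[] lcp(int[] t, Integer[] sa) {
        int[] lcp = new int[sa.length];
        for (int i = 1; i < sa.length; i++) {
            int a = sa[i - 1], b = sa[i], k = 0;
            while (a + k < t.length && b + k < t.length
                    && t[a + k] >= 0 && t[a + k] == t[b + k]) {
                k++;
            }
            lcp[i] = k;
        }
        return lcp;
    }

    // Every word sequence of length minLen..maxLen occurring at least twice, with its frequency.
    static Map<List<Integer>, Integer> repeated(int[] t, int minLen, int maxLen) {
        Integer[] sa = suffixOrder(t);
        int[] lcp = lcp(t, sa);
        Map<List<Integer>, Integer> out = new LinkedHashMap<>();
        for (int len = minLen; len <= maxLen; len++) {
            int i = 1;
            while (i < sa.length) {
                if (lcp[i] >= len) {
                    int start = i - 1;
                    while (i < sa.length && lcp[i] >= len) i++;
                    // Suffixes sa[start .. i-1] all begin with the same len words.
                    List<Integer> phrase = new ArrayList<>();
                    for (int k = 0; k < len; k++) phrase.add(t[sa[start] + k]);
                    out.put(phrase, i - start);
                } else {
                    i++;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical ids: 0 = "computer", 1 = "science", 2 = "data"; -1 = document boundary.
        int[] tokens = {0, 1, -1, 2, 0, 1};
        System.out.println(repeated(tokens, 2, 3)); // {[0, 1]=2}
    }
}
```

Production implementations typically build the suffix array in near-linear time and compute the LCP array with Kasai's algorithm. The run-scanning idea for counting repeated prefixes stays the same.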
This class requires that Tokenizer, CaseNormalizer and LanguageModelStemmer be invoked first.
Method Summary
Modifier and Type | Method | Description
void | extractPhrases(PreprocessingContext context) | Performs phrase extraction and saves the results to the provided context.
Method Details
extractPhrases
Performs phrase extraction and saves the results to the provided context.