org.carrot2.text.preprocessing.PhraseExtractor

public class PhraseExtractor
extends Object

Extracts frequent phrases from the provided documents. A frequent phrase is a sequence of words that appears in the documents more than once. This phrase extractor aggregates different inflection variants of phrase words into one phrase and returns the most frequent variant. For example, if the phrase "computing science" appears 2 times and "computer sciences" appears 4 times, the latter will be returned with an aggregated frequency of 6.
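The aggregation rule described above can be illustrated with a minimal, self-contained sketch. It is not the Carrot2 implementation: the class PhraseAggregationSketch, its methods, and the toy suffix-stripping stemmer are all hypothetical and exist only to reproduce the "computer sciences" example. The sketch covers only the grouping of inflection variants and does not filter out phrases that occur once.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PhraseAggregationSketch {

    // Hypothetical toy stemmer (an assumption for illustration only):
    // strips the first matching suffix so that inflection variants share a key.
    static String toyStem(String word) {
        String w = word.toLowerCase();
        for (String suffix : new String[] {"ing", "er", "es", "s", "e"}) {
            if (w.length() > suffix.length() + 2 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Builds a grouping key for a phrase by stemming each of its words.
    static String stemKey(String phrase) {
        StringBuilder key = new StringBuilder();
        for (String word : phrase.trim().split("\\s+")) {
            if (key.length() > 0) {
                key.append(' ');
            }
            key.append(toyStem(word));
        }
        return key.toString();
    }

    // Maps the most frequent variant of each phrase group to the summed
    // frequency of all variants in that group.
    static Map<String, Integer> aggregate(Map<String, Integer> variantCounts) {
        Map<String, Integer> total = new LinkedHashMap<>();
        Map<String, Integer> bestCount = new LinkedHashMap<>();
        Map<String, String> bestVariant = new LinkedHashMap<>();

        for (Map.Entry<String, Integer> e : variantCounts.entrySet()) {
            String key = stemKey(e.getKey());
            total.merge(key, e.getValue(), Integer::sum);
            if (e.getValue() > bestCount.getOrDefault(key, 0)) {
                bestCount.put(key, e.getValue());
                bestVariant.put(key, e.getKey());
            }
        }

        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : bestVariant.entrySet()) {
            result.put(e.getValue(), total.get(e.getKey()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("computing science", 2);
        counts.put("computer sciences", 4);
        // prints {computer sciences=6}
        System.out.println(aggregate(counts));
    }
}
```

Both variants reduce to the same stemmed key, so their counts are summed and the variant with the higher individual count is kept as the representative form.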

This class saves the following results to the PreprocessingContext:

This class requires that Tokenizer, CaseNormalizer and LanguageModelStemmer be invoked first.

Method Summary

Modifier and Type Method Description

void extractPhrases(PreprocessingContext context)
Performs phrase extraction and saves the results to the provided context.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- extractPhrases
  
  public void extractPhrases(PreprocessingContext context)
  
  Performs phrase extraction and saves the results to the provided context.
