org.carrot2.text.preprocessing.BasicPreprocessingPipeline

public class BasicPreprocessingPipeline
extends AttrComposite
implements ContextPreprocessor

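Performs basic preprocessing steps on the provided documents. The preprocessing consists of the following steps: tokenization (InputTokenizer), case normalization (CaseNormalizer), stemming (LanguageModelStemmer), and stop word marking (StopListMarker).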

Field Summary

Fields
Modifier and Type	Field	Description
`protected org.carrot2.text.preprocessing.CaseNormalizer`	`caseNormalizer`	Case normalizer used by the algorithm.
`protected org.carrot2.text.preprocessing.LanguageModelStemmer`	`stemming`	Stemmer used by the algorithm.
`protected org.carrot2.text.preprocessing.StopListMarker`	`stopListMarker`	Stop list marker used by the algorithm, contains bindable attributes.
`protected org.carrot2.text.preprocessing.InputTokenizer`	`tokenizer`	Tokenizer used by the algorithm.
`AttrInteger`	`wordDfThreshold`	Word Document Frequency threshold.

Fields inherited from class AttrComposite: attributes

Method Summary

Modifier and Type	Method	Description
`PreprocessingContext`	`preprocess(Stream<? extends Document> documents, String query, LanguageComponents langModel)`	Performs preprocessing on the provided stream of documents.

Methods inherited from class AttrComposite: accept

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- wordDfThreshold
  
  public final AttrInteger wordDfThreshold
  
  Word Document Frequency threshold. Words appearing in fewer than wordDfThreshold documents will be ignored.
- caseNormalizer
  
  protected final org.carrot2.text.preprocessing.CaseNormalizer caseNormalizer
  
  Case normalizer used by the algorithm.
- stemming
  
  protected final org.carrot2.text.preprocessing.LanguageModelStemmer stemming
  
  Stemmer used by the algorithm.
- stopListMarker
  
  protected final org.carrot2.text.preprocessing.StopListMarker stopListMarker
  
  Stop list marker used by the algorithm, contains bindable attributes.
- tokenizer
  
  protected final org.carrot2.text.preprocessing.InputTokenizer tokenizer
  
  Tokenizer used by the algorithm.
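
The effect of a document frequency threshold such as wordDfThreshold can be illustrated without the library. The sketch below is plain Java and is not Carrot2's implementation: the class DfThresholdSketch and its methods are hypothetical names introduced for illustration, and documents are assumed to be already tokenized into lists of words. A word's document frequency is the number of documents that contain it at least once, and words below the threshold are dropped.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration of document frequency (DF) thresholding.
// Not part of Carrot2; names and structure are assumptions.
final class DfThresholdSketch {

    /** Counts, for each word, the number of documents containing it at least once. */
    static Map<String, Integer> documentFrequency(List<List<String>> docs) {
        Map<String, Integer> df = new TreeMap<>();
        for (List<String> doc : docs) {
            // De-duplicate within a document so repeated words count once per document.
            for (String word : new HashSet<>(doc)) {
                df.merge(word, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Returns the words whose document frequency is at least the threshold. */
    static Set<String> retained(List<List<String>> docs, int threshold) {
        Set<String> kept = new TreeSet<>();
        documentFrequency(docs).forEach((word, count) -> {
            if (count >= threshold) {
                kept.add(word);
            }
        });
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("data", "mining", "data"),
            List.of("data", "clustering"),
            List.of("text", "clustering"));
        // "mining" and "text" each occur in one document only and are ignored.
        System.out.println(retained(docs, 2)); // prints [clustering, data]
    }
}
```

Note that "data" occurs twice in the first document but still contributes a single count for that document, which is what distinguishes document frequency from raw term frequency.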
Constructor Details
- BasicPreprocessingPipeline
  
  public BasicPreprocessingPipeline()
Method Details
- preprocess
  
  public PreprocessingContext preprocess(Stream<? extends Document> documents, String query, LanguageComponents langModel)
  
  Performs preprocessing on the provided stream of documents. Results can be obtained from the returned PreprocessingContext.
  
  Specified by:
  
  preprocess in interface ContextPreprocessor
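
The fields of this class (tokenizer, caseNormalizer, stemming, stopListMarker) suggest a four-stage pipeline. The following plain-Java sketch shows what such a sequence of stages might look like on a stream of documents. It does not use or reproduce the Carrot2 classes: PipelineSketch, Token, the stop word set, and the naive suffix-stripping stemmer are all hypothetical stand-ins, and the stage order shown is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a tokenizer -> case normalizer -> stemmer ->
// stop list marker sequence. Not Carrot2's implementation.
final class PipelineSketch {

    /** One processed token: normalized image, its stem, and a stop word flag. */
    static final class Token {
        final String image;
        final String stem;
        final boolean stopWord;

        Token(String image, String stem, boolean stopWord) {
            this.image = image;
            this.stem = stem;
            this.stopWord = stopWord;
        }

        @Override
        public String toString() {
            return image + " -> " + stem + (stopWord ? " (stop word)" : "");
        }
    }

    // Assumed toy stop list; a real language model would supply this.
    static final Set<String> STOP_WORDS = Set.of("the", "of", "and", "a");

    /** Naive stemmer (assumption): strips one trailing "s" from longer words. */
    static String stem(String word) {
        return word.length() > 3 && word.endsWith("s")
            ? word.substring(0, word.length() - 1)
            : word;
    }

    static List<Token> preprocess(Stream<String> documents) {
        return documents
            // 1. Tokenize: split on runs of non-letter, non-digit characters.
            .flatMap(doc -> Arrays.stream(doc.split("[^\\p{L}\\p{Nd}]+")))
            .filter(t -> !t.isEmpty())
            // 2. Normalize case.
            .map(t -> t.toLowerCase(Locale.ROOT))
            // 3. Stem and 4. mark stop words.
            .map(t -> new Token(t, stem(t), STOP_WORDS.contains(t)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        preprocess(Stream.of("The Clusters of Documents"))
            .forEach(System.out::println);
    }
}
```

In this sketch stop words are flagged rather than removed, so later stages can still see the original token sequence and decide how to treat them.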