Package org.carrot2.text.preprocessing
Text preprocessing components.
Interface Summary

Interface              Description
ContextPreprocessor
LabelFormatter         Formats cluster labels for final rendering.
Class Summary

Class                            Description
BasicPreprocessingPipeline       Performs basic preprocessing steps on the provided documents.
CompletePreprocessingPipeline    Performs complete preprocessing of the provided documents.
DocumentAssigner                 Assigns documents to label candidates.
LabelFilterProcessor             Applies basic filtering to words and phrases to produce candidates for cluster labels.
LabelFormatterImpl
PhraseExtractor                  Extracts frequent phrases from the provided document.
PreprocessedDocumentScanner      Iterates over tokenized documents in PreprocessingContext.
PreprocessingContext             Document preprocessing context; provides low-level (usually integer-coded) data structures useful for further processing.
PreprocessingContext.AllFields   Information about all fields processed for the input documents.
SparseArray                      Sparse array encoding utilities.
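
The phrase "integer-coded data structures" in the PreprocessingContext description can be illustrated with a small, self-contained sketch. It does not use or reproduce the Carrot2 API: the class name IntegerCodingSketch and the methods encode and termFrequencies are hypothetical, chosen only to show the general idea of replacing each distinct word with a small integer so that later steps can work on int arrays instead of strings.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration of integer-coded text; not part of Carrot2. */
public class IntegerCodingSketch {
    /** Maps each distinct lower-cased word to a code, in order of first occurrence. */
    private final Map<String, Integer> wordIndex = new LinkedHashMap<>();

    /** Encodes whitespace-separated text as an array of word codes. */
    public int[] encode(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new int[0];
        }
        String[] tokens = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        int[] codes = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            Integer code = wordIndex.get(tokens[i]);
            if (code == null) {
                // First time this word is seen: assign the next free code.
                code = wordIndex.size();
                wordIndex.put(tokens[i], code);
            }
            codes[i] = code;
        }
        return codes;
    }

    /** Counts how often each known word code occurs in the given code array. */
    public int[] termFrequencies(int[] codes) {
        int[] tf = new int[wordIndex.size()];
        for (int code : codes) {
            tf[code]++;
        }
        return tf;
    }

    public static void main(String[] args) {
        IntegerCodingSketch sketch = new IntegerCodingSketch();
        int[] codes = sketch.encode("data mining and text mining");
        System.out.println(Arrays.toString(codes));                         // [0, 1, 2, 3, 1]
        System.out.println(Arrays.toString(sketch.termFrequencies(codes))); // [1, 2, 1, 1]
    }
}
```

Once words are coded this way, per-word statistics become plain array lookups indexed by the word code, which is the general motivation for integer coding; the actual fields and layout used by PreprocessingContext are documented on that class's own page.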