Package org.carrot2.text.preprocessing
Class PreprocessingContext
java.lang.Object
org.carrot2.text.preprocessing.PreprocessingContext
- All Implemented Interfaces:
Closeable, AutoCloseable
public final class PreprocessingContext extends Object implements Closeable
The document preprocessing context provides low-level (usually integer-coded) data structures
useful for further processing.

-
Nested Class Summary
Nested Classes (modifier and type, class, description):
static class PreprocessingContext.AllFields
  Information about all fields processed for the input documents.
class PreprocessingContext.AllLabels
  Information about words and phrases that might be good cluster label candidates.
class PreprocessingContext.AllPhrases
  Information about all frequently appearing sequences of words found in the input documents.
class PreprocessingContext.AllStems
  Information about all unique stems found in the input documents.
class PreprocessingContext.AllTokens
  Information about all tokens of the input documents.
class PreprocessingContext.AllWords
  Information about all unique words found in the input documents.
-
Field Summary
Fields (modifier and type, field, description):
PreprocessingContext.AllFields allFields
  Information about all fields processed for the input documents.
PreprocessingContext.AllLabels allLabels
  Information about words and phrases that might be good cluster label candidates.
PreprocessingContext.AllPhrases allPhrases
  Information about all frequently appearing sequences of words found in the input documents.
PreprocessingContext.AllStems allStems
  Information about all unique stems found in the input documents.
PreprocessingContext.AllTokens allTokens
  Information about all tokens of the input documents.
PreprocessingContext.AllWords allWords
  Information about all unique words found in the input documents.
int documentCount
  Count of documents processed by the tokenizer.
LanguageComponents languageComponents
  Language model to be used.
-
Constructor Summary
Constructors (constructor, description):
PreprocessingContext(LanguageComponents languageComponents)
  Creates a preprocessing context for the provided documents and with the provided languageModel.
-
Method Summary
Methods (modifier and type, method, description):
void close()
  This method should be invoked after all preprocessing contributors have been executed to release temporary data structures.
String format(LabelFormatter formatter, int featureIndex)
  Applies label formatter to a given word or phrase (depending on the feature index provided).
boolean hasLabels()
  Returns true if this context contains any label candidates.
boolean hasWords()
  Returns true if this context contains any words.
char[] intern(MutableCharArray chs)
  Return a unique char buffer representing a given character sequence.
static int[] toFieldIndexes(byte b)
  Convert the selected bits in a byte to an array of indexes.
String toString()
-
Field Details
-
languageComponents
Language model to be used -
documentCount
public int documentCount
Count of documents processed by the tokenizer. -
allTokens
Information about all tokens of the input documents. -
allFields
Information about all fields processed for the input documents. -
allWords
Information about all unique words found in the input documents. -
allStems
Information about all unique stems found in the input documents. -
allPhrases
Information about all frequently appearing sequences of words found in the input documents. -
allLabels
Information about words and phrases that might be good cluster label candidates.
-
-
Constructor Details
-
PreprocessingContext
Creates a preprocessing context for the provided documents and with the provided languageModel.
-
-
Method Details
-
hasWords
public boolean hasWords()
Returns true if this context contains any words. -
hasLabels
public boolean hasLabels()
Returns true if this context contains any label candidates. -
format
Applies label formatter to a given word or phrase (depending on the feature index provided). -
toString
-
toFieldIndexes
public static int[] toFieldIndexes(byte b)
Convert the selected bits in a byte to an array of indexes. -
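The conversion described for toFieldIndexes can be illustrated with a small standalone sketch. It is not the library's implementation: the class name FieldIndexSketch and the method name setBitIndexes are hypothetical, and the sketch assumes that bit i (counting from the least significant bit) corresponds to index i.

```java
import java.util.Arrays;

public class FieldIndexSketch {
  /**
   * Returns the positions of the bits set in b, in ascending order.
   * Assumption: position 0 is the least significant bit.
   */
  static int[] setBitIndexes(byte b) {
    int bits = b & 0xFF; // treat the byte as unsigned
    int[] result = new int[Integer.bitCount(bits)];
    int next = 0;
    for (int i = 0; i < 8; i++) {
      if ((bits & (1 << i)) != 0) {
        result[next++] = i;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Bits 0 and 2 are set in 0b0000_0101.
    System.out.println(Arrays.toString(setBitIndexes((byte) 0b0000_0101))); // prints [0, 2]
  }
}
```

Under this assumed encoding, a single byte can flag at most eight fields.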
close
public void close()
This method should be invoked after all preprocessing contributors have been executed to release temporary data structures.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
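Because the class implements Closeable, an instance can be managed with a try-with-resources statement so that close() runs once processing is finished. The sketch below shows only that pattern, using a hypothetical stand-in class (ScratchContext) in place of the real context. The buffer and the isReleased() helper are illustrative assumptions, not part of the documented API.

```java
import java.io.Closeable;

public class CloseSketch {
  /** Hypothetical stand-in for a context that holds temporary data. */
  static final class ScratchContext implements Closeable {
    private int[] scratch = new int[1024]; // temporary working buffer

    boolean isReleased() {
      return scratch == null;
    }

    @Override
    public void close() { // declares no checked exception, like the signature above
      scratch = null;
    }
  }

  /** Runs a (placeholder) processing step and reports whether the buffer was released. */
  static boolean runAndRelease() {
    ScratchContext ctx = new ScratchContext();
    try (ScratchContext c = ctx) {
      // ... preprocessing steps would use c here ...
    }
    return ctx.isReleased(); // close() has already run at this point
  }

  public static void main(String[] args) {
    System.out.println(runAndRelease()); // prints true
  }
}
```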
-
intern
Return a unique char buffer representing a given character sequence.
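The idea of returning one shared buffer per distinct character sequence can be sketched with a plain map-backed pool. This is a generic illustration of interning, not the library's code: the class InternSketch is hypothetical, and it accepts a CharSequence as a stand-in for the MutableCharArray parameter listed in the method summary.

```java
import java.util.HashMap;
import java.util.Map;

public class InternSketch {
  // One canonical char[] per distinct character sequence.
  private final Map<String, char[]> pool = new HashMap<>();

  /** Returns the same char[] instance for every sequence with equal content. */
  char[] intern(CharSequence chs) {
    return pool.computeIfAbsent(chs.toString(), String::toCharArray);
  }

  public static void main(String[] args) {
    InternSketch sketch = new InternSketch();
    char[] first = sketch.intern("data");
    char[] second = sketch.intern(new StringBuilder("data"));
    System.out.println(first == second); // prints true: both calls share one buffer
  }
}
```

Sharing buffers this way lets callers compare interned sequences by reference and avoids storing duplicate copies of the same text.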
-