Package org.carrot2.text.preprocessing
Class PreprocessingContext
java.lang.Object
  org.carrot2.text.preprocessing.PreprocessingContext
All Implemented Interfaces: Closeable, AutoCloseable
public final class PreprocessingContext extends Object implements Closeable
The document preprocessing context provides low-level (usually integer-coded) data structures useful for further processing.
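To illustrate what "integer-coded" can mean in practice, here is a minimal sketch that replaces each token with the index of its unique word. The class `IntegerCodedSketch` and its members are hypothetical names invented for this example; they are not part of Carrot2 and do not reflect how the nested classes are actually laid out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Carrot2 implementation.
class IntegerCodedSketch {
  // Unique words; the position of a word in this list is its integer code.
  final List<String> words = new ArrayList<>();
  // Reverse lookup from a word to its integer code.
  private final Map<String, Integer> codes = new HashMap<>();

  /** Replaces every token with the integer code of its unique word. */
  int[] encode(String[] tokens) {
    int[] coded = new int[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      Integer code = codes.get(tokens[i]);
      if (code == null) {
        // First occurrence: assign the next free code.
        code = words.size();
        words.add(tokens[i]);
        codes.put(tokens[i], code);
      }
      coded[i] = code;
    }
    return coded;
  }
}
```

With this sketch, encoding the tokens "data", "mining", "data" yields the codes 0, 1, 0, and the word table holds two entries. Later processing steps can then work on compact int arrays instead of strings.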
-
-
Nested Class Summary
Modifier and Type | Class | Description
static class | PreprocessingContext.AllFields | Information about all fields processed for the input documents.
class | PreprocessingContext.AllLabels | Information about words and phrases that might be good cluster label candidates.
class | PreprocessingContext.AllPhrases | Information about all frequently appearing sequences of words found in the input documents.
class | PreprocessingContext.AllStems | Information about all unique stems found in the input documents.
class | PreprocessingContext.AllTokens | Information about all tokens of the input documents.
class | PreprocessingContext.AllWords | Information about all unique words found in the input documents.
-
Field Summary
Modifier and Type | Field | Description
PreprocessingContext.AllFields | allFields | Information about all fields processed for the input documents.
PreprocessingContext.AllLabels | allLabels | Information about words and phrases that might be good cluster label candidates.
PreprocessingContext.AllPhrases | allPhrases | Information about all frequently appearing sequences of words found in the input documents.
PreprocessingContext.AllStems | allStems | Information about all unique stems found in the input documents.
PreprocessingContext.AllTokens | allTokens | Information about all tokens of the input documents.
PreprocessingContext.AllWords | allWords | Information about all unique words found in the input documents.
int | documentCount | Count of documents processed by the tokenizer.
LanguageComponents | languageComponents | Language model to be used.
-
Constructor Summary
Constructor | Description
PreprocessingContext(LanguageComponents languageComponents) | Creates a preprocessing context for the provided documents and with the provided languageModel.
-
Method Summary
Modifier and Type | Method | Description
void | close() | This method should be invoked after all preprocessing contributors have been executed to release temporary data structures.
String | format(LabelFormatter formatter, int featureIndex) | Applies label formatter to a given word or phrase (depending on the feature index provided).
boolean | hasLabels() | Returns true if this context contains any label candidates.
boolean | hasWords() | Returns true if this context contains any words.
char[] | intern(MutableCharArray chs) | Return a unique char buffer representing a given character sequence.
static int[] | toFieldIndexes(byte b) | Convert the selected bits in a byte to an array of indexes.
String | toString() |
-
-
-
Field Detail
-
languageComponents
public final LanguageComponents languageComponents
Language model to be used.
-
documentCount
public int documentCount
Count of documents processed by the tokenizer.
-
allTokens
public final PreprocessingContext.AllTokens allTokens
Information about all tokens of the input documents.
-
allFields
public final PreprocessingContext.AllFields allFields
Information about all fields processed for the input documents.
-
allWords
public final PreprocessingContext.AllWords allWords
Information about all unique words found in the input documents.
-
allStems
public final PreprocessingContext.AllStems allStems
Information about all unique stems found in the input documents.
-
allPhrases
public PreprocessingContext.AllPhrases allPhrases
Information about all frequently appearing sequences of words found in the input documents.
-
allLabels
public final PreprocessingContext.AllLabels allLabels
Information about words and phrases that might be good cluster label candidates.
-
-
Constructor Detail
-
PreprocessingContext
public PreprocessingContext(LanguageComponents languageComponents)
Creates a preprocessing context for the provided documents and with the provided languageModel.
-
-
Method Detail
-
hasWords
public boolean hasWords()
Returns true if this context contains any words.
-
hasLabels
public boolean hasLabels()
Returns true if this context contains any label candidates.
-
format
public String format(LabelFormatter formatter, int featureIndex)
Applies label formatter to a given word or phrase (depending on the feature index provided).
-
toFieldIndexes
public static int[] toFieldIndexes(byte b)
Convert the selected bits in a byte to an array of indexes.
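The description above is terse, so here is a minimal, standalone sketch of one way such a bit-to-index conversion can work. It assumes that index 0 corresponds to the least significant bit and that indexes are returned in ascending order; neither detail is stated on this page, and the class `FieldIndexSketch` is a hypothetical stand-in rather than the Carrot2 implementation.

```java
import java.util.Arrays;

// Hypothetical stand-in; the bit ordering is an assumption, not documented behavior.
class FieldIndexSketch {
  /** Returns the positions (0 = least significant bit) of the bits set in b. */
  static int[] toFieldIndexes(byte b) {
    int[] indexes = new int[8];
    int count = 0;
    for (int bit = 0; bit < 8; bit++) {
      // Test each of the eight bit positions in turn.
      if ((b & (1 << bit)) != 0) {
        indexes[count++] = bit;
      }
    }
    // Trim the buffer to the number of set bits found.
    return Arrays.copyOf(indexes, count);
  }

  public static void main(String[] args) {
    // Bits 0 and 2 are set in 0b0000_0101.
    System.out.println(Arrays.toString(toFieldIndexes((byte) 0b0000_0101))); // prints [0, 2]
  }
}
```

A byte used this way can flag membership in up to eight fields, which is a compact way to record, per token or word, the document fields it appeared in.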
-
close
public void close()
This method should be invoked after all preprocessing contributors have been executed to release temporary data structures.
Specified by: close in interface AutoCloseable
Specified by: close in interface Closeable
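Because the class implements Closeable, a try-with-resources block is one way to guarantee that close() runs once all processing steps have finished. The sketch below shows the pattern with a hypothetical stand-in class, `ScratchContext`, so that it compiles without Carrot2 on the classpath; which structures the real context releases is not specified on this page.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a context that holds temporary data until closed.
class ScratchContext implements Closeable {
  // Temporary structure needed only while processing is in progress.
  private List<String> scratch = new ArrayList<>();
  int documentCount;

  void addDocument(String text) {
    scratch.add(text);
    documentCount++;
  }

  boolean isClosed() {
    return scratch == null;
  }

  @Override
  public void close() {
    // Release the temporary structure; the long-lived results stay available.
    scratch = null;
  }
}

class CloseSketch {
  static ScratchContext run(String... documents) {
    // close() is invoked automatically when the try block exits.
    try (ScratchContext context = new ScratchContext()) {
      for (String document : documents) {
        context.addDocument(document);
      }
      return context;
    }
  }
}
```

In this sketch the document count remains readable after the block ends, while the scratch list has been dropped.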
-
intern
public char[] intern(MutableCharArray chs)
Return a unique char buffer representing a given character sequence.
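Interning of this kind usually means that equal character sequences map to one shared buffer, so repeated words cost memory only once. The following is a minimal sketch of that idea, not the Carrot2 implementation: the class `CharBufferInterner` is hypothetical, and it accepts a plain CharSequence as a stand-in for MutableCharArray.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of buffer interning; CharSequence stands in for MutableCharArray.
class CharBufferInterner {
  // One canonical buffer per distinct character sequence.
  private final Map<String, char[]> pool = new HashMap<>();

  /** Returns the same char[] instance for every sequence with equal content. */
  char[] intern(CharSequence chs) {
    return pool.computeIfAbsent(chs.toString(), String::toCharArray);
  }
}
```

Two calls with equal content, for example a String "data" and a StringBuilder holding "data", return the identical array instance, so callers can compare interned buffers by reference. Callers should treat the returned array as read-only, since it is shared.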
-
-