Package org.carrot2.text.preprocessing
Class PreprocessingContext
java.lang.Object
org.carrot2.text.preprocessing.PreprocessingContext
All Implemented Interfaces: Closeable, AutoCloseable
public final class PreprocessingContext extends Object implements Closeable
The document preprocessing context provides low-level (usually integer-coded) data structures
useful for further processing.
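As a rough illustration of what "integer-coded" can mean here, the sketch below stores each unique word once and represents a token stream as an array of indexes into that word list. The class `IntegerCodingSketch` and its members are hypothetical and are not part of Carrot2; the actual layout of `allTokens` and `allWords` may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of integer coding; not part of Carrot2. */
class IntegerCodingSketch {
  /** Unique words, addressed by their index. */
  final List<String> words = new ArrayList<>();

  /** Lookup from a word to its index in the words list. */
  private final Map<String, Integer> wordIndex = new HashMap<>();

  /** Encodes tokens as indexes into the unique-word list. */
  int[] encode(String[] tokens) {
    int[] codes = new int[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      Integer index = wordIndex.get(tokens[i]);
      if (index == null) {
        // First occurrence: assign the next free index.
        index = words.size();
        words.add(tokens[i]);
        wordIndex.put(tokens[i], index);
      }
      codes[i] = index;
    }
    return codes;
  }

  public static void main(String[] args) {
    IntegerCodingSketch sketch = new IntegerCodingSketch();
    int[] codes = sketch.encode(new String[] {"data", "mining", "data"});
    System.out.println(Arrays.toString(codes)); // [0, 1, 0]
    System.out.println(sketch.words); // [data, mining]
  }
}
```

Downstream steps can then compare and count plain `int` values instead of strings.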
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
static class PreprocessingContext.AllFields
Information about all fields processed for the input documents.
class PreprocessingContext.AllLabels
Information about words and phrases that might be good cluster label candidates.
class PreprocessingContext.AllPhrases
Information about all frequently appearing sequences of words found in the input documents.
class PreprocessingContext.AllStems
Information about all unique stems found in the input documents.
class PreprocessingContext.AllTokens
Information about all tokens of the input documents.
class PreprocessingContext.AllWords
Information about all unique words found in the input documents.
Field Summary
Fields
Modifier and Type, Field, Description
PreprocessingContext.AllFields allFields
Information about all fields processed for the input documents.
PreprocessingContext.AllLabels allLabels
Information about words and phrases that might be good cluster label candidates.
PreprocessingContext.AllPhrases allPhrases
Information about all frequently appearing sequences of words found in the input documents.
PreprocessingContext.AllStems allStems
Information about all unique stems found in the input documents.
PreprocessingContext.AllTokens allTokens
Information about all tokens of the input documents.
PreprocessingContext.AllWords allWords
Information about all unique words found in the input documents.
int documentCount
Count of documents processed by the tokenizer.
LanguageComponents languageComponents
Language model to be used.
Constructor Summary
Constructors
Constructor, Description
PreprocessingContext(LanguageComponents languageComponents)
Creates a preprocessing context for the provided documents and with the provided languageModel.
Method Summary
Modifier and Type, Method, Description
void close()
This method should be invoked after all preprocessing contributors have been executed to release temporary data structures.
String format(LabelFormatter formatter, int featureIndex)
Applies label formatter to a given word or phrase (depending on the feature index provided).
boolean hasLabels()
Returns true if this context contains any label candidates.
boolean hasWords()
Returns true if this context contains any words.
char[] intern(MutableCharArray chs)
Return a unique char buffer representing a given character sequence.
static int[] toFieldIndexes(byte b)
Convert the selected bits in a byte to an array of indexes.
String toString()
Field Details
languageComponents
Language model to be used.
documentCount
public int documentCount
Count of documents processed by the tokenizer.
allTokens
Information about all tokens of the input documents.
allFields
Information about all fields processed for the input documents.
allWords
Information about all unique words found in the input documents.
allStems
Information about all unique stems found in the input documents.
allPhrases
Information about all frequently appearing sequences of words found in the input documents.
allLabels
Information about words and phrases that might be good cluster label candidates.
Constructor Details
PreprocessingContext
Creates a preprocessing context for the provided documents and with the provided languageModel.
Method Details
hasWords
public boolean hasWords()
Returns true if this context contains any words.
hasLabels
public boolean hasLabels()
Returns true if this context contains any label candidates.
format
Applies label formatter to a given word or phrase (depending on the feature index provided).
toString
toFieldIndexes
public static int[] toFieldIndexes(byte b)
Convert the selected bits in a byte to an array of indexes.
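The conversion described above can be pictured with a minimal sketch. It assumes that bit i of the byte (counting from the least significant bit) stands for field index i; that mapping, and the class name `FieldIndexSketch`, are assumptions made for illustration and are not taken from the Carrot2 source.

```java
import java.util.Arrays;

/** Hypothetical sketch of bit-to-index conversion; not the Carrot2 source. */
class FieldIndexSketch {
  /** Returns the positions (0 = least significant) of the bits set in b. */
  static int[] toFieldIndexes(byte b) {
    int bits = b & 0xFF; // treat the byte as unsigned
    int[] result = new int[Integer.bitCount(bits)];
    int next = 0;
    for (int i = 0; i < 8; i++) {
      if ((bits & (1 << i)) != 0) {
        result[next++] = i;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // 0b0000_0101 has bits 0 and 2 set.
    System.out.println(Arrays.toString(toFieldIndexes((byte) 0b0000_0101))); // [0, 2]
  }
}
```

Under this assumption a single byte can flag up to eight fields, and the returned array lists them in ascending order.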
close
public void close()
This method should be invoked after all preprocessing contributors have been executed to release temporary data structures.
Specified by: close in interface AutoCloseable
Specified by: close in interface Closeable
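Because the class implements Closeable, a try-with-resources statement is one way to make sure close() runs once processing is finished. The sketch below shows the pattern with a hypothetical stand-in class, `ScratchContext`, so that it runs without Carrot2 on the classpath; the same shape should apply to a PreprocessingContext created from a LanguageComponents instance.

```java
import java.io.Closeable;

/** Hypothetical demonstration of the close-after-processing pattern. */
class CloseSketch {
  /** Stand-in for a context that owns temporary buffers. */
  static class ScratchContext implements Closeable {
    private int[] scratch = new int[1024]; // temporary working storage

    boolean isReleased() {
      return scratch == null;
    }

    @Override
    public void close() {
      scratch = null; // drop the temporary structure
    }
  }

  static ScratchContext run() {
    ScratchContext context = new ScratchContext();
    try (ScratchContext open = context) {
      open.scratch[0] = 42; // stand-in for the actual processing steps
    }
    return context; // close() has already run at this point
  }

  public static void main(String[] args) {
    System.out.println(run().isReleased()); // true
  }
}
```

The try block closes the context even if a processing step throws, so callers do not need a separate finally clause.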
intern
Return a unique char buffer representing a given character sequence.
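Interning of this kind is commonly done by keeping a pool keyed by content and handing back the same array for equal sequences. The sketch below is one such approach; `InternSketch` is hypothetical, takes a plain CharSequence rather than a MutableCharArray, and is not the Carrot2 implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical char-buffer interning; not the Carrot2 implementation. */
class InternSketch {
  /** One canonical buffer per distinct character sequence. */
  private final Map<String, char[]> pool = new HashMap<>();

  /** Returns one shared char[] per distinct character sequence. */
  char[] intern(CharSequence chs) {
    return pool.computeIfAbsent(chs.toString(), String::toCharArray);
  }

  public static void main(String[] args) {
    InternSketch sketch = new InternSketch();
    char[] first = sketch.intern("cluster");
    char[] second = sketch.intern(new StringBuilder("cluster"));
    System.out.println(first == second); // true
  }
}
```

Sharing one buffer per distinct sequence lets callers compare buffers by reference and avoids holding duplicate copies of frequent words.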