java.lang.Object
- org.carrot2.text.preprocessing.PreprocessingContext.AllWords

Enclosing class:

PreprocessingContext
```
public class PreprocessingContext.AllWords
extends Object
```
Information about all unique words found in the input documents. An entry in each parallel array corresponds to one case-conflated form of a word. For example, data and DATA will most likely become a single entry in the words table. However, different grammatical forms of a single lemma (such as computer and computers) have separate entries in the words table. See PreprocessingContext.AllStems for inflection-conflated versions.
All arrays in this class have the same length, and values at the same index in different arrays describe the same word.
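
The parallel-array layout described above can be illustrated with a minimal sketch. The class below is a hypothetical stand-in, not the Carrot2 API: `IMAGE` and `TF` only mimic the shape of the `image` and `tf` fields, and their values are made up for illustration.

```java
public class ParallelArraysSketch {
    // Hypothetical stand-ins for AllWords.image and AllWords.tf; the values are invented.
    static final char[][] IMAGE = {
        "data".toCharArray(), "computer".toCharArray(), "computers".toCharArray()
    };
    static final int[] TF = { 7, 4, 2 };

    /** Describes one word by reading the same index from each parallel array. */
    static String describe(int wordIndex) {
        return new String(IMAGE[wordIndex]) + " tf=" + TF[wordIndex];
    }

    public static void main(String[] args) {
        // Both arrays have the same length, so one loop index addresses both.
        for (int i = 0; i < IMAGE.length; i++) {
            System.out.println(i + ": " + describe(i));
        }
    }
}
```

Note that computer and computers occupy separate indices here, matching the statement that inflected forms are not merged in this table.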

Field Summary

Fields
Modifier and Type	Field	Description
`byte[]`	`fieldIndices`	A bit-packed index of all fields in which this word appears at least once.
`char[][]`	`image`	The most frequently appearing variant of the word with respect to case.
`int[]`	`stemIndex`	A pointer to the `PreprocessingContext.AllStems` arrays for this word.
`int[]`	`tf`	Term Frequency of the word, aggregated across all variants with respect to case.
`int[][]`	`tfByDocument`	Term Frequency of the word for each document.
`short[]`	`type`	Token type of this word copied from `PreprocessingContext.AllTokens.type`.

Constructor Summary

Constructors
Constructor	Description
`AllWords()`	

Method Summary

Modifier and Type	Method	Description
`String`	`toString()`	For debugging purposes.
- Methods inherited from class java.lang.Object
  clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Field Detail
  - image
```
public char[][] image
```
    The most frequently appearing variant of the word with respect to case. E.g. if a token MacOS appeared 12 times in the input and macos appeared 3 times, the image will be equal to MacOS.
    This array is produced by CaseNormalizer.
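
    The selection rule in the MacOS example can be sketched as follows. This is not CaseNormalizer's implementation; `mostFrequentVariant` is a hypothetical helper, and resolving ties in favor of the variant seen first is an assumption made only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ImageSelectionSketch {
    /**
     * Returns the case variant with the highest count.
     * Ties go to the variant encountered first (an assumption, not documented behavior).
     */
    static String mostFrequentVariant(Map<String, Integer> variantCounts) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : variantCounts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("macos", 3);
        counts.put("MacOS", 12);
        System.out.println(mostFrequentVariant(counts)); // prints MacOS
    }
}
```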
  - type
```
public short[] type
```
    Token type of this word copied from PreprocessingContext.AllTokens.type. Additional flags are set for each word by CaseNormalizer and LanguageModelStemmer.
    This array is produced by CaseNormalizer. This array is modified by LanguageModelStemmer.
    
    See Also:
    
    Tokenizer
  - tf
```
public int[] tf
```
    Term Frequency of the word, aggregated across all variants with respect to case. Frequencies for each variant separately are not available.
    This array is produced by CaseNormalizer.
  - tfByDocument
```
public int[][] tfByDocument
```
    Term Frequency of the word for each document. The length of each word's array is equal to the number of documents the word appeared in (Document Frequency) multiplied by 2. Elements at even indices contain document indices, and elements at odd indices contain the frequency of the word in that document. For example, an array with 4 values, [2, 15, 138, 7], means that the word appeared 15 times in the document at index 2 and 7 times in the document at index 138.
    This array is produced by CaseNormalizer. The order of documents in this array is not defined.
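
    The interleaved layout can be unpacked with a short loop. The sketch below works on a single word's entry (one inner array of `tfByDocument`); the class and method names are hypothetical and not part of Carrot2.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TfByDocumentSketch {
    /** Document Frequency of a word: half the length of its interleaved entry. */
    static int documentFrequency(int[] wordTfByDocument) {
        return wordTfByDocument.length / 2;
    }

    /** Unpacks [docIndex, tf, docIndex, tf, ...] into a docIndex -> tf map. */
    static Map<Integer, Integer> toMap(int[] wordTfByDocument) {
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < wordTfByDocument.length; i += 2) {
            // Even index: document index. Odd index: frequency in that document.
            result.put(wordTfByDocument[i], wordTfByDocument[i + 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] entry = { 2, 15, 138, 7 }; // the example from the description above
        System.out.println(documentFrequency(entry)); // prints 2
        System.out.println(toMap(entry));             // prints {2=15, 138=7}
    }
}
```

    Because the order of documents within an entry is not defined, code that needs a lookup by document index should build a map like this one or scan the pairs rather than rely on sorted order.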
  - stemIndex
```
public int[] stemIndex
```
    A pointer to the PreprocessingContext.AllStems arrays for this word.
    This array is produced by LanguageModelStemmer.
  - fieldIndices
```
public byte[] fieldIndices
```
    A bit-packed index of all fields in which this word appears at least once. Indexes (positions) of selected bits are pointers to the PreprocessingContext.AllFields arrays. Fast conversion between the bit-packed representation and a byte[] of index values is done by PreprocessingContext.toFieldIndexes(byte).
    This array is produced by CaseNormalizer.
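
    To make the bit-packed representation concrete, the following sketch expands one byte into the positions of its set bits. It is not the implementation of PreprocessingContext.toFieldIndexes(byte); the helper name is hypothetical, and it assumes that the least significant bit corresponds to field index 0.

```java
import java.util.Arrays;

public class FieldIndicesSketch {
    /** Expands a bit-packed byte into the positions of its set bits, lowest bit first. */
    static byte[] setBitPositions(byte packed) {
        int bits = packed & 0xFF; // treat the byte as unsigned
        byte[] out = new byte[Integer.bitCount(bits)];
        int n = 0;
        for (int pos = 0; pos < 8; pos++) {
            if ((bits & (1 << pos)) != 0) {
                out[n++] = (byte) pos;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Bits 0 and 2 set: under the assumption above, the word occurs in fields 0 and 2.
        System.out.println(Arrays.toString(setBitPositions((byte) 0b0000_0101))); // prints [0, 2]
    }
}
```

    A single byte holds 8 bits, so this representation can distinguish at most 8 fields per word.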
- Constructor Detail
  - AllWords
```
public AllWords()
```
- Method Detail
  - toString
```
public String toString()
```
    For debugging purposes.
    
    Overrides:
    
    toString in class Object
