java.lang.Object
- org.carrot2.attrs.AttrComposite
- - org.carrot2.text.vsm.TermDocumentMatrixBuilder

All Implemented Interfaces:

AcceptingVisitor
```
public class TermDocumentMatrixBuilder
extends AttrComposite
```
Builds a term-document matrix based on the provided PreprocessingContext.

Field Summary

Fields
Modifier and Type	Field	Description
`AttrDouble`	`boostedFieldWeight`	The extra weight to apply to words that appeared in boosted fields.
`AttrStringArray`	`boostFields`	A list of fields to which extra weight is applied.
`AttrInteger`	`maximumMatrixSize`	Maximum number of elements the term-document matrix can have.
`AttrDouble`	`maxWordDf`	Maximum document frequency allowed for words as a fraction of all documents.
`TermWeighting`	`termWeighting`	Method for calculating weights of words in the term-document matrices.

Fields inherited from class org.carrot2.attrs.AttrComposite
attributes

Constructor Summary

Constructors
Constructor	Description
`TermDocumentMatrixBuilder()`	

Method Summary

Modifier and Type	Method	Description
`void`	`buildTermDocumentMatrix(VectorSpaceModelContext vsmContext)`	Builds a term-document matrix from the data provided in `vsmContext` and stores the result there.
`void`	`buildTermPhraseMatrix(VectorSpaceModelContext context)`	Builds a term-phrase matrix in the same space as the main term-document matrix.

Methods inherited from class org.carrot2.attrs.AttrComposite
accept

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - boostedFieldWeight
```
public final AttrDouble boostedFieldWeight
```
    The extra weight to apply to words that appeared in boosted fields. The larger the value, the stronger the boost.
  - boostFields
```
public AttrStringArray boostFields
```
    A list of fields to which extra weight is applied. Content of the fields listed in this parameter can be given more weight during clustering. You may want to boost, for example, the title field, on the assumption that it accurately summarizes the content of the whole document.
  - maximumMatrixSize
```
public final AttrInteger maximumMatrixSize
```
    Maximum number of elements the term-document matrix can have. The larger the allowed matrix size, the more accurate, but also the more time- and memory-consuming, the clustering.
  - maxWordDf
```
public final AttrDouble maxWordDf
```
    Maximum document frequency allowed for words, as a fraction of all documents. Words with a document frequency larger than maxWordDf will be ignored. For example, when maxWordDf is 0.4, words appearing in more than 40% of documents will be ignored. A value of 1.0 means that all words will be taken into account, no matter how many documents they appear in.
    This parameter may be useful when certain words appear in most of the input documents (e.g. a company name from a header or footer) and such words dominate the cluster labels. In such cases, setting it to a value lower than 1.0 (e.g. 0.9) may improve the clusters.
    Another useful application of this parameter is generating only very specific clusters, that is, clusters containing small numbers of documents. This can be achieved by setting maxWordDf to extremely low values: 0.1 or 0.05.
  - termWeighting
```
public TermWeighting termWeighting
```
    Method for calculating weights of words in the term-document matrices.
- Constructor Detail
  - TermDocumentMatrixBuilder
```
public TermDocumentMatrixBuilder()
```
- Method Detail
  - buildTermDocumentMatrix
```
public void buildTermDocumentMatrix(VectorSpaceModelContext vsmContext)
```
    Builds a term-document matrix from the data provided in vsmContext and stores the result there.
  - buildTermPhraseMatrix
```
public void buildTermPhraseMatrix(VectorSpaceModelContext context)
```
    Builds a term-phrase matrix in the same space as the main term-document matrix. If the processing context contains no phrases, VectorSpaceModelContext.termPhraseMatrix will remain null.
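To make the effect of the fields above more concrete, the following is a minimal, self-contained sketch of how boostFields, boostedFieldWeight, maxWordDf and maximumMatrixSize might interact when a term-document matrix is built. It does not use Carrot2 classes and is not the library's implementation. The class name `TermDocumentMatrixSketch`, the `build` and `sampleDocs` helpers, the multiplicative boost, the rule "keep the heaviest terms" and the cap of maximumMatrixSize divided by the document count are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: a hypothetical stand-in, not Carrot2 code. */
public class TermDocumentMatrixSketch {

    /**
     * Returns one row of per-document weights for each retained term.
     * Each document maps a field name to that field's tokens.
     */
    static Map<String, double[]> build(
            List<Map<String, List<String>>> docs,
            Set<String> boostFields,
            double boostedFieldWeight,
            double maxWordDf,
            int maximumMatrixSize) {
        int docCount = docs.size();
        Map<String, double[]> rows = new HashMap<>();

        // Accumulate term frequencies; occurrences in boosted fields count more
        // (assumption: the boost is a simple multiplier).
        for (int d = 0; d < docCount; d++) {
            for (Map.Entry<String, List<String>> field : docs.get(d).entrySet()) {
                double weight = boostFields.contains(field.getKey()) ? boostedFieldWeight : 1.0;
                for (String word : field.getValue()) {
                    rows.computeIfAbsent(word, w -> new double[docCount])[d] += weight;
                }
            }
        }

        // Ignore words whose document frequency exceeds the maxWordDf fraction.
        rows.entrySet().removeIf(e -> {
            long df = Arrays.stream(e.getValue()).filter(v -> v > 0).count();
            return (double) df / docCount > maxWordDf;
        });

        // Cap the matrix at maximumMatrixSize elements (terms x documents),
        // keeping the terms with the largest total weight (assumed policy).
        int maxTerms = docCount == 0 ? 0 : maximumMatrixSize / docCount;
        List<String> terms = new ArrayList<>(rows.keySet());
        terms.sort((a, b) -> {
            int byWeight = Double.compare(sum(rows.get(b)), sum(rows.get(a)));
            return byWeight != 0 ? byWeight : a.compareTo(b);
        });
        Map<String, double[]> matrix = new LinkedHashMap<>();
        for (String term : terms.subList(0, Math.min(maxTerms, terms.size()))) {
            matrix.put(term, rows.get(term));
        }
        return matrix;
    }

    static double sum(double[] values) {
        return Arrays.stream(values).sum();
    }

    /** Five tiny documents; "acme" plays the role of a company name in every one. */
    static List<Map<String, List<String>>> sampleDocs() {
        return List.of(
            Map.of("title", List.of("apple"), "body", List.of("acme", "fruit")),
            Map.of("title", List.of("acme"), "body", List.of("apple", "fruit")),
            Map.of("title", List.of("acme"), "body", List.of("pear", "fruit")),
            Map.of("title", List.of("banana"), "body", List.of("acme")),
            Map.of("title", List.of("acme"), "body", List.of("banana")));
    }

    public static void main(String[] args) {
        Map<String, double[]> matrix = build(sampleDocs(), Set.of("title"), 2.0, 0.4, 10);
        System.out.println(matrix.keySet());                       // prints [apple, banana]
        System.out.println(Arrays.toString(matrix.get("apple")));  // prints [2.0, 1.0, 0.0, 0.0, 0.0]
    }
}
```

In this sketch, with maxWordDf set to 0.4, "acme" (5 of 5 documents) and "fruit" (3 of 5) are ignored, while "apple" and "banana" (2 of 5 each, exactly 40%) are kept. The title boost of 2.0 doubles the weight of title occurrences, and the matrix size of 10 elements over 5 documents leaves room for only two terms, so "pear" is dropped.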
