Package org.carrot2.text.vsm
Class VectorSpaceModelContext
java.lang.Object
org.carrot2.text.vsm.VectorSpaceModelContext
public class VectorSpaceModelContext extends Object
Stores data related to the Vector Space Model of the processed documents.
Field Summary
Fields
Modifier and Type | Field | Description
PreprocessingContext | preprocessingContext | Preprocessing context for the underlying documents.
com.carrotsearch.hppc.IntIntHashMap | stemToRowIndex | Stem index to row index mapping for the tdMatrix.
org.carrot2.math.mahout.matrix.DoubleMatrix2D | termDocumentMatrix | Term-document matrix.
org.carrot2.math.mahout.matrix.DoubleMatrix2D | termPhraseMatrix | Term-document-like matrix for phrases from PreprocessingContext.AllLabels.
Constructor Summary
Constructors
Constructor | Description
VectorSpaceModelContext(PreprocessingContext preprocessingContext) | Creates a vector space model context with the provided preprocessing context.
Method Summary
Field Details
preprocessingContext
Preprocessing context for the underlying documents.
termDocumentMatrix
public org.carrot2.math.mahout.matrix.DoubleMatrix2D termDocumentMatrix
Term-document matrix. Rows of the matrix correspond to word stems, columns correspond to the processed documents. For mapping between rows of this matrix and PreprocessingContext.AllStems, see stemToRowIndex.
This matrix is produced by TermDocumentMatrixBuilder.buildTermDocumentMatrix(VectorSpaceModelContext).
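The row and column layout described above can be illustrated with a minimal, self-contained sketch. It does not use the Carrot2 classes: a plain double[][] stands in for DoubleMatrix2D, stems are given as strings, and the class name TermDocumentMatrixSketch and both of its methods are hypothetical. Cells hold raw occurrence counts, which is an assumption; this page does not say how TermDocumentMatrixBuilder weights its entries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of org.carrot2.text.vsm.
final class TermDocumentMatrixSketch {

    /** Assigns a row to each distinct stem, in order of first appearance. */
    static Map<String, Integer> assignRows(String[][] stemmedDocs) {
        Map<String, Integer> stemToRow = new LinkedHashMap<>();
        for (String[] doc : stemmedDocs) {
            for (String stem : doc) {
                stemToRow.putIfAbsent(stem, stemToRow.size());
            }
        }
        return stemToRow;
    }

    /** Builds a matrix with one row per stem and one column per document. */
    static double[][] build(String[][] stemmedDocs, Map<String, Integer> stemToRow) {
        double[][] matrix = new double[stemToRow.size()][stemmedDocs.length];
        for (int doc = 0; doc < stemmedDocs.length; doc++) {
            for (String stem : stemmedDocs[doc]) {
                // Assumption: raw occurrence count, no term weighting.
                matrix[stemToRow.get(stem)][doc] += 1.0;
            }
        }
        return matrix;
    }
}
```

Reading matrix[row][doc] then gives the count of the stem assigned to that row in the given document, which mirrors how a row index taken from stemToRowIndex would be used to address a row of the real matrix.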
termPhraseMatrix
public org.carrot2.math.mahout.matrix.DoubleMatrix2D termPhraseMatrix
Term-document-like matrix for phrases from PreprocessingContext.AllLabels. If there are no phrases in PreprocessingContext.AllLabels, the phrase matrix is null. For mapping between rows of this matrix and PreprocessingContext.AllStems, see stemToRowIndex.
This matrix is produced by TermDocumentMatrixBuilder.buildTermPhraseMatrix(VectorSpaceModelContext).
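As a rough sketch of the contract described above, the following stand-alone code builds a phrase matrix whose rows follow an existing stem-to-row mapping and whose columns correspond to phrases, and returns null when there are no phrases. The class TermPhraseMatrixSketch, its build method, and the representation of a phrase as an array of stem indices are assumptions made for illustration; the real matrix is produced by TermDocumentMatrixBuilder.buildTermPhraseMatrix and may be computed differently.

```java
import java.util.Map;

// Hypothetical illustration only; not part of org.carrot2.text.vsm.
final class TermPhraseMatrixSketch {

    /**
     * Returns a matrix with rowCount rows and one column per phrase,
     * or null when there are no phrases.
     */
    static double[][] build(int[][] phraseStemIndices,
                            Map<Integer, Integer> stemToRowIndex,
                            int rowCount) {
        if (phraseStemIndices.length == 0) {
            return null; // mirrors the documented "no phrases" case
        }
        double[][] matrix = new double[rowCount][phraseStemIndices.length];
        for (int phrase = 0; phrase < phraseStemIndices.length; phrase++) {
            for (int stemIndex : phraseStemIndices[phrase]) {
                Integer row = stemToRowIndex.get(stemIndex);
                if (row != null) { // stems without a matrix row are skipped
                    matrix[row][phrase] += 1.0; // assumption: raw count
                }
            }
        }
        return matrix;
    }
}
```

Callers of the real field should likewise check termPhraseMatrix for null before using it.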
stemToRowIndex
public com.carrotsearch.hppc.IntIntHashMap stemToRowIndex
Stem index to row index mapping for the tdMatrix. Keys in this map are indices of entries in PreprocessingContext.AllStems arrays, values are the indices of tdMatrix rows corresponding to the stems. Note that, depending on the limit on the size of the matrix, some stems may have no corresponding matrix row.
This object is produced by TermDocumentMatrixBuilder.buildTermDocumentMatrix(VectorSpaceModelContext).
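The effect of a matrix size limit on this mapping can be sketched as follows. The code is self-contained and uses java.util.HashMap in place of IntIntHashMap; the class StemToRowIndexSketch, its methods, and the NO_ROW sentinel are hypothetical. Keeping the most frequent stems is an assumption chosen for the example, since this page does not state which stems the builder drops when the limit is reached.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of org.carrot2.text.vsm.
final class StemToRowIndexSketch {
    static final int NO_ROW = -1;

    /** Gives rows to at most maxRows stems, most frequent first. */
    static Map<Integer, Integer> assignRows(int[] stemFrequencies, int maxRows) {
        Integer[] order = new Integer[stemFrequencies.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort stem indices by descending frequency, ties by ascending index.
        Arrays.sort(order, (a, b) -> stemFrequencies[a] != stemFrequencies[b]
                ? Integer.compare(stemFrequencies[b], stemFrequencies[a])
                : Integer.compare(a, b));
        Map<Integer, Integer> stemToRow = new HashMap<>();
        for (int row = 0; row < Math.min(maxRows, order.length); row++) {
            stemToRow.put(order[row], row);
        }
        return stemToRow;
    }

    /** Returns the row of a stem, or NO_ROW if the stem has no matrix row. */
    static int rowOf(Map<Integer, Integer> stemToRow, int stemIndex) {
        return stemToRow.getOrDefault(stemIndex, NO_ROW);
    }
}
```

With the real IntIntHashMap, an explicit presence check (for example containsKey) before reading a row index is advisable, because a stem that was left out of the matrix has no entry in the map.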
Constructor Details
VectorSpaceModelContext
Creates a vector space model context with the provided preprocessing context.