Class KMeansMatrixFactorization

  • All Implemented Interfaces:
    IterativeMatrixFactorization, MatrixFactorization

    public class KMeansMatrixFactorization
    extends Object
    Performs matrix factorization using the K-means clustering algorithm. This kind of factorization is sometimes referred to as Concept Decomposition Factorization.
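A minimal sketch of the idea follows. It makes several assumptions. It uses cosine similarity, which is natural here because the constructor requires unit-length columns. It seeds with the first k columns in place of a SeedingStrategy and runs a fixed number of iterations. The class and method names are hypothetical, and plain double[][] arrays stand in for DoubleMatrix2D. V is stored as a k x n matrix of 0/1 cluster memberships so that A is approximated by UV; the library's own layout and coefficient values may differ.

```java
/**
 * Hypothetical sketch of concept decomposition via spherical k-means
 * (not the Carrot2 implementation). A is m x n with unit-length columns;
 * the result approximates A by U * V with U m x k and V k x n.
 */
final class ConceptDecompositionSketch {

    /** Factor matrices: U holds unit-length centroids, V holds 0/1 memberships. */
    static final class Result {
        final double[][] u; // m x k
        final double[][] v; // k x n, v[c][j] == 1 iff column j is in cluster c

        Result(double[][] u, double[][] v) {
            this.u = u;
            this.v = v;
        }
    }

    static Result factorize(double[][] a, int k, int maxIterations) {
        int m = a.length;
        int n = a[0].length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be in [1, number of columns]");
        }
        // Assumed seeding: the first k columns of A (a stand-in for a SeedingStrategy).
        double[][] u = new double[m][k];
        for (int i = 0; i < m; i++) {
            for (int c = 0; c < k; c++) {
                u[i][c] = a[i][c];
            }
        }
        int[] assignment = new int[n];
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: for unit-length vectors the dot product is the cosine similarity.
            for (int j = 0; j < n; j++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double dot = 0;
                    for (int i = 0; i < m; i++) {
                        dot += u[i][c] * a[i][j];
                    }
                    if (dot > best) {
                        best = dot;
                        assignment[j] = c;
                    }
                }
            }
            // Update step: each centroid becomes the length-normalized sum of its columns.
            double[][] sum = new double[m][k];
            for (int j = 0; j < n; j++) {
                for (int i = 0; i < m; i++) {
                    sum[i][assignment[j]] += a[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                double norm = 0;
                for (int i = 0; i < m; i++) {
                    norm += sum[i][c] * sum[i][c];
                }
                if (norm == 0) {
                    continue; // empty cluster: keep the previous centroid
                }
                norm = Math.sqrt(norm);
                for (int i = 0; i < m; i++) {
                    u[i][c] = sum[i][c] / norm;
                }
            }
        }
        // V records the final cluster membership of every column.
        double[][] v = new double[k][n];
        for (int j = 0; j < n; j++) {
            v[assignment[j]][j] = 1.0;
        }
        return new Result(u, v);
    }
}
```

Each column of U is the normalized centroid ("concept vector") of one cluster. Other formulations of concept decomposition fit the coefficients in V by least squares rather than using 0/1 memberships.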
    • Field Detail

      • k

        protected int k
        The desired number of base vectors
      • DEFAULT_K

        protected static int DEFAULT_K
      • maxIterations

        protected int maxIterations
        The maximum number of iterations the algorithm is allowed to run
      • DEFAULT_MAX_ITERATIONS

        protected static final int DEFAULT_MAX_ITERATIONS
        See Also:
        Constant Field Values
      • stopThreshold

        protected double stopThreshold
        If the percentage decrease in approximation error becomes smaller than stopThreshold, the algorithm stops. Note: calculating the approximation error is quite costly. Setting the threshold to -1 turns off the approximation error calculation, so the algorithm always runs for the maximum number of iterations.
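The following sketch shows why the approximation error is expensive to compute. It assumes one common measure, the Frobenius norm of A - UV, and a k x n layout for V; this reference states neither. The class and method names are hypothetical. Rebuilding every entry of UV costs O(m·n·k) operations per evaluation, and this cost is paid once per iteration when error tracking is on.

```java
/**
 * Hypothetical helper: Frobenius-norm error ||A - U V||_F, assuming
 * A is m x n, U is m x k and V is k x n.
 */
final class ApproximationErrorSketch {
    static double frobeniusError(double[][] a, double[][] u, double[][] v) {
        int m = a.length;
        int n = a[0].length;
        int k = v.length;
        double sumOfSquares = 0;
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                // Rebuild entry (i, j) of U * V: O(k) work, O(m * n * k) overall.
                double approx = 0;
                for (int c = 0; c < k; c++) {
                    approx += u[i][c] * v[c][j];
                }
                double diff = a[i][j] - approx;
                sumOfSquares += diff * diff;
            }
        }
        return Math.sqrt(sumOfSquares);
    }
}
```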
      • DEFAULT_STOP_THRESHOLD

        protected static double DEFAULT_STOP_THRESHOLD
      • seedingStrategy

        protected SeedingStrategy seedingStrategy
        Seeding strategy
      • DEFAULT_SEEDING_STRATEGY

        protected static final SeedingStrategy DEFAULT_SEEDING_STRATEGY
      • ordered

        protected boolean ordered
        Whether base vectors are ordered according to their 'activity'.
      • approximationError

        protected double approximationError
        Current approximation error
      • approximationErrors

        protected double[] approximationErrors
        Approximation errors during subsequent iterations
      • iterationsCompleted

        protected int iterationsCompleted
        Iteration counter
      • aggregates

        protected double[] aggregates
        Column aggregates used to order the base vectors (available for an ordered factorization)
      • A

        protected org.carrot2.math.mahout.matrix.DoubleMatrix2D A
        Input matrix
      • U

        protected org.carrot2.math.mahout.matrix.DoubleMatrix2D U
        Base vector result matrix
      • V

        protected org.carrot2.math.mahout.matrix.DoubleMatrix2D V
        Coefficient result matrix
    • Constructor Detail

      • KMeansMatrixFactorization

        public KMeansMatrixFactorization​(org.carrot2.math.mahout.matrix.DoubleMatrix2D A)
        Creates the KMeansMatrixFactorization object for matrix A. Before accessing results, perform computations by calling the compute() method.
        Parameters:
        A - matrix to be factorized. The matrix must have Euclidean length-normalized columns.
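When the input is not yet normalized, you can scale its columns to unit Euclidean length before constructing the factorization. The following is a plain-array sketch with a hypothetical helper name. All-zero columns cannot be normalized, so it leaves them as zeros.

```java
/** Hypothetical helper: copies A (m x n) and scales every nonzero column to length 1. */
final class ColumnNormalizer {
    static double[][] normalizeColumns(double[][] a) {
        int m = a.length;
        int n = a[0].length;
        double[][] out = new double[m][n];
        for (int j = 0; j < n; j++) {
            // Euclidean length of column j.
            double norm = 0;
            for (int i = 0; i < m; i++) {
                norm += a[i][j] * a[i][j];
            }
            norm = Math.sqrt(norm);
            for (int i = 0; i < m; i++) {
                // All-zero columns stay zero.
                out[i][j] = norm == 0 ? 0 : a[i][j] / norm;
            }
        }
        return out;
    }
}
```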
    • Method Detail

      • compute

        public void compute()
        Computes the factorization.
      • setK

        public void setK​(int k)
        Sets the number of base vectors k.
        Parameters:
        k - the number of base vectors
      • getK

        public int getK()
        Returns the number of base vectors k.
      • updateApproximationError

        protected boolean updateApproximationError()
        Returns:
        true if the decrease in the approximation error is smaller than the stopThreshold
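The bookkeeping behind this check can be sketched as follows. Three things are assumptions for illustration: the relative-decrease formula (previous - current) / previous, the class name, and its methods. The recorded history plays the role of approximationErrors, and a negative threshold such as -1 never triggers an early stop.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the state behind updateApproximationError(). */
final class ErrorTrackerSketch {
    private final double stopThreshold;
    private final double[] errors; // one slot per allowed iteration
    private int recorded;

    ErrorTrackerSketch(double stopThreshold, int maxIterations) {
        this.stopThreshold = stopThreshold;
        this.errors = new double[maxIterations];
    }

    /** Records currentError (call at most maxIterations times); true means "stop now". */
    boolean update(double currentError) {
        if (stopThreshold < 0) {
            return false; // error tracking switched off: run every iteration
        }
        errors[recorded++] = currentError;
        if (recorded < 2) {
            return false; // nothing to compare against yet
        }
        double previous = errors[recorded - 2];
        if (previous == 0) {
            return true; // exact fit: no further decrease is possible
        }
        return (previous - currentError) / previous < stopThreshold;
    }

    /** Errors recorded so far, oldest first. */
    double[] history() {
        return Arrays.copyOf(errors, recorded);
    }
}
```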
      • order

        protected void order()
        Orders U and V matrices according to the 'activity' of base vectors.
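This reference does not define 'activity'. The following sketch assumes it is the row sum of V. For 0/1 memberships, that sum is the number of columns assigned to each base vector. The sketch permutes U's columns and V's rows accordingly and returns the sorted values, in the spirit of getAggregates(). The names and the activity measure are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: order base vectors by assumed activity (row sums of V). */
final class OrderSketch {
    /** Permutes U's columns and V's rows in place, most active first; returns sorted aggregates. */
    static double[] order(double[][] u, double[][] v) {
        int k = v.length;
        double[] aggregates = new double[k];
        for (int c = 0; c < k; c++) {
            for (double x : v[c]) {
                aggregates[c] += x;
            }
        }
        Integer[] perm = new Integer[k];
        for (int c = 0; c < k; c++) {
            perm[c] = c;
        }
        // Stable sort of base-vector indices by decreasing aggregate.
        Arrays.sort(perm, Comparator.comparingDouble((Integer c) -> aggregates[c]).reversed());

        double[][] oldV = v.clone(); // shallow copy: reuses the row arrays
        double[] sorted = new double[k];
        for (int pos = 0; pos < k; pos++) {
            v[pos] = oldV[perm[pos]];
            sorted[pos] = aggregates[perm[pos]];
        }
        // Apply the same permutation to the columns of U.
        for (double[] row : u) {
            double[] oldRow = row.clone();
            for (int pos = 0; pos < k; pos++) {
                row[pos] = oldRow[perm[pos]];
            }
        }
        return sorted;
    }
}
```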
      • getMaxIterations

        public int getMaxIterations()
        Returns the maximum number of iterations the algorithm is allowed to run.
      • setMaxIterations

        public void setMaxIterations​(int maxIterations)
        Sets the maximum number of iterations the algorithm is allowed to run.
      • getStopThreshold

        public double getStopThreshold()
        Returns the algorithm's stopThreshold. If the percentage decrease in approximation error becomes smaller than stopThreshold, the algorithm stops.
      • setStopThreshold

        public void setStopThreshold​(double stopThreshold)
        Sets the algorithm's stopThreshold. If the percentage decrease in approximation error becomes smaller than stopThreshold, the algorithm stops.

        Note: calculating the approximation error is quite costly. Setting the threshold to -1 turns off this calculation, so the algorithm always runs for the maximum allowed number of iterations.

      • getApproximationErrors

        public double[] getApproximationErrors()
      • isOrdered

        public boolean isOrdered()
        Returns true when the factorization is set to generate an ordered basis.
      • setOrdered

        public void setOrdered​(boolean ordered)
        Set to true to generate an ordered basis.
      • getAggregates

        public double[] getAggregates()
        Returns column aggregates for a sorted factorization, and null for an unsorted factorization.
      • getU

        public org.carrot2.math.mahout.matrix.DoubleMatrix2D getU()
        Description copied from interface: MatrixFactorization
        Returns the U matrix (base vectors matrix).
        Specified by:
        getU in interface MatrixFactorization
        Returns:
        U matrix
      • getV

        public org.carrot2.math.mahout.matrix.DoubleMatrix2D getV()
        Description copied from interface: MatrixFactorization
        Returns the V matrix (coefficient matrix).
        Specified by:
        getV in interface MatrixFactorization
        Returns:
        V matrix