Class BisectingKMeansClusteringAlgorithm

  • All Implemented Interfaces:
    AcceptingVisitor, ClusteringAlgorithm

    public class BisectingKMeansClusteringAlgorithm
    extends AttrComposite
    implements ClusteringAlgorithm
    A very simple implementation of bisecting k-means clustering. Unlike other algorithms in Carrot2, this one creates a hard clustering (each document belongs to only one cluster). On the other hand, the clusters are labeled only with individual words, which may not fully describe all documents in the cluster.
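    The behavior described above (hard assignment, labels made of single words) can be illustrated with a minimal, self-contained Java sketch of bisecting k-means. It does not call Carrot2. The class BisectingKMeansSketch and its methods are hypothetical, and the farthest-first seeding, the rule of always splitting the largest cluster, and labeling by the highest-weighted centroid terms are assumptions, not a description of Carrot2's actual code. The parameters clusterCount, partitionCount, maxIterations and labelCount only loosely mirror the attributes of the same names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of bisecting k-means; not the Carrot2 implementation.
public class BisectingKMeansSketch {

    // Squared Euclidean distance between two vectors of equal length.
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    // Mean vector of the documents listed in members.
    static double[] centroid(double[][] docs, List<Integer> members) {
        double[] c = new double[docs[0].length];
        for (int m : members) {
            for (int i = 0; i < c.length; i++) c[i] += docs[m][i] / members.size();
        }
        return c;
    }

    // Flat k-means over a subset of documents; each member lands in exactly one partition.
    static List<List<Integer>> kMeans(double[][] docs, List<Integer> members, int k, int maxIterations) {
        // Farthest-first seeding (an assumption) keeps the sketch deterministic.
        List<double[]> centers = new ArrayList<>();
        centers.add(docs[members.get(0)]);
        while (centers.size() < k) {
            int best = members.get(0);
            double bestDist = -1;
            for (int m : members) {
                double nearest = Double.MAX_VALUE;
                for (double[] c : centers) nearest = Math.min(nearest, dist2(docs[m], c));
                if (nearest > bestDist) { bestDist = nearest; best = m; }
            }
            centers.add(docs[best]);
        }
        List<List<Integer>> parts = new ArrayList<>();
        for (int iter = 0; iter < Math.max(1, maxIterations); iter++) {
            parts = new ArrayList<>();
            for (int j = 0; j < k; j++) parts.add(new ArrayList<>());
            // Assignment step: hard assignment of each document to its nearest center.
            for (int m : members) {
                int bestJ = 0;
                for (int j = 1; j < k; j++) {
                    if (dist2(docs[m], centers.get(j)) < dist2(docs[m], centers.get(bestJ))) bestJ = j;
                }
                parts.get(bestJ).add(m);
            }
            // Update step: move each center to the mean of its partition.
            for (int j = 0; j < k; j++) {
                if (!parts.get(j).isEmpty()) centers.set(j, centroid(docs, parts.get(j)));
            }
        }
        parts.removeIf(List::isEmpty);
        return parts;
    }

    // Split the largest cluster until clusterCount clusters exist or nothing can be split.
    static List<List<Integer>> bisect(double[][] docs, int clusterCount, int partitionCount, int maxIterations) {
        List<List<Integer>> clusters = new ArrayList<>();
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < docs.length; i++) all.add(i);
        clusters.add(all);
        while (clusters.size() < clusterCount) {
            int largest = 0;
            for (int i = 1; i < clusters.size(); i++) {
                if (clusters.get(i).size() > clusters.get(largest).size()) largest = i;
            }
            if (clusters.get(largest).size() < 2) break;
            // Never request more partitions than still fit under clusterCount ("at most").
            int k = Math.min(partitionCount, clusterCount - clusters.size() + 1);
            List<List<Integer>> parts = kMeans(docs, clusters.get(largest), k, maxIterations);
            if (parts.size() < 2) break; // e.g. identical vectors cannot be separated
            clusters.remove(largest);
            clusters.addAll(parts);
        }
        return clusters;
    }

    // Label a cluster with its labelCount highest-weighted single terms in the centroid.
    static List<String> label(double[][] docs, List<Integer> members, String[] terms, int labelCount) {
        double[] c = centroid(docs, members);
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) order.add(i);
        order.sort((a, b) -> Double.compare(c[b], c[a]));
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < Math.min(labelCount, terms.length); i++) labels.add(terms[order.get(i)]);
        return labels;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "fruit", "java", "code"};
        // Rows are documents, columns are term weights.
        double[][] docs = {{1, 1, 0, 0}, {1, 0.8, 0, 0}, {0, 0, 1, 1}, {0, 0, 0.9, 1}};
        for (List<Integer> c : bisect(docs, 2, 2, 10)) {
            System.out.println(c + " " + label(docs, c, terms, 1));
        }
    }
}
```

With the sample matrix in main, the sketch prints [0, 1] [apple] and [2, 3] [code]: each document ends up in exactly one cluster, and each label is a single word taken from the cluster centroid.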
    • Field Detail

      • clusterCount

        public final AttrInteger clusterCount
        Number of clusters to create. The algorithm will create at most the specified number of clusters.
      • maxIterations

        public final AttrInteger maxIterations
        Maximum number of k-means iterations to perform.
      • partitionCount

        public final AttrInteger partitionCount
        Number of partitions to create at each k-means clustering iteration.
      • labelCount

        public final AttrInteger labelCount
        Minimum number of labels to return for each cluster.
      • queryHint

        public final AttrString queryHint
        Query terms used to retrieve documents. The query is used as a hint to avoid trivial clusters.
      • useDimensionalityReduction

        public final AttrBoolean useDimensionalityReduction
        If enabled, k-means is applied to the dimensionality-reduced term-document matrix. The number of dimensions equals twice the number of requested clusters. If this number of dimensions is not lower than the number of input documents, reduction is not performed. If disabled, k-means is performed directly on the original term-document matrix.
      • matrixBuilder

        public TermDocumentMatrixBuilder matrixBuilder
        Configuration of the size and contents of the term-document matrix.
      • matrixReducer

        public TermDocumentMatrixReducer matrixReducer
        Configuration of the matrix decomposition method to use for clustering.
      • dictionaries

        public EphemeralDictionaries dictionaries
        Per-request overrides of language components (dictionaries).
        Since:
        4.1.0
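    The decision described for useDimensionalityReduction can be written out as a short Java sketch. It assumes that reduction to 2 * clusterCount dimensions happens only when that target is strictly lower than the number of input documents, and that otherwise k-means works in the original term space. The class ReductionRuleSketch and the method effectiveDimensions are hypothetical and not part of the Carrot2 API.

```java
// Hypothetical illustration of the dimensionality-reduction rule; not Carrot2 code.
public class ReductionRuleSketch {

    // Number of dimensions k-means would work with under the assumed rule.
    static int effectiveDimensions(boolean useDimensionalityReduction, int clusterCount,
                                   int documentCount, int termCount) {
        int target = 2 * clusterCount;
        if (useDimensionalityReduction && target < documentCount) {
            return target; // k-means on the reduced matrix
        }
        return termCount; // k-means on the original term-document matrix
    }

    public static void main(String[] args) {
        // 100 documents, 25 clusters: 50 < 100, so reduce to 50 dimensions.
        System.out.println(effectiveDimensions(true, 25, 100, 5000));  // 50
        // 40 documents: 50 is not lower than 40, so no reduction.
        System.out.println(effectiveDimensions(true, 25, 40, 5000));   // 5000
        // Reduction disabled: always the original matrix.
        System.out.println(effectiveDimensions(false, 25, 100, 5000)); // 5000
    }
}
```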
    • Constructor Detail

      • BisectingKMeansClusteringAlgorithm

        public BisectingKMeansClusteringAlgorithm()