org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm

All Implemented Interfaces: AcceptingVisitor, ClusteringAlgorithm

public class BisectingKMeansClusteringAlgorithm
extends AttrComposite
implements ClusteringAlgorithm

A very simple implementation of bisecting k-means clustering. Unlike other algorithms in Carrot2, this one produces a hard clustering: each document belongs to exactly one cluster. On the other hand, clusters are labeled only with individual words, which may not fully correspond to every document in the cluster.
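To make the idea of a hard, bisecting clustering concrete, here is a minimal sketch on plain numeric vectors. It is not Carrot2's implementation: the class `BisectingKMeansSketch`, its method names, the choice to always split the largest cluster, and the deterministic seeding are all assumptions made for illustration. The parameters `clusterCount` and `maxIterations` only mirror the names of the attributes documented on this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative bisecting k-means over dense vectors (hypothetical, not Carrot2 code). */
final class BisectingKMeansSketch {

  /** Splits point indices into at most clusterCount disjoint groups. */
  static List<List<Integer>> cluster(double[][] points, int clusterCount, int maxIterations) {
    List<List<Integer>> clusters = new ArrayList<>();
    List<Integer> all = new ArrayList<>();
    for (int i = 0; i < points.length; i++) all.add(i);
    clusters.add(all);

    while (clusters.size() < clusterCount) {
      // Assumption: always bisect the currently largest cluster.
      int largest = 0;
      for (int i = 1; i < clusters.size(); i++) {
        if (clusters.get(i).size() > clusters.get(largest).size()) largest = i;
      }
      if (clusters.get(largest).size() < 2) break; // nothing left to split
      List<List<Integer>> halves = bisect(points, clusters.get(largest), maxIterations);
      if (halves.get(0).isEmpty() || halves.get(1).isEmpty()) break; // split failed
      clusters.remove(largest);
      clusters.addAll(halves);
    }
    return clusters;
  }

  /** Runs 2-means on the given members for at most maxIterations rounds. */
  static List<List<Integer>> bisect(double[][] points, List<Integer> members, int maxIterations) {
    // Deterministic seeding: the first member and the member farthest from it.
    double[] c0 = points[members.get(0)].clone();
    int far = members.get(0);
    for (int m : members) {
      if (dist(points[m], c0) > dist(points[far], c0)) far = m;
    }
    double[] c1 = points[far].clone();

    List<Integer> a = new ArrayList<>();
    List<Integer> b = new ArrayList<>();
    for (int iter = 0; iter < maxIterations; iter++) {
      List<Integer> newA = new ArrayList<>();
      List<Integer> newB = new ArrayList<>();
      // Hard assignment: every point goes to exactly one of the two centroids.
      for (int m : members) {
        if (dist(points[m], c0) <= dist(points[m], c1)) newA.add(m); else newB.add(m);
      }
      boolean converged = newA.equals(a) && newB.equals(b);
      a = newA;
      b = newB;
      if (converged || a.isEmpty() || b.isEmpty()) break;
      c0 = mean(points, a);
      c1 = mean(points, b);
    }
    return List.of(a, b);
  }

  /** Squared Euclidean distance. */
  static double dist(double[] x, double[] y) {
    double sum = 0;
    for (int d = 0; d < x.length; d++) sum += (x[d] - y[d]) * (x[d] - y[d]);
    return sum;
  }

  /** Component-wise mean of the selected points. */
  static double[] mean(double[][] points, List<Integer> members) {
    double[] c = new double[points[members.get(0)].length];
    for (int m : members) {
      for (int d = 0; d < c.length; d++) c[d] += points[m][d] / members.size();
    }
    return c;
  }

  public static void main(String[] args) {
    double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}, {20, 0}, {20, 1}};
    System.out.println(cluster(points, 3, 10)); // prints [[0, 1], [2, 3], [4, 5]]
  }
}
```

Because each split partitions the members of one existing cluster, every input index ends up in exactly one group, and the loop can stop early, so the result contains at most the requested number of clusters.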

Field Summary

Fields
Modifier and Type	Field	Description
`AttrInteger`	`clusterCount`	The number of clusters to create.
`AttrInteger`	`labelCount`	Label count.
`TermDocumentMatrixBuilder`	`matrixBuilder`	Term-document matrix builder for the algorithm.
`TermDocumentMatrixReducer`	`matrixReducer`	Term-document matrix reducer for the algorithm.
`AttrInteger`	`maxIterations`	The maximum number of k-means iterations to perform.
`static String`	`NAME`
`AttrInteger`	`partitionCount`	Partition count.
`BasicPreprocessingPipeline`	`preprocessing`	A pipeline of components transforming input documents into a PreprocessingContext.
`AttrString`	`queryHint`	Query terms used to retrieve documents.
`AttrBoolean`	`useDimensionalityReduction`	Use dimensionality reduction.

Fields inherited from class org.carrot2.attrs.AttrComposite

attributes

Constructor Summary

Constructors

Constructor	Description
`BisectingKMeansClusteringAlgorithm()`

Method Summary

Modifier and Type	Method	Description
`<T extends Document> List<Cluster<T>>`	`cluster(Stream<? extends T> docStream, LanguageComponents languageComponents)`
`Set<Class<?>>`	`requiredLanguageComponents()`

Methods inherited from class org.carrot2.attrs.AttrComposite

accept

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.carrot2.attrs.AcceptingVisitor

accept

Methods inherited from interface org.carrot2.clustering.ClusteringAlgorithm

supports

Field Details
- NAME
  
  public static final String NAME
  
  See Also:
  
  Constant Field Values
- clusterCount
  
  public final AttrInteger clusterCount
  
  The number of clusters to create. The algorithm will create at most the specified number of clusters.
- maxIterations
  
  public final AttrInteger maxIterations
  
  The maximum number of k-means iterations to perform.
- partitionCount
  
  public final AttrInteger partitionCount
  
  Partition count. The number of partitions to create at each k-means clustering iteration.
- labelCount
  
  public final AttrInteger labelCount
  
  Label count. The minimum number of labels to return for each cluster.
- queryHint
  
  public final AttrString queryHint
  
  Query terms used to retrieve documents. The query is used as a hint to avoid trivial clusters.
- useDimensionalityReduction
  
  public final AttrBoolean useDimensionalityReduction
  
  Use dimensionality reduction. If true, k-means will be applied on the dimensionality-reduced term-document matrix, with the number of dimensions equal to twice the number of requested clusters. If the number of input documents does not exceed that number of dimensions, reduction will not be performed. If false, k-means will be performed directly on the original term-document matrix.
- matrixBuilder
  
  public TermDocumentMatrixBuilder matrixBuilder
  
  Term-document matrix builder for the algorithm.
- matrixReducer
  
  public TermDocumentMatrixReducer matrixReducer
  
  Term-document matrix reducer for the algorithm.
- preprocessing
  
  public BasicPreprocessingPipeline preprocessing
  
  A pipeline of components transforming input documents into a PreprocessingContext.
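This page does not show how single-word labels are chosen, so the following is only one plausible reading of how `labelCount` and `queryHint` could interact: take the highest-weighted words of a cluster and skip words that already appear in the query. The class `WordLabelSketch`, the method `topWords`, and the weight map are hypothetical and are not part of the Carrot2 API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative single-word label selection (hypothetical, not Carrot2 code). */
final class WordLabelSketch {

  /**
   * Returns up to labelCount words with the highest weights, skipping query words.
   * Assumption: each cluster is summarized by a word-to-weight map.
   */
  static List<String> topWords(
      Map<String, Double> wordWeights, Set<String> queryWords, int labelCount) {
    List<Map.Entry<String, Double>> entries = new ArrayList<>(wordWeights.entrySet());
    // Sort by descending weight; break ties alphabetically for determinism.
    entries.sort((x, y) -> {
      int byWeight = Double.compare(y.getValue(), x.getValue());
      return byWeight != 0 ? byWeight : x.getKey().compareTo(y.getKey());
    });

    List<String> labels = new ArrayList<>();
    for (Map.Entry<String, Double> e : entries) {
      if (labels.size() >= labelCount) break;
      // Words from the query would label nearly every cluster, so skip them.
      if (!queryWords.contains(e.getKey())) labels.add(e.getKey());
    }
    return labels;
  }

  public static void main(String[] args) {
    Map<String, Double> weights =
        Map.of("java", 0.9, "coffee", 0.7, "island", 0.4, "bean", 0.2);
    System.out.println(topWords(weights, Set.of("java"), 2)); // prints [coffee, island]
  }
}
```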

Constructor Details
- BisectingKMeansClusteringAlgorithm
  
  public BisectingKMeansClusteringAlgorithm()

Method Details
- requiredLanguageComponents
  
  public Set<Class<?>> requiredLanguageComponents()
  
  Specified by:
  
  requiredLanguageComponents in interface ClusteringAlgorithm
- cluster
  
  public <T extends Document> List<Cluster<T>> cluster(Stream<? extends T> docStream, LanguageComponents languageComponents)
  
  Specified by:
  
  cluster in interface ClusteringAlgorithm
