Package org.carrot2.clustering.kmeans
Class BisectingKMeansClusteringAlgorithm
java.lang.Object
org.carrot2.attrs.AttrComposite
org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm
- All Implemented Interfaces:
- AcceptingVisitor, ClusteringAlgorithm
public class BisectingKMeansClusteringAlgorithm extends AttrComposite implements ClusteringAlgorithm
A very simple implementation of bisecting k-means clustering. Unlike other algorithms in Carrot2,
 this one produces a hard clustering: each document belongs to exactly one cluster. On the other
 hand, clusters are labeled only with individual words, which may not fully correspond to all
 documents in the cluster.
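The general idea of bisecting k-means can be sketched in plain Java as follows. This is a minimal illustration and not Carrot2's implementation: the class name BisectingKMeansSketch, the dense double[][] input, the deterministic seeding, and the choice to always split the largest cluster are all assumptions made for the example. Each split is fixed at two partitions, and the clusterCount and maxIterations parameters only mirror the roles of the attributes of the same name.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative bisecting k-means over dense vectors (hypothetical; not Carrot2 code). */
public class BisectingKMeansSketch {

  /** Squared Euclidean distance between two equal-length vectors. */
  static double dist2(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** Mean of the rows of data selected by members (members must be non-empty). */
  static double[] centroid(double[][] data, List<Integer> members) {
    double[] c = new double[data[0].length];
    for (int m : members) {
      for (int i = 0; i < c.length; i++) {
        c[i] += data[m][i] / members.size();
      }
    }
    return c;
  }

  /** Splits one cluster into two with plain 2-means, seeded deterministically. */
  static List<List<Integer>> bisect(double[][] data, List<Integer> members, int maxIterations) {
    // Assumption: seed with the first member and the member farthest from it.
    double[] c0 = data[members.get(0)];
    double[] c1 = c0;
    for (int m : members) {
      if (dist2(data[m], c0) > dist2(c1, c0)) {
        c1 = data[m];
      }
    }
    List<Integer> left = new ArrayList<>();
    List<Integer> right = new ArrayList<>();
    for (int iter = 0; iter < Math.max(1, maxIterations); iter++) {
      List<Integer> newLeft = new ArrayList<>();
      List<Integer> newRight = new ArrayList<>();
      for (int m : members) {
        // Hard assignment: every point goes to exactly one side.
        if (dist2(data[m], c0) <= dist2(data[m], c1)) {
          newLeft.add(m);
        } else {
          newRight.add(m);
        }
      }
      boolean stable = newLeft.equals(left) && newRight.equals(right);
      left = newLeft;
      right = newRight;
      if (stable || left.isEmpty() || right.isEmpty()) {
        break;
      }
      c0 = centroid(data, left);
      c1 = centroid(data, right);
    }
    return List.of(left, right);
  }

  /** Bisects the largest cluster until at most clusterCount clusters exist. */
  static List<List<Integer>> cluster(double[][] data, int clusterCount, int maxIterations) {
    List<Integer> all = new ArrayList<>();
    for (int i = 0; i < data.length; i++) {
      all.add(i);
    }
    List<List<Integer>> clusters = new ArrayList<>();
    clusters.add(all);
    while (clusters.size() < clusterCount) {
      List<Integer> largest = clusters.get(0);
      for (List<Integer> c : clusters) {
        if (c.size() > largest.size()) {
          largest = c;
        }
      }
      if (largest.size() < 2) {
        break; // nothing left to split
      }
      List<List<Integer>> halves = bisect(data, largest, maxIterations);
      if (halves.get(0).isEmpty() || halves.get(1).isEmpty()) {
        break; // degenerate split, e.g. all points identical
      }
      clusters.remove(largest);
      clusters.addAll(halves);
    }
    return clusters;
  }

  public static void main(String[] args) {
    double[][] points = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}};
    System.out.println(cluster(points, 2, 10)); // prints [[0, 1, 2], [3, 4, 5]]
  }
}
```

Because every point is assigned to exactly one side of each split, the result is a hard clustering, and the loop can stop early, so the number of clusters returned is at most the requested count.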
Field Summary
- AttrInteger clusterCount: The number of clusters to create.
- AttrInteger labelCount: Label count.
- TermDocumentMatrixBuilder matrixBuilder: Term-document matrix builder for the algorithm.
- TermDocumentMatrixReducer matrixReducer: Term-document matrix reducer for the algorithm.
- AttrInteger maxIterations: The maximum number of k-means iterations to perform.
- static String NAME
- AttrInteger partitionCount: Partition count.
- BasicPreprocessingPipeline preprocessing: A pipeline of components transforming input documents into a PreprocessingContext.
- AttrString queryHint: Query terms used to retrieve documents.
- AttrBoolean useDimensionalityReduction: Use dimensionality reduction.
Constructor Summary
- BisectingKMeansClusteringAlgorithm()
Method Summary
- <T extends Document> List<Cluster<T>> cluster(Stream<? extends T> docStream, LanguageComponents languageComponents)
- Set<Class<?>> requiredLanguageComponents()
Field Details
- NAME
  See Also: Constant Field Values
- clusterCount
  The number of clusters to create. The algorithm will create at most the specified number of clusters.
- maxIterations
  The maximum number of k-means iterations to perform.
- partitionCount
  Partition count. The number of partitions to create at each k-means clustering iteration.
- labelCount
  Label count. The minimum number of labels to return for each cluster.
- queryHint
  Query terms used to retrieve documents. The query is used as a hint to avoid trivial clusters.
- useDimensionalityReduction
  Use dimensionality reduction. If true, k-means will be applied on the dimensionality-reduced term-document matrix with the number of dimensions being equal to twice the number of requested clusters. If the number of dimensions is lower than the number of input documents, reduction will not be performed. If false, k-means will be performed directly on the original term-document matrix.
- matrixBuilder
  Term-document matrix builder for the algorithm.
- matrixReducer
  Term-document matrix reducer for the algorithm.
- preprocessing
  A pipeline of components transforming input documents into a PreprocessingContext.
Constructor Details
- BisectingKMeansClusteringAlgorithm
  public BisectingKMeansClusteringAlgorithm()
Method Details
- requiredLanguageComponents
  Specified by: requiredLanguageComponents in interface ClusteringAlgorithm
- cluster
  public <T extends Document> List<Cluster<T>> cluster(Stream<? extends T> docStream, LanguageComponents languageComponents)
  Specified by: cluster in interface ClusteringAlgorithm