Package org.carrot2.clustering.lingo
Class LingoClusteringAlgorithm
java.lang.Object
org.carrot2.attrs.AttrComposite
org.carrot2.clustering.lingo.LingoClusteringAlgorithm
All Implemented Interfaces: AcceptingVisitor, ClusteringAlgorithm
public class LingoClusteringAlgorithm extends AttrComposite implements ClusteringAlgorithm
Lingo clustering algorithm. Implementation as described in: Stanisław Osiński, Dawid Weiss: A Concept-Driven Algorithm for Clustering Search Results. IEEE Intelligent Systems, vol. 20, no. 3, May/June 2005, pp. 48-54.
- 
Field Summary
Fields (modifier and type, field, description):
- ClusterBuilder clusterBuilder: Cluster builder, contains attributes determining the structure and labels of clusters produced by the Lingo algorithm.
- AttrInteger desiredClusterCount: Desired cluster count.
- TermDocumentMatrixBuilder matrixBuilder: Term-document matrix builder, contains attributes determining the size and contents of the matrix.
- TermDocumentMatrixReducer matrixReducer: Term-document matrix reducer, contains attributes determining the matrix decomposition method to be used during clustering.
- static String NAME
- CompletePreprocessingPipeline preprocessing: Preprocessing pipeline.
- AttrString queryHint: Query hint.
- AttrDouble scoreWeight: Balance between cluster score and size during cluster sorting.
- 
Constructor Summary
Constructors (constructor, description):
- LingoClusteringAlgorithm()
- 
Method Summary
Methods (modifier and type, method, description):
- <T extends Document> List<Cluster<T>> cluster(Stream<? extends T> docStream, LanguageComponents languageComponents): Performs Lingo clustering of documents.
- Set<Class<?>> requiredLanguageComponents()
- 
Field Details
NAME
See Also: Constant Field Values
 
- 
scoreWeight
Balance between cluster score and size during cluster sorting. A value of 0.0 causes Lingo to sort clusters by size only; a value of 1.0 causes it to sort clusters by score only.
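To make the two endpoints concrete, here is a minimal, self-contained sketch of a weighted ordering. The `Candidate` class and the weighted geometric mean in `rank` are assumptions chosen for illustration; this page does not state the formula Lingo uses internally. The sketch only reproduces the documented behaviour at 0.0 (size only) and 1.0 (score only).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ScoreWeightSketch {
  /** Hypothetical stand-in for a cluster: a label, a document count and a score. */
  static final class Candidate {
    final String label;
    final int size;
    final double score;

    Candidate(String label, int size, double score) {
      this.label = label;
      this.size = size;
      this.score = score;
    }
  }

  /**
   * Assumed ranking key: size^(1 - weight) * score^weight.
   * With weight = 0.0 the key equals the size; with weight = 1.0 it equals the score.
   */
  static double rank(Candidate c, double weight) {
    return Math.pow(c.size, 1.0 - weight) * Math.pow(c.score, weight);
  }

  /** Returns the labels ordered from the highest to the lowest ranking key. */
  static List<String> sortedLabels(List<Candidate> candidates, double weight) {
    List<Candidate> copy = new ArrayList<>(candidates);
    copy.sort(Comparator.comparingDouble((Candidate c) -> rank(c, weight)).reversed());
    List<String> labels = new ArrayList<>();
    for (Candidate c : copy) {
      labels.add(c.label);
    }
    return labels;
  }
}
```

Under this assumed key, a large but low-scoring candidate comes first at weight 0.0, and a small but high-scoring candidate comes first at weight 1.0. Intermediate weights trade one off against the other.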
- 
desiredClusterCount
Desired cluster count. A factor used to calculate the number of clusters from the number of input documents. The larger the value, the more clusters are created. The number of clusters actually produced scales with this value but may differ from it.
- 
preprocessing
Preprocessing pipeline.
- 
matrixBuilder
Term-document matrix builder, contains attributes determining the size and contents of the matrix.
- 
matrixReducer
Term-document matrix reducer, contains attributes determining the matrix decomposition method to be used during clustering.
- 
clusterBuilder
Cluster builder, contains attributes determining the structure and labels of clusters produced by the Lingo algorithm.
- 
queryHint
Query hint. The query terms used to retrieve the documents being clustered. The query serves as a hint that helps avoid creating trivial clusters whose labels consist only of query words.
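The idea of a "trivial" label can be sketched with a small, self-contained filter. This is an illustration only, not Lingo's implementation: the class `QueryHintSketch`, its whitespace tokenizer and the rule "drop a label when every one of its words appears in the query" are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class QueryHintSketch {
  /** Splits text into lower-cased words on whitespace (a deliberately naive tokenizer). */
  static Set<String> words(String text) {
    Set<String> result = new HashSet<>();
    for (String word : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
      if (!word.isEmpty()) {
        result.add(word);
      }
    }
    return result;
  }

  /** Keeps only the labels that contain at least one word not present in the query. */
  static List<String> dropTrivialLabels(List<String> labels, String queryHint) {
    Set<String> queryWords = words(queryHint);
    List<String> kept = new ArrayList<>();
    for (String label : labels) {
      if (!queryWords.containsAll(words(label))) {
        kept.add(label);
      }
    }
    return kept;
  }
}
```

For the query "data mining", this filter would discard candidate labels such as "Data Mining" or "Mining" and keep "Data Mining Software", because the latter adds a word the query does not already contain.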
 
- 
- 
Constructor Details
LingoClusteringAlgorithm
public LingoClusteringAlgorithm()
 
- 
- 
Method Details
requiredLanguageComponents
Specified by: requiredLanguageComponents in interface ClusteringAlgorithm
 
- 
cluster
public <T extends Document> List<Cluster<T>> cluster(Stream<? extends T> docStream, LanguageComponents languageComponents)
Performs Lingo clustering of documents.
Specified by: cluster in interface ClusteringAlgorithm
 
 