Package org.carrot2.clustering.lingo
Class LingoClusteringAlgorithm
java.lang.Object
  org.carrot2.attrs.AttrComposite
    org.carrot2.clustering.lingo.LingoClusteringAlgorithm
All Implemented Interfaces:
AcceptingVisitor, ClusteringAlgorithm
public class LingoClusteringAlgorithm extends AttrComposite implements ClusteringAlgorithm
Lingo clustering algorithm, implemented as described in: Stanisław Osiński, Dawid Weiss: A Concept-Driven Algorithm for Clustering Search Results. IEEE Intelligent Systems, vol. 20, no. 3 (May/June 2005), pp. 48-54.
Field Summary
Modifier and Type              Field                Description
ClusterBuilder                 clusterBuilder       Cluster builder; contains the attributes that determine the structure and labels of clusters produced by the Lingo algorithm.
AttrInteger                    desiredClusterCount  Desired cluster count.
TermDocumentMatrixBuilder      matrixBuilder        Term-document matrix builder; contains the attributes that determine the size and contents of the matrix.
TermDocumentMatrixReducer      matrixReducer        Term-document matrix reducer; contains the attributes that determine the matrix decomposition method used during clustering.
static String                  NAME
CompletePreprocessingPipeline  preprocessing        Preprocessing pipeline.
AttrString                     queryHint            Query hint.
AttrDouble                     scoreWeight          Balance between cluster score and size during cluster sorting.

Fields inherited from class org.carrot2.attrs.AttrComposite: attributes

Constructor Summary
Constructor                  Description
LingoClusteringAlgorithm()

Method Summary
Modifier and Type                      Method and Description
<T extends Document> List<Cluster<T>>  cluster(Stream<? extends T> docStream, LanguageComponents languageComponents)
                                       Performs Lingo clustering of documents.
boolean                                supports(LanguageComponents languageComponents)

Methods inherited from class org.carrot2.attrs.AttrComposite: accept
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.carrot2.attrs.AcceptingVisitor: accept

Field Detail
NAME
public static final String NAME
See Also: Constant Field Values
scoreWeight
public AttrDouble scoreWeight
Balance between cluster score and size during cluster sorting. A value of 0.0 makes Lingo sort clusters by size only; a value of 1.0 makes it sort clusters by score only.
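The two documented endpoints of scoreWeight can be illustrated with a small, self-contained sketch. It assumes a geometric blend, size^(1 - w) * score^w, which reduces to size-only ordering at w = 0.0 and score-only ordering at w = 1.0; the exact blend Lingo applies between those endpoints is an assumption here, not taken from the Carrot2 source. ScoreWeightSortSketch and ClusterInfo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of sorting clusters by a blend of score and size.
 * ASSUMPTION: the blend is size^(1 - w) * score^w. It matches the documented
 * endpoints (w = 0.0: size only, w = 1.0: score only) but is not taken from Carrot2.
 */
public class ScoreWeightSortSketch {

    /** Hypothetical minimal cluster summary. */
    static final class ClusterInfo {
        final String label;
        final double score;
        final int size;

        ClusterInfo(String label, double score, int size) {
            this.label = label;
            this.score = score;
            this.size = size;
        }
    }

    /** Blended sort key; clusters with larger keys come first. */
    static double sortKey(ClusterInfo c, double scoreWeight) {
        return Math.pow(c.size, 1.0 - scoreWeight) * Math.pow(c.score, scoreWeight);
    }

    /** Returns a new list sorted by descending blended key. */
    static List<ClusterInfo> sort(List<ClusterInfo> clusters, double scoreWeight) {
        if (scoreWeight < 0.0 || scoreWeight > 1.0) {
            throw new IllegalArgumentException("scoreWeight must be within [0, 1]");
        }
        List<ClusterInfo> sorted = new ArrayList<>(clusters);
        sorted.sort(Comparator.comparingDouble((ClusterInfo c) -> sortKey(c, scoreWeight)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<ClusterInfo> clusters = List.of(
                new ClusterInfo("large, low score", 0.2, 40),
                new ClusterInfo("small, high score", 0.9, 5));
        System.out.println(sort(clusters, 0.0).get(0).label); // prints "large, low score"
        System.out.println(sort(clusters, 1.0).get(0).label); // prints "small, high score"
        System.out.println(sort(clusters, 0.5).get(0).label);
    }
}
```

A multiplicative blend is used in this sketch because cluster sizes (document counts) and scores live on very different scales; a plain linear mix of the two would be dominated by size for almost any weight.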
desiredClusterCount
public AttrInteger desiredClusterCount
Desired cluster count: a factor used to calculate the number of clusters from the number of input documents. The larger the value, the more clusters the algorithm creates. The number of clusters actually created is adjusted in proportion to the desired cluster count, but may differ from it.
preprocessing
public CompletePreprocessingPipeline preprocessing
Preprocessing pipeline.
matrixBuilder
public TermDocumentMatrixBuilder matrixBuilder
Term-document matrix builder; contains the attributes that determine the size and contents of the term-document matrix.
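To make the term-document matrix concrete, the sketch below builds one in plain Java: rows are terms, columns are documents, and each cell holds a tf-idf weight, as in the original Lingo description. It assumes the documents are already tokenized and applies no size limits or filtering; TermDocumentMatrixBuilder's actual weighting and attributes may differ, and TermDocumentMatrixSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/**
 * Sketch of a term-document matrix: matrix[i][j] is the weight of term i in document j.
 * ASSUMPTION: plain tf-idf weighting, tf * ln(N / df), over already-tokenized documents;
 * TermDocumentMatrixBuilder's own weighting and size limits are not modelled.
 */
public class TermDocumentMatrixSketch {

    final List<String> terms;  // row labels, in alphabetical order
    final double[][] matrix;   // terms.size() rows x documents.size() columns

    TermDocumentMatrixSketch(List<List<String>> documents) {
        int n = documents.size();
        TreeSet<String> vocabulary = new TreeSet<>();
        documents.forEach(vocabulary::addAll);
        terms = new ArrayList<>(vocabulary);

        Map<String, Integer> rowOf = new HashMap<>();
        for (int i = 0; i < terms.size(); i++) {
            rowOf.put(terms.get(i), i);
        }

        // Raw term frequencies.
        matrix = new double[terms.size()][n];
        for (int j = 0; j < n; j++) {
            for (String token : documents.get(j)) {
                matrix[rowOf.get(token)][j] += 1.0;
            }
        }

        // Scale each term row by its inverse document frequency.
        for (double[] row : matrix) {
            int df = 0;
            for (double tf : row) {
                if (tf > 0) {
                    df++;
                }
            }
            double idf = Math.log((double) n / df); // df >= 1: every term comes from some document
            for (int j = 0; j < n; j++) {
                row[j] *= idf;
            }
        }
    }

    public static void main(String[] args) {
        TermDocumentMatrixSketch tdm = new TermDocumentMatrixSketch(List.of(
                List.of("data", "mining"),
                List.of("data", "clustering"),
                List.of("text", "clustering", "clustering")));
        for (int i = 0; i < tdm.terms.size(); i++) {
            System.out.println(tdm.terms.get(i) + " " + Arrays.toString(tdm.matrix[i]));
        }
    }
}
```

A term that occurs in every document gets idf = ln(1) = 0, so its whole row becomes zero: such a term cannot help tell documents apart.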
matrixReducer
public TermDocumentMatrixReducer matrixReducer
Term-document matrix reducer; contains the attributes that determine the matrix decomposition method used during clustering.
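The cited Lingo paper reduces the term-document matrix with singular value decomposition (SVD) and treats the leading left singular vectors as abstract concepts. The sketch below approximates those k vectors with orthogonal (subspace) iteration on A * A^T. It is a plain stand-in for whichever decomposition TermDocumentMatrixReducer is configured with, not its implementation, and PartialSvdSketch is a hypothetical name. It also caps k at min(m, n), because an m x n matrix has no more singular vectors than that.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Sketch of a rank-k reduction of a term-document matrix A (m terms x n documents):
 * approximates the k dominant left singular vectors of A with orthogonal (subspace)
 * iteration on A * A^T. ASSUMPTION: a stand-in for whichever decomposition
 * TermDocumentMatrixReducer uses; assumes the top k singular values are distinct and non-zero.
 */
public class PartialSvdSketch {

    /** Returns an m x k matrix whose columns approximate the top-k left singular vectors. */
    static double[][] topLeftSingularVectors(double[][] a, int desiredK, int iterations, long seed) {
        int m = a.length;
        int n = a[0].length;
        int k = Math.min(desiredK, Math.min(m, n)); // no more than min(m, n) singular vectors exist
        Random random = new Random(seed);

        // Random starting block, made orthonormal.
        double[][] q = new double[m][k];
        for (int i = 0; i < m; i++) {
            for (int c = 0; c < k; c++) {
                q[i][c] = random.nextDouble() - 0.5;
            }
        }
        orthonormalizeColumns(q);

        for (int it = 0; it < iterations; it++) {
            double[][] t = new double[n][k]; // t = A^T q
            for (int j = 0; j < n; j++) {
                for (int c = 0; c < k; c++) {
                    for (int i = 0; i < m; i++) {
                        t[j][c] += a[i][j] * q[i][c];
                    }
                }
            }
            double[][] next = new double[m][k]; // next = A t = A A^T q
            for (int i = 0; i < m; i++) {
                for (int c = 0; c < k; c++) {
                    for (int j = 0; j < n; j++) {
                        next[i][c] += a[i][j] * t[j][c];
                    }
                }
            }
            orthonormalizeColumns(next);
            q = next;
        }
        return q;
    }

    /** Modified Gram-Schmidt: makes the columns of q orthonormal, in place. */
    static void orthonormalizeColumns(double[][] q) {
        int m = q.length;
        int k = q[0].length;
        for (int c = 0; c < k; c++) {
            for (int p = 0; p < c; p++) {
                double dot = 0;
                for (int i = 0; i < m; i++) {
                    dot += q[i][c] * q[i][p];
                }
                for (int i = 0; i < m; i++) {
                    q[i][c] -= dot * q[i][p];
                }
            }
            double norm = 0;
            for (int i = 0; i < m; i++) {
                norm += q[i][c] * q[i][c];
            }
            norm = Math.sqrt(norm);
            if (norm == 0) {
                throw new IllegalStateException("column vanished; matrix rank may be below k");
            }
            for (int i = 0; i < m; i++) {
                q[i][c] /= norm;
            }
        }
    }

    public static void main(String[] args) {
        double[][] a = {{3, 0}, {0, 1}, {0, 0}}; // 3 terms x 2 documents
        for (double[] row : topLeftSingularVectors(a, 5, 50, 42L)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Each round multiplies by A * A^T and re-orthonormalizes, so components along smaller singular values shrink geometrically; production code would normally call a numerical library's SVD instead.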
clusterBuilder
public ClusterBuilder clusterBuilder
Cluster builder; contains the attributes that determine the structure and labels of clusters produced by the Lingo algorithm.
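In the cited paper, labels come from a label-induction step: each abstract concept (base vector) is compared with candidate phrases and single terms, expressed as vectors over the same terms, and the candidate with the highest cosine similarity becomes the label. A minimal sketch of that matching follows. The candidate vectors are supplied directly, ClusterBuilder's own scoring, boosts and thresholds are not modelled, and LabelSelectionSketch is a hypothetical name.

```java
/**
 * Sketch of Lingo-style label induction: for a base vector (an abstract concept
 * over m terms), choose the candidate label whose term vector is most similar
 * by cosine similarity. ASSUMPTION: candidate vectors are given directly;
 * ClusterBuilder's scoring, boosts and thresholds are not modelled.
 */
public class LabelSelectionSketch {

    /** Cosine similarity; returns 0 when either vector is all zeros. */
    static double cosine(double[] x, double[] y) {
        double dot = 0;
        double nx = 0;
        double ny = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        return (nx == 0 || ny == 0) ? 0.0 : dot / Math.sqrt(nx * ny);
    }

    /** Index of the candidate label most similar to the base vector, or -1 if there are none. */
    static int bestLabel(double[] baseVector, double[][] labelVectors) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int l = 0; l < labelVectors.length; l++) {
            double score = cosine(baseVector, labelVectors[l]);
            if (score > bestScore) {
                best = l;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Term order: data, mining, text.
        double[] baseVector = {0.7, 0.7, 0.1};
        String[] labels = {"data", "data mining", "text"};
        double[][] labelVectors = {{1, 0, 0}, {1, 1, 0}, {0, 0, 1}};
        System.out.println(labels[bestLabel(baseVector, labelVectors)]); // prints "data mining"
    }
}
```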
queryHint
public final AttrString queryHint
Query hint: the query terms used to retrieve the documents being clustered. The query serves as a hint that helps the algorithm avoid creating trivial clusters consisting only of query words.
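The query-hint rule can be sketched as a filter that rejects a candidate label when all of its words occur in the query. The sketch assumes whitespace tokenization and case-insensitive matching; Carrot2's real handling may also involve stemming or stop-word lists, which are not modelled, and QueryHintSketch is a hypothetical name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Sketch of a query-hint filter: a candidate label is treated as trivial when
 * every one of its words also occurs in the query. ASSUMPTION: whitespace
 * tokenization and case-insensitive matching; stemming and stop words are ignored.
 */
public class QueryHintSketch {

    /** Lower-cased, whitespace-separated words of the text. */
    static Set<String> words(String text) {
        Set<String> result = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!w.isEmpty()) {
                result.add(w);
            }
        }
        return result;
    }

    /** True if the label consists only of query words, i.e. it would form a trivial cluster. */
    static boolean isTrivial(String label, String queryHint) {
        Set<String> labelWords = words(label);
        return !labelWords.isEmpty() && words(queryHint).containsAll(labelWords);
    }

    public static void main(String[] args) {
        String query = "data mining";
        for (String label : List.of("Data Mining", "mining", "Text Mining", "Clustering")) {
            System.out.println(label + " -> " + (isTrivial(label, query) ? "rejected" : "kept"));
        }
    }
}
```

For the query "data mining", the labels "Data Mining" and "mining" are rejected, while "Text Mining" is kept because "text" is not a query word.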
Method Detail
supports
public boolean supports(LanguageComponents languageComponents)
Specified by: supports in interface ClusteringAlgorithm
cluster
public <T extends Document> List<Cluster<T>> cluster(Stream<? extends T> docStream, LanguageComponents languageComponents)
Performs Lingo clustering of documents.
Specified by: cluster in interface ClusteringAlgorithm
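The last stage described in the Lingo paper, cluster content discovery, assigns each document to every cluster whose label is similar enough to it. The sketch below mirrors the generic shape of cluster(Stream<? extends T>, ...): it accepts a stream of any subtype of the document type and returns groups typed on T. VectorDoc, Group, the cosine measure and the fixed threshold are hypothetical stand-ins, not Carrot2 types or defaults.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/**
 * Sketch of Lingo-style cluster content discovery: every document joins each cluster
 * whose label vector has cosine similarity >= threshold with the document's vector.
 * ASSUMPTION: VectorDoc, Group, the similarity measure and the threshold are
 * hypothetical; only the generic shape mirrors cluster(Stream<? extends T>, ...).
 */
public class DocumentAssignmentSketch {

    /** Hypothetical document that exposes a vector over the same terms as the labels. */
    interface VectorDoc {
        double[] vector();
    }

    /** Hypothetical cluster: a label plus the documents assigned to it. */
    static final class Group<T> {
        final String label;
        final List<T> documents = new ArrayList<>();

        Group(String label) {
            this.label = label;
        }
    }

    static double cosine(double[] x, double[] y) {
        double dot = 0;
        double nx = 0;
        double ny = 0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        return (nx == 0 || ny == 0) ? 0.0 : dot / Math.sqrt(nx * ny);
    }

    static <T extends VectorDoc> List<Group<T>> assign(Stream<? extends T> docStream,
            List<String> labels, List<double[]> labelVectors, double threshold) {
        // A stream can be consumed only once, but each document is compared with every label.
        List<T> docs = new ArrayList<>();
        docStream.forEach(docs::add);

        List<Group<T>> groups = new ArrayList<>();
        for (int l = 0; l < labels.size(); l++) {
            Group<T> group = new Group<>(labels.get(l));
            for (T doc : docs) {
                if (cosine(doc.vector(), labelVectors.get(l)) >= threshold) {
                    group.documents.add(doc);
                }
            }
            // This sketch drops labels that attracted no documents.
            if (!group.documents.isEmpty()) {
                groups.add(group);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        VectorDoc a = () -> new double[] {1, 0, 0};
        VectorDoc b = () -> new double[] {0, 1, 1};
        List<Group<VectorDoc>> groups = assign(Stream.of(a, b),
                List.of("first", "second"),
                List.of(new double[] {1, 0, 0}, new double[] {0, 1, 0}),
                0.3);
        for (Group<VectorDoc> g : groups) {
            System.out.println(g.label + ": " + g.documents.size() + " document(s)");
        }
    }
}
```

Because the similarity test runs per (document, label) pair, a document can land in several groups or in none.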