Package org.carrot2.clustering.lingo
Class LingoClusteringAlgorithm
- java.lang.Object
  - org.carrot2.attrs.AttrComposite
    - org.carrot2.clustering.lingo.LingoClusteringAlgorithm
- All Implemented Interfaces: AcceptingVisitor, ClusteringAlgorithm
public class LingoClusteringAlgorithm extends AttrComposite implements ClusteringAlgorithm
Lingo clustering algorithm. Implementation as described in: Stanisław Osiński, Dawid Weiss: A Concept-Driven Algorithm for Clustering Search Results. IEEE Intelligent Systems, vol. 20, no. 3, May/June 2005, pp. 48-54.
-
Field Summary
Fields
Modifier and Type | Field | Description
ClusterBuilder | clusterBuilder | Configuration of the structure and labels of clusters.
AttrInteger | desiredClusterCount | Determines number of clusters to create.
EphemeralDictionaries | dictionaries | Per-request overrides of language components (dictionaries).
TermDocumentMatrixBuilder | matrixBuilder | Configuration of the size and contents of the term-document matrix.
TermDocumentMatrixReducer | matrixReducer | Configuration of the matrix decomposition method to use for clustering.
static String | NAME |
CompletePreprocessingPipeline | preprocessing | Configuration of the text preprocessing stage.
AttrString | queryHint | Query terms used to retrieve documents being clustered.
AttrDouble | scoreWeight | Balance between cluster score and size during cluster sorting.
Fields inherited from class org.carrot2.attrs.AttrComposite
attributes
-
Constructor Summary
Constructors
Constructor | Description
LingoClusteringAlgorithm() |
-
Method Summary
Modifier and Type | Method | Description
<T extends Document> List<Cluster<T>> | cluster(Stream<? extends T> docStream, LanguageComponents languageComponents) | Performs Lingo clustering of documents.
Set<Class<?>> | requiredLanguageComponents() |
Methods inherited from class org.carrot2.attrs.AttrComposite
accept
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.carrot2.attrs.AcceptingVisitor
accept
-
Methods inherited from interface org.carrot2.clustering.ClusteringAlgorithm
supports
-
-
Field Detail
-
NAME
public static final String NAME
- See Also: Constant Field Values
-
scoreWeight
public AttrDouble scoreWeight
Balance between cluster score and size during cluster sorting. A value of 0.0 makes Lingo sort clusters by size only. A value of 1.0 makes Lingo sort clusters by score only. Values in between balance the two criteria.
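To make the two endpoints concrete, here is a minimal, self-contained sketch of one way such a balance could be computed. The blend `size^(1 - w) * score^w` is an assumption chosen because it reduces to size alone at 0.0 and to score alone at 1.0. This page does not state the formula Lingo actually uses. `ScoreWeightSketch`, `Candidate`, `sortKey` and `order` are hypothetical names and are not part of the Carrot2 API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ScoreWeightSketch {
    /** Hypothetical stand-in for a cluster: a label, a document count and a score. */
    static final class Candidate {
        final String label;
        final int size;
        final double score;

        Candidate(String label, int size, double score) {
            this.label = label;
            this.size = size;
            this.score = score;
        }
    }

    /**
     * Assumed blend of size and score (not confirmed by this page):
     * weight 0.0 yields the size, weight 1.0 yields the score.
     */
    static double sortKey(Candidate c, double weight) {
        return Math.pow(c.size, 1.0 - weight) * Math.pow(c.score, weight);
    }

    /** Returns labels ordered from the highest to the lowest sort key. */
    static List<String> order(List<Candidate> candidates, double weight) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Candidate c) -> sortKey(c, weight)).reversed());
        List<String> labels = new ArrayList<>();
        for (Candidate c : sorted) {
            labels.add(c.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
            new Candidate("big", 40, 2.0),    // many documents, low score
            new Candidate("sharp", 5, 9.0));  // few documents, high score

        System.out.println(order(candidates, 0.0)); // size only: [big, sharp]
        System.out.println(order(candidates, 1.0)); // score only: [sharp, big]
    }
}
```

In actual use the weight would be set on the algorithm instance through the `scoreWeight` attribute rather than computed by hand.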
-
desiredClusterCount
public AttrInteger desiredClusterCount
Determines the number of clusters to create. Larger values produce more clusters. The number of clusters the algorithm actually creates is roughly proportional to this value but may differ from it.
-
preprocessing
public CompletePreprocessingPipeline preprocessing
Configuration of the text preprocessing stage.
-
matrixBuilder
public TermDocumentMatrixBuilder matrixBuilder
Configuration of the size and contents of the term-document matrix.
-
matrixReducer
public TermDocumentMatrixReducer matrixReducer
Configuration of the matrix decomposition method to use for clustering.
-
clusterBuilder
public ClusterBuilder clusterBuilder
Configuration of the structure and labels of clusters.
-
dictionaries
public EphemeralDictionaries dictionaries
Per-request overrides of language components (dictionaries).
- Since: 4.1.0
-
queryHint
public final AttrString queryHint
Query terms used to retrieve documents being clustered. The query is used as a hint to avoid creating trivial clusters consisting only of query words.
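The idea of a "trivial" cluster can be illustrated with a small, self-contained sketch: a candidate label is treated as trivial when every word in it also appears in the query. This is an illustration of the concept only. Lingo's real label filtering is not described on this page and presumably works on preprocessed tokens. `QueryHintSketch`, `words`, `isTrivial` and `dropTrivial` are hypothetical names and are not part of the Carrot2 API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class QueryHintSketch {
    /** Splits text on whitespace into a set of lower-cased words. */
    static Set<String> words(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).trim().split("\\s+"))
            .filter(w -> !w.isEmpty())
            .collect(Collectors.toSet());
    }

    /** A label is trivial here if it is made up solely of query words. */
    static boolean isTrivial(String label, String queryHint) {
        Set<String> labelWords = words(label);
        return !labelWords.isEmpty() && words(queryHint).containsAll(labelWords);
    }

    /** Keeps only the labels that add at least one word beyond the query. */
    static List<String> dropTrivial(List<String> labels, String queryHint) {
        return labels.stream()
            .filter(label -> !isTrivial(label, queryHint))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> labels =
            List.of("Data Mining", "Mining", "Data Mining Software", "Gold Mining");
        // prints [Data Mining Software, Gold Mining]
        System.out.println(dropTrivial(labels, "data mining"));
    }
}
```

With the real algorithm, the query is supplied through the `queryHint` attribute, and the filtering itself is internal to Lingo.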
-
Method Detail
-
requiredLanguageComponents
public Set<Class<?>> requiredLanguageComponents()
- Specified by: requiredLanguageComponents in interface ClusteringAlgorithm
-
cluster
public <T extends Document> List<Cluster<T>> cluster(Stream<? extends T> docStream, LanguageComponents languageComponents)
Performs Lingo clustering of documents.
- Specified by: cluster in interface ClusteringAlgorithm
-