Package org.carrot2.clustering.lingo
Class LingoClusteringAlgorithm
java.lang.Object
  org.carrot2.attrs.AttrComposite
    org.carrot2.clustering.lingo.LingoClusteringAlgorithm

All Implemented Interfaces:
AcceptingVisitor, ClusteringAlgorithm
public class LingoClusteringAlgorithm extends AttrComposite implements ClusteringAlgorithm
Lingo clustering algorithm. Implementation as described in: Stanisław Osiński, Dawid Weiss: A Concept-Driven Algorithm for Clustering Search Results. IEEE Intelligent Systems, vol. 20, no. 3, May/June 2005, pp. 48-54.
-
-
Field Summary
Fields
Modifier and Type              Field                Description
ClusterBuilder                 clusterBuilder       Configuration of the structure and labels of clusters.
AttrInteger                    desiredClusterCount  Determines number of clusters to create.
EphemeralDictionaries          dictionaries         Per-request overrides of language components (dictionaries).
TermDocumentMatrixBuilder      matrixBuilder        Configuration of the size and contents of the term-document matrix.
TermDocumentMatrixReducer      matrixReducer        Configuration of the matrix decomposition method to use for clustering.
static String                  NAME
CompletePreprocessingPipeline  preprocessing        Configuration of the text preprocessing stage.
AttrString                     queryHint            Query terms used to retrieve documents being clustered.
AttrDouble                     scoreWeight          Balance between cluster score and size during cluster sorting.
Fields inherited from class org.carrot2.attrs.AttrComposite
attributes
-
-
Constructor Summary
Constructors
Constructor                 Description
LingoClusteringAlgorithm()
-
Method Summary
Modifier and Type                      Method and Description
<T extends Document> List<Cluster<T>>  cluster(Stream<? extends T> docStream, LanguageComponents languageComponents)
                                       Performs Lingo clustering of documents.
Set<Class<?>>                          requiredLanguageComponents()
-
Methods inherited from class org.carrot2.attrs.AttrComposite
accept
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.carrot2.attrs.AcceptingVisitor
accept
-
Methods inherited from interface org.carrot2.clustering.ClusteringAlgorithm
supports
-
-
-
-
Field Detail
-
NAME
public static final String NAME
- See Also:
- Constant Field Values
-
scoreWeight
public AttrDouble scoreWeight
Balance between cluster score and size during cluster sorting. A value of 0.0 makes Lingo sort clusters by size only; a value of 1.0 makes Lingo sort clusters by score only.
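The effect of the two extreme values can be illustrated with a small standalone sketch. It does not use Carrot2 classes, and the ranking formula (size raised to 1 - weight, multiplied by score raised to weight) is an assumption chosen only because it reduces to size at 0.0 and to score at 1.0; this page does not state the formula Lingo actually applies. The names ScoreWeightSketch and ScoredCluster are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch only; not the Carrot2 implementation. */
final class ScoreWeightSketch {

  /** Hypothetical stand-in for a cluster: a label, a score and a document count. */
  static final class ScoredCluster {
    final String label;
    final double score;
    final int size;

    ScoredCluster(String label, double score, int size) {
      this.label = label;
      this.score = score;
      this.size = size;
    }
  }

  /**
   * Assumed ranking value: size^(1 - w) * score^w.
   * With w = 0.0 this equals the size, with w = 1.0 it equals the score.
   */
  static double rank(ScoredCluster c, double scoreWeight) {
    return Math.pow(c.size, 1.0 - scoreWeight) * Math.pow(c.score, scoreWeight);
  }

  /** Returns a copy of the input sorted by descending rank. */
  static List<ScoredCluster> sort(List<ScoredCluster> clusters, double scoreWeight) {
    if (scoreWeight < 0.0 || scoreWeight > 1.0) {
      throw new IllegalArgumentException("scoreWeight must be within [0, 1]");
    }
    List<ScoredCluster> sorted = new ArrayList<>(clusters);
    sorted.sort(
        Comparator.comparingDouble((ScoredCluster c) -> rank(c, scoreWeight)).reversed());
    return sorted;
  }
}
```

With a small, high-scoring cluster and a large, low-scoring one, a weight of 0.0 puts the large cluster first and a weight of 1.0 puts the high-scoring cluster first; intermediate weights trade the two off.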
-
desiredClusterCount
public AttrInteger desiredClusterCount
Determines the number of clusters to create. The larger the value, the more clusters will be created. The number of clusters the algorithm actually creates is proportional to this value but may differ from it.
-
preprocessing
public CompletePreprocessingPipeline preprocessing
Configuration of the text preprocessing stage.
-
matrixBuilder
public TermDocumentMatrixBuilder matrixBuilder
Configuration of the size and contents of the term-document matrix.
-
matrixReducer
public TermDocumentMatrixReducer matrixReducer
Configuration of the matrix decomposition method to use for clustering.
-
clusterBuilder
public ClusterBuilder clusterBuilder
Configuration of the structure and labels of clusters.
-
dictionaries
public EphemeralDictionaries dictionaries
Per-request overrides of language components (dictionaries).
- Since:
- 4.1.0
-
queryHint
public final AttrString queryHint
Query terms used to retrieve documents being clustered. The query is used as a hint to avoid creating trivial clusters consisting only of query words.
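To make the notion of a trivial cluster concrete, the following standalone sketch drops candidate labels whose words all occur in the query. It is a simplified illustration under stated assumptions: the class QueryHintSketch and its methods are hypothetical, matching is done on lowercased whitespace-separated words, and the real algorithm works on preprocessed text and may behave differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Illustrative sketch only; not the Carrot2 implementation. */
final class QueryHintSketch {

  /** Splits text into a set of lowercased, whitespace-separated words. */
  static Set<String> words(String text) {
    return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\s+"))
        .filter(w -> !w.isEmpty())
        .collect(Collectors.toSet());
  }

  /** A label is treated as trivial when every one of its words appears in the query. */
  static boolean isTrivial(String label, String queryHint) {
    Set<String> labelWords = words(label);
    return !labelWords.isEmpty() && words(queryHint).containsAll(labelWords);
  }

  /** Keeps only the labels that add at least one word beyond the query. */
  static List<String> dropTrivial(List<String> labels, String queryHint) {
    return labels.stream()
        .filter(label -> !isTrivial(label, queryHint))
        .collect(Collectors.toList());
  }
}
```

For the query "data mining", the labels "Data Mining" and "Mining" would be dropped under these assumptions, while "Data Mining Software" would be kept because it contributes a word the query lacks.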
-
-
Method Detail
-
requiredLanguageComponents
public Set<Class<?>> requiredLanguageComponents()
- Specified by:
- requiredLanguageComponents in interface ClusteringAlgorithm
-
cluster
public <T extends Document> List<Cluster<T>> cluster(Stream<? extends T> docStream, LanguageComponents languageComponents)
Performs Lingo clustering of documents.
- Specified by:
- cluster in interface ClusteringAlgorithm
-
-