org.carrot2.clustering.lingo.LingoClusteringAlgorithm

All Implemented Interfaces: AcceptingVisitor, ClusteringAlgorithm

public class LingoClusteringAlgorithm
extends AttrComposite
implements ClusteringAlgorithm

Lingo clustering algorithm. Implementation as described in: Stanisław Osiński, Dawid Weiss: A Concept-Driven Algorithm for Clustering Search Results. IEEE Intelligent Systems, vol. 20, no. 3, May/June 2005, pp. 48–54.

Field Summary

Fields
Modifier and Type	Field	Description
`ClusterBuilder`	`clusterBuilder`	Cluster builder, contains attributes determining the structure and labels of clusters produced by the Lingo algorithm.
`AttrInteger`	`desiredClusterCount`	Desired cluster count.
`TermDocumentMatrixBuilder`	`matrixBuilder`	Term-document matrix builder, contains attributes determining the size and contents of the matrix.
`TermDocumentMatrixReducer`	`matrixReducer`	Term-document matrix reducer, contains attributes determining the matrix decomposition method to be used during clustering.
`static String`	`NAME`
`CompletePreprocessingPipeline`	`preprocessing`	Preprocessing pipeline.
`AttrString`	`queryHint`	Query hint.
`AttrDouble`	`scoreWeight`	Balance between cluster score and size during cluster sorting.

Fields inherited from class org.carrot2.attrs.AttrComposite

attributes

Constructor Summary

Constructors
Constructor	Description
`LingoClusteringAlgorithm()`

Method Summary

Modifier and Type	Method	Description
`<T extends Document> List<Cluster<T>>`	`cluster(Stream<? extends T> docStream, LanguageComponents languageComponents)`	Performs Lingo clustering of documents.
`Set<Class<?>>`	`requiredLanguageComponents()`

Methods inherited from class org.carrot2.attrs.AttrComposite

accept

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.carrot2.attrs.AcceptingVisitor

accept

Methods inherited from interface org.carrot2.clustering.ClusteringAlgorithm

supports

Field Details
- NAME
  
  public static final String NAME
  
  See Also:
  
  Constant Field Values
- scoreWeight
  
  public AttrDouble scoreWeight
  
  Balance between cluster score and size during cluster sorting. A value of 0.0 makes Lingo sort clusters by size only; a value of 1.0 makes Lingo sort clusters by score only.
- desiredClusterCount
  
  public AttrInteger desiredClusterCount
  
  Desired cluster count. A factor used to calculate the number of clusters from the number of input documents. The larger the value, the more clusters will be created. The number of clusters actually created scales with this value but may differ from it.
- preprocessing
  
  public CompletePreprocessingPipeline preprocessing
  
  Preprocessing pipeline.
- matrixBuilder
  
  public TermDocumentMatrixBuilder matrixBuilder
  
  Term-document matrix builder, contains attributes determining the size and contents of the matrix.
- matrixReducer
  
  public TermDocumentMatrixReducer matrixReducer
  
  Term-document matrix reducer, contains attributes determining the matrix decomposition method to be used during clustering.
- clusterBuilder
  
  public ClusterBuilder clusterBuilder
  
  Cluster builder, contains attributes determining the structure and labels of clusters produced by the Lingo algorithm.
- queryHint
  
  public final AttrString queryHint
  
  Query hint. Query terms used to retrieve documents being clustered. The query is used as a hint to avoid creating trivial clusters consisting only of query words.
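
The effect of scoreWeight, desiredClusterCount and queryHint described above can be pictured with a small standalone sketch. It does not call Carrot2 and is not taken from the Lingo source: the class LingoAttributeSketch and the helpers sortKey, clusterCount and isTrivialLabel are hypothetical names, and the blending and scaling formulas are assumptions chosen only because they match the documented behavior (size-only sorting at 0.0, score-only sorting at 1.0, a cluster count that grows with both the attribute value and the number of documents, and suppression of labels made up solely of query words).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative only: none of these helpers are part of Carrot2. */
class LingoAttributeSketch {

  /**
   * Assumed blend of cluster size and cluster score used as a sort key.
   * With scoreWeight 0.0 the key equals the size; with 1.0 it equals the score.
   */
  static double sortKey(int size, double score, double scoreWeight) {
    return Math.pow(size, 1.0 - scoreWeight) * Math.pow(score, scoreWeight);
  }

  /**
   * Assumed scaling of the desired cluster count by the input size.
   * The result grows with both arguments and never exceeds the document count.
   */
  static int clusterCount(int desiredClusterCount, int documentCount) {
    int scaled = (int) ((desiredClusterCount / 10.0) * Math.sqrt(documentCount));
    return Math.min(scaled, documentCount);
  }

  /** Returns true if every word of the label also occurs in the query hint. */
  static boolean isTrivialLabel(String label, String queryHint) {
    Set<String> queryWords =
        new HashSet<>(Arrays.asList(queryHint.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    for (String word : label.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
      if (!queryWords.contains(word)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Same cluster, sorted by size only and by score only.
    System.out.println(sortKey(5, 0.8, 0.0));
    System.out.println(sortKey(5, 0.8, 1.0));

    // The same attribute value yields fewer clusters for a smaller input.
    System.out.println(clusterCount(30, 100));
    System.out.println(clusterCount(30, 4));

    // A label built only from query words is treated as trivial.
    System.out.println(isTrivialLabel("Data Mining", "data mining"));
    System.out.println(isTrivialLabel("Data Mining Software", "data mining"));
  }
}
```

In real code these values are set on the algorithm's public attribute fields rather than computed by hand; the sketch only shows why intermediate scoreWeight values trade size against score and why the produced cluster count can differ from the desired one.
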
Constructor Details
- LingoClusteringAlgorithm
  
  public LingoClusteringAlgorithm()
Method Details
- requiredLanguageComponents
  
  public Set<Class<?>> requiredLanguageComponents()
  
  Specified by:
  
  requiredLanguageComponents in interface ClusteringAlgorithm
- cluster
  
  public <T extends Document> List<Cluster<T>> cluster(Stream<? extends T> docStream, LanguageComponents languageComponents)
  
  Performs Lingo clustering of documents.
  
  Specified by:
  
  cluster in interface ClusteringAlgorithm
