Package org.carrot2.clustering.lingo
Class ClusterBuilder
java.lang.Object
  org.carrot2.attrs.AttrComposite
    org.carrot2.clustering.lingo.ClusterBuilder
All Implemented Interfaces: AcceptingVisitor
public class ClusterBuilder extends AttrComposite
Builds cluster labels based on the reduced term-document matrix and assigns documents to the labels.
Field Summary
Fields
Modifier and Type | Field | Description
AttrDouble | clusterMergingThreshold | Percentage of overlap between two clusters' document sets at which to merge the clusters.
LabelAssigner | labelAssigner | The method of assigning documents to labels when forming clusters.
AttrDouble | phraseLabelBoost | Weight of multi-word labels relative to one-word labels.
AttrInteger | phraseLengthPenaltyStart | Phrase length at which the overlong multi-word labels should start to be penalized.
AttrInteger | phraseLengthPenaltyStop | Phrase length at which the overlong multi-word labels should be removed completely.
Fields inherited from class org.carrot2.attrs.AttrComposite
attributes
Constructor Summary
Constructors
Constructor | Description
ClusterBuilder() |
Field Detail
-
phraseLabelBoost
public AttrDouble phraseLabelBoost
Weight of multi-word labels relative to one-word labels. Low values will result in more one-word labels being produced, higher values will favor multi-word labels.
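
The effect of this weight can be pictured with a minimal sketch. It assumes the boost is a plain multiplier applied to the score of any label with more than one word; this page does not state how ClusterBuilder combines the boost with label scores, so the class name, method name, and formula below are hypothetical.

```java
/** Illustrative only; not Carrot2's scoring code. */
final class PhraseLabelBoostSketch {

    /**
     * Scales a candidate label's score by the boost when the label has more than one word.
     * Assumption: the boost is a plain multiplier applied only to multi-word labels.
     */
    static double boostedScore(double baseScore, int wordCount, double phraseLabelBoost) {
        return wordCount > 1 ? baseScore * phraseLabelBoost : baseScore;
    }

    public static void main(String[] args) {
        double boost = 1.5; // hypothetical value, not a documented default
        System.out.println(boostedScore(0.5, 1, boost)); // one-word label, score unchanged
        System.out.println(boostedScore(0.5, 3, boost)); // three-word label, score scaled by the boost
    }
}
```

Under this assumption, a boost below 1.0 lowers multi-word scores so that one-word labels win more often, and a boost above 1.0 does the opposite.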
-
phraseLengthPenaltyStart
public AttrInteger phraseLengthPenaltyStart
Phrase length at which the overlong multi-word labels should start to be penalized. Phrases of length smaller than phraseLengthPenaltyStart will not be penalized.
-
phraseLengthPenaltyStop
public AttrInteger phraseLengthPenaltyStop
Phrase length at which the overlong multi-word labels should be removed completely. Phrases of length larger than phraseLengthPenaltyStop will be removed.
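
How phraseLengthPenaltyStart and phraseLengthPenaltyStop bracket the penalty can be sketched as a score multiplier. The linear falloff between the two bounds is an assumption made for illustration, since this page does not describe the shape of the penalty; the class and method names are hypothetical.

```java
/** Illustrative only; the actual penalty curve used by ClusterBuilder may differ. */
final class PhraseLengthPenaltySketch {

    /**
     * Returns a multiplier in [0, 1] for a label that is phraseLength words long.
     * Lengths below penaltyStart are not penalized (1.0); lengths above
     * penaltyStop are removed (0.0). Assumption: a linear ramp in between.
     */
    static double penalty(int phraseLength, int penaltyStart, int penaltyStop) {
        if (penaltyStop < penaltyStart) {
            throw new IllegalArgumentException("penaltyStop must be >= penaltyStart");
        }
        if (phraseLength < penaltyStart) {
            return 1.0;
        }
        if (phraseLength > penaltyStop) {
            return 0.0;
        }
        // Ramp from 1.0 at (penaltyStart - 1) down to 0.0 at (penaltyStop + 1).
        int span = penaltyStop - penaltyStart + 2;
        return (double) (penaltyStop + 1 - phraseLength) / span;
    }

    public static void main(String[] args) {
        // Hypothetical settings: start penalizing at 3 words, remove above 5 words.
        for (int length = 1; length <= 7; length++) {
            System.out.println(length + " words -> " + penalty(length, 3, 5));
        }
    }
}
```

With these hypothetical settings, labels of one or two words keep their full score, labels of three to five words are scaled down progressively, and labels of six or more words are dropped.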
-
clusterMergingThreshold
public AttrDouble clusterMergingThreshold
Percentage of overlap between two clusters' document sets at which to merge the clusters. Low values will result in more aggressive merging, which may lead to irrelevant documents in clusters. High values will result in fewer clusters being merged, which may lead to very similar or duplicated clusters.
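
The trade-off described above can be illustrated with a small overlap check. This sketch assumes overlap is measured as the number of shared documents divided by the size of the smaller cluster, and that clusters merge when the overlap reaches the threshold; neither detail is stated on this page, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only; not the merging code used by ClusterBuilder. */
final class ClusterOverlapSketch {

    /**
     * Fraction of the smaller cluster's documents that also appear in the other cluster.
     * Assumption: overlap = |A intersect B| / min(|A|, |B|).
     */
    static double overlap(Set<Integer> docsA, Set<Integer> docsB) {
        if (docsA.isEmpty() || docsB.isEmpty()) {
            return 0.0;
        }
        Set<Integer> common = new HashSet<>(docsA);
        common.retainAll(docsB);
        return (double) common.size() / Math.min(docsA.size(), docsB.size());
    }

    /** Assumption: merge when the overlap is at least the threshold. */
    static boolean shouldMerge(Set<Integer> docsA, Set<Integer> docsB, double threshold) {
        return overlap(docsA, docsB) >= threshold;
    }

    public static void main(String[] args) {
        Set<Integer> a = Set.of(1, 2, 3, 4); // document ids in cluster A
        Set<Integer> b = Set.of(3, 4, 5, 6); // document ids in cluster B
        System.out.println(shouldMerge(a, b, 0.3)); // low threshold: merged
        System.out.println(shouldMerge(a, b, 0.7)); // high threshold: kept separate
    }
}
```

Here the two clusters share half of their documents, so a low threshold merges them while a high threshold leaves two similar clusters in the output.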
-
labelAssigner
public LabelAssigner labelAssigner
The method of assigning documents to labels when forming clusters.