Package org.carrot2.clustering.lingo
Class ClusterBuilder
- java.lang.Object
  - org.carrot2.attrs.AttrComposite
    - org.carrot2.clustering.lingo.ClusterBuilder

All Implemented Interfaces:
AcceptingVisitor
public class ClusterBuilder extends AttrComposite
Builds cluster labels based on the reduced term-document matrix and assigns documents to the labels.
Field Summary

| Modifier and Type | Field | Description |
|---|---|---|
| AttrDouble | clusterMergingThreshold | Percentage of overlap between two clusters' document sets at which to merge the clusters. |
| LabelAssigner | labelAssigner | The method of assigning documents to labels when forming clusters. |
| AttrDouble | phraseLabelBoost | Weight of multi-word labels relative to one-word labels. |
| AttrInteger | phraseLengthPenaltyStart | Phrase length at which the overlong multi-word labels should start to be penalized. |
| AttrInteger | phraseLengthPenaltyStop | Phrase length at which the overlong multi-word labels should be removed completely. |
Fields inherited from class org.carrot2.attrs.AttrComposite
attributes
Constructor Summary

| Constructor | Description |
|---|---|
| ClusterBuilder() | |
Field Detail
-
phraseLabelBoost
public AttrDouble phraseLabelBoost
Weight of multi-word labels relative to one-word labels. Low values result in more one-word labels being produced; higher values favor multi-word labels.
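The effect of the boost can be illustrated with a minimal sketch. It assumes that each candidate label already has a base score and that the boost is applied as a simple multiplier to labels of more than one word. The class `LabelBoostSketch`, its methods, and the multiplier rule are hypothetical and are not taken from the Carrot2 source.

```java
// Hypothetical illustration only; this is not the Carrot2 implementation.
class LabelBoostSketch {

    /** Counts whitespace-separated words in a label. */
    static int wordCount(String label) {
        return label.trim().split("\\s+").length;
    }

    /** Assumption: multi-word labels have their base score multiplied by the boost. */
    static double boostedScore(String label, double baseScore, double phraseLabelBoost) {
        return wordCount(label) > 1 ? baseScore * phraseLabelBoost : baseScore;
    }

    /** Returns the label with the highest boosted score (first one wins on ties). */
    static String best(String[] labels, double[] baseScores, double phraseLabelBoost) {
        int winner = 0;
        for (int i = 1; i < labels.length; i++) {
            double candidate = boostedScore(labels[i], baseScores[i], phraseLabelBoost);
            double current = boostedScore(labels[winner], baseScores[winner], phraseLabelBoost);
            if (candidate > current) {
                winner = i;
            }
        }
        return labels[winner];
    }

    public static void main(String[] args) {
        String[] labels = {"clustering", "document clustering"};
        double[] scores = {0.6, 0.5};
        // A low boost keeps the one-word label; a higher boost favors the phrase.
        System.out.println(best(labels, scores, 1.0));
        System.out.println(best(labels, scores, 1.5));
    }
}
```

Under these assumptions, a boost of 1.0 leaves the one-word label ahead, while a boost of 1.5 lifts the two-word label above it.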
-
phraseLengthPenaltyStart
public AttrInteger phraseLengthPenaltyStart
Phrase length at which the overlong multi-word labels should start to be penalized. Phrases shorter than phraseLengthPenaltyStart will not be penalized.
-
phraseLengthPenaltyStop
public AttrInteger phraseLengthPenaltyStop
Phrase length at which the overlong multi-word labels should be removed completely. Phrases longer than phraseLengthPenaltyStop will be removed.
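Taken together, the two thresholds split phrase lengths into three ranges. The sketch below only classifies a length into one of those ranges; it does not model how strongly a phrase is penalized, because that is not stated here. The class `PhraseLengthSketch` and the `Treatment` enum are hypothetical names introduced for illustration.

```java
// Hypothetical illustration of the two thresholds; this is not the Carrot2 implementation.
class PhraseLengthSketch {

    enum Treatment { UNPENALIZED, PENALIZED, REMOVED }

    /**
     * Classifies a phrase by its length in words.
     * Lengths above penaltyStop are removed, lengths below penaltyStart are left alone,
     * and everything in between (inclusive) is assumed to be penalized to some degree.
     */
    static Treatment classify(int phraseLength, int penaltyStart, int penaltyStop) {
        if (phraseLength > penaltyStop) {
            return Treatment.REMOVED;
        }
        if (phraseLength < penaltyStart) {
            return Treatment.UNPENALIZED;
        }
        return Treatment.PENALIZED;
    }

    public static void main(String[] args) {
        // Example thresholds chosen for illustration only.
        for (int length = 2; length <= 7; length++) {
            System.out.println(length + " words: " + classify(length, 3, 6));
        }
    }
}
```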
-
clusterMergingThreshold
public AttrDouble clusterMergingThreshold
Percentage of overlap between two clusters' document sets at which to merge the clusters. Low values will result in more aggressive merging, which may lead to irrelevant documents in clusters. High values will result in fewer clusters being merged, which may lead to very similar or duplicated clusters.
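A minimal sketch of a threshold-based merge decision follows. It assumes that overlap is measured as the number of shared documents divided by the size of the smaller document set, and that clusters merge when this ratio reaches the threshold. Both the overlap measure and the class `ClusterMergeSketch` are assumptions made for illustration; the actual computation in Carrot2 may differ.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; this is not the Carrot2 implementation.
class ClusterMergeSketch {

    /**
     * Assumed overlap measure: shared documents divided by the size of the smaller set.
     * Returns 0.0 if either set is empty.
     */
    static double overlap(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        Set<Integer> common = new HashSet<>(a);
        common.retainAll(b); // keep only document ids present in both sets
        return (double) common.size() / Math.min(a.size(), b.size());
    }

    /** Merges when the overlap reaches the threshold (a fraction between 0 and 1). */
    static boolean shouldMerge(Set<Integer> a, Set<Integer> b, double threshold) {
        return overlap(a, b) >= threshold;
    }

    public static void main(String[] args) {
        Set<Integer> first = Set.of(1, 2, 3, 4);
        Set<Integer> second = Set.of(3, 4, 5);
        // Two of the three documents in the smaller set are shared.
        System.out.println(shouldMerge(first, second, 0.5));
        System.out.println(shouldMerge(first, second, 0.7));
    }
}
```

With these example sets, a threshold of 0.5 triggers a merge and a threshold of 0.7 does not, which mirrors the trade-off described above: lower thresholds merge more aggressively.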
-
labelAssigner
public LabelAssigner labelAssigner
The method of assigning documents to labels when forming clusters.