Class ClusterBuilder

  • All Implemented Interfaces:
    AcceptingVisitor

    public class ClusterBuilder
    extends AttrComposite
    Builds cluster labels based on the reduced term-document matrix and assigns documents to the labels.
    • Field Detail

      • phraseLabelBoost

        public AttrDouble phraseLabelBoost
        Weight of multi-word labels relative to one-word labels. Lower values produce more one-word labels; higher values favor multi-word labels.
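
        As a rough illustration of how such a weight could shift the balance between label types, the sketch below multiplies the score of any multi-word label by the boost and leaves one-word labels unchanged. The class PhraseBoostSketch, the method boostedScore and the whitespace-based word count are hypothetical and are not part of this API; the actual scoring inside ClusterBuilder may differ.

```java
class PhraseBoostSketch {
    // Hypothetical helper (not part of ClusterBuilder): scales the raw score
    // of a multi-word label by the boost; one-word labels keep their score.
    static double boostedScore(String label, double rawScore, double phraseLabelBoost) {
        // Assumption: words are separated by whitespace.
        boolean multiWord = label.trim().split("\\s+").length > 1;
        return multiWord ? rawScore * phraseLabelBoost : rawScore;
    }

    public static void main(String[] args) {
        double boost = 1.5; // illustrative value only
        System.out.println(boostedScore("clustering", 2.0, boost));
        System.out.println(boostedScore("document clustering", 2.0, boost));
    }
}
```

        With a boost above 1.0 a multi-word candidate outranks a one-word candidate of equal raw score; with a boost below 1.0 the one-word candidate wins.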
      • phraseLengthPenaltyStart

        public AttrInteger phraseLengthPenaltyStart
        Phrase length at which overlong multi-word labels start to be penalized. Phrases shorter than phraseLengthPenaltyStart are not penalized.
      • phraseLengthPenaltyStop

        public AttrInteger phraseLengthPenaltyStop
        Phrase length above which overlong multi-word labels are removed completely. Phrases longer than phraseLengthPenaltyStop are removed.
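
        The interplay of phraseLengthPenaltyStart and phraseLengthPenaltyStop can be pictured as a multiplier applied to a label's score. The sketch below assumes a linear ramp between the two lengths, which is an illustrative choice only; the documentation above states only the two end conditions, and the real penalty curve may have a different shape. PhraseLengthPenaltySketch and lengthFactor are hypothetical names.

```java
class PhraseLengthPenaltySketch {
    // Hypothetical helper: returns a score multiplier in [0, 1].
    //   length <  penaltyStart : 1.0 (no penalty)
    //   length >  penaltyStop  : 0.0 (label removed)
    //   otherwise              : linear ramp (an assumption of this sketch)
    static double lengthFactor(int phraseLength, int penaltyStart, int penaltyStop) {
        if (penaltyStop < penaltyStart) {
            throw new IllegalArgumentException("penaltyStop must be >= penaltyStart");
        }
        if (phraseLength < penaltyStart) {
            return 1.0;
        }
        if (phraseLength > penaltyStop) {
            return 0.0;
        }
        // Number of penalized steps taken so far, out of the steps needed
        // to reach the first removed length (penaltyStop + 1).
        double stepsTaken = phraseLength - penaltyStart + 1;
        double totalSteps = penaltyStop - penaltyStart + 2;
        return 1.0 - stepsTaken / totalSteps;
    }

    public static void main(String[] args) {
        // Illustrative values only: start penalizing at 3 words, remove above 6.
        for (int length = 1; length <= 8; length++) {
            System.out.println(length + " words -> factor " + lengthFactor(length, 3, 6));
        }
    }
}
```

        Setting both attributes to the same value leaves a single penalized length before removal.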
      • clusterMergingThreshold

        public AttrDouble clusterMergingThreshold
        Percentage of overlap between two clusters' document sets at which the clusters are merged. Low values result in more aggressive merging, which may put irrelevant documents in clusters. High values result in fewer merges, which may leave very similar or duplicated clusters.
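
        To make the trade-off concrete, the sketch below merges two clusters when the overlap of their document sets reaches a threshold. The overlap measure used here (shared documents divided by the size of the smaller set) is an assumption of this sketch, as are the names ClusterOverlapSketch, overlap and shouldMerge; the measure used by ClusterBuilder is not specified above and may differ.

```java
import java.util.HashSet;
import java.util.Set;

class ClusterOverlapSketch {
    // Hypothetical overlap measure: |A ∩ B| / min(|A|, |B|), in [0, 1].
    static double overlap(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        Set<Integer> common = new HashSet<>(a);
        common.retainAll(b); // keep only document ids present in both sets
        return (double) common.size() / Math.min(a.size(), b.size());
    }

    // Merge when the overlap reaches the threshold.
    static boolean shouldMerge(Set<Integer> a, Set<Integer> b, double threshold) {
        return overlap(a, b) >= threshold;
    }

    public static void main(String[] args) {
        Set<Integer> first = Set.of(1, 2, 3, 4);   // document ids, illustrative
        Set<Integer> second = Set.of(3, 4, 5);
        System.out.println("overlap = " + overlap(first, second));
        System.out.println("merge at 0.5? " + shouldMerge(first, second, 0.5));
        System.out.println("merge at 0.7? " + shouldMerge(first, second, 0.7));
    }
}
```

        In this example the two clusters share two of the smaller cluster's three documents, so a low threshold merges them and a high threshold keeps them apart.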
      • labelAssigner

        public LabelAssigner labelAssigner
        The method of assigning documents to labels when forming clusters.
    • Constructor Detail

      • ClusterBuilder

        public ClusterBuilder()