Package org.carrot2.text.preprocessing
Class DocumentAssigner
java.lang.Object
org.carrot2.attrs.AttrComposite
org.carrot2.text.preprocessing.DocumentAssigner
All Implemented Interfaces: AcceptingVisitor
public class DocumentAssigner extends AttrComposite
Assigns documents to label candidates. For each label candidate from PreprocessingContext.AllLabels.featureIndex, a BitSet with the assigned documents is constructed. The assignment algorithm is rather simple: to be assigned to a label, a document must contain at least one occurrence of each non-stop word from the label.
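The rule above can be illustrated with a minimal, standalone sketch. It is not the Carrot2 implementation: the class name AllWordsAssignmentSketch, the assign method, and the representation of documents as sets of words are assumptions made for illustration only, and no Carrot2 classes are used.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; names and data layout are hypothetical,
// not taken from Carrot2.
final class AllWordsAssignmentSketch {
    // Returns a BitSet whose set bits are the indices of the documents that
    // contain every non-stop word of the label at least once.
    // Note: in this sketch a label made only of stop words matches every document.
    static BitSet assign(List<String> labelWords,
                         Set<String> stopWords,
                         List<Set<String>> documents) {
        BitSet assigned = new BitSet(documents.size());
        for (int doc = 0; doc < documents.size(); doc++) {
            boolean containsAll = true;
            for (String word : labelWords) {
                if (stopWords.contains(word)) {
                    continue; // stop words do not constrain the assignment
                }
                if (!documents.get(doc).contains(word)) {
                    containsAll = false;
                    break;
                }
            }
            if (containsAll) {
                assigned.set(doc);
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
            Set.of("clustering", "search", "results"),
            Set.of("search", "engine"),
            Set.of("results", "of", "clustering"));
        BitSet assigned = assign(
            List.of("clustering", "of", "results"), Set.of("of"), docs);
        System.out.println(assigned); // prints {0, 2}
    }
}
```

Documents 0 and 2 contain both non-stop words of the label, so only their bits are set; the stop word "of" is ignored.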
This class saves the following results to the PreprocessingContext:
This class requires that InputTokenizer, CaseNormalizer, StopListMarker, PhraseExtractor and LabelFilterProcessor be invoked first.
Field Summary
Modifier and Type    Field                    Description
AttrBoolean          exactPhraseAssignment    Only exact phrase assignments.
AttrInteger          minClusterSize           Determines the minimum number of documents in each cluster.
Constructor Summary
Constructor            Description
DocumentAssigner()
Method Summary
Field Details
exactPhraseAssignment
Only exact phrase assignments. When set to true, clusters will contain only the documents that contain the cluster's label in its original form, including the order of words. Enabling this option will cause fewer documents to be put in clusters, increasing the precision of assignment, but also increasing the size of the "Other Topics" group. Disabling this option will cause more documents to be put in clusters, which will make the "Other Topics" cluster smaller, but also lower the precision of cluster-document assignments.
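The difference between the two modes can be sketched as follows. This is a simplified illustration, not Carrot2 code: ExactPhraseAssignmentSketch and its assign method are hypothetical, documents are plain token lists, and stop-word handling is left out for brevity.

```java
import java.util.BitSet;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only; names are hypothetical, not taken from Carrot2.
final class ExactPhraseAssignmentSketch {
    // exactPhrase == true:  the label must occur as a contiguous token
    //                       sequence, in the original word order.
    // exactPhrase == false: every label word must occur somewhere in the
    //                       document, in any order.
    static BitSet assign(List<String> label,
                         List<List<String>> documents,
                         boolean exactPhrase) {
        BitSet assigned = new BitSet(documents.size());
        for (int doc = 0; doc < documents.size(); doc++) {
            List<String> tokens = documents.get(doc);
            boolean matches = exactPhrase
                ? Collections.indexOfSubList(tokens, label) >= 0
                : tokens.containsAll(label);
            if (matches) {
                assigned.set(doc);
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("data", "mining", "tools"),
            List.of("mining", "of", "data"),
            List.of("web", "search"));
        List<String> label = List.of("data", "mining");
        System.out.println(assign(label, docs, true));  // prints {0}
        System.out.println(assign(label, docs, false)); // prints {0, 1}
    }
}
```

With exact matching, document 1 drops out because its words appear in a different order; in a real clustering run such a document could end up in "Other Topics" unless another label claims it.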
minClusterSize
Determines the minimum number of documents in each cluster.
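One plausible reading of this attribute, sketched below under stated assumptions, is that a label candidate is kept only when at least minClusterSize documents are assigned to it. The class MinClusterSizeSketch and the method keepLargeEnough are hypothetical and do not reflect how Carrot2 applies the threshold internally.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; names are hypothetical, not taken from Carrot2.
final class MinClusterSizeSketch {
    // Keeps the label candidates whose assigned-document BitSet has at least
    // minClusterSize bits set, preserving the original iteration order.
    static Map<String, BitSet> keepLargeEnough(Map<String, BitSet> candidates,
                                               int minClusterSize) {
        Map<String, BitSet> kept = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : candidates.entrySet()) {
            if (entry.getValue().cardinality() >= minClusterSize) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        BitSet twoDocs = new BitSet();
        twoDocs.set(0);
        twoDocs.set(1);
        BitSet oneDoc = new BitSet();
        oneDoc.set(2);

        Map<String, BitSet> candidates = new LinkedHashMap<>();
        candidates.put("data mining", twoDocs);
        candidates.put("web search", oneDoc);

        System.out.println(keepLargeEnough(candidates, 2).keySet()); // prints [data mining]
    }
}
```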
Constructor Details
DocumentAssigner
public DocumentAssigner()