Class CompletePreprocessingPipeline

    • Field Detail

      • wordDfThreshold

        public final AttrInteger wordDfThreshold
        Word document frequency (DF) threshold. Words appearing in fewer than wordDfThreshold documents will be ignored.
      • phraseDfThreshold

        public final AttrInteger phraseDfThreshold
        Phrase document frequency (DF) threshold. Phrases appearing in fewer than phraseDfThreshold documents will be ignored.
      • labelFilters

        public LabelFilterProcessor labelFilters
        Label filtering is a composite of individual filters.
      • documentAssigner

        public DocumentAssigner documentAssigner
        Document assigner used by the algorithm. Contains bindable attributes.
      • caseNormalizer

        protected final org.carrot2.text.preprocessing.CaseNormalizer caseNormalizer
        Case normalizer used by the algorithm.
      • stemming

        protected final org.carrot2.text.preprocessing.LanguageModelStemmer stemming
        Stemmer used by the algorithm.
      • stopListMarker

        protected final org.carrot2.text.preprocessing.StopListMarker stopListMarker
        Stop list marker used by the algorithm. Contains bindable attributes.
      • tokenizer

        protected final org.carrot2.text.preprocessing.InputTokenizer tokenizer
        Tokenizer used by the algorithm.
    • Constructor Detail

      • CompletePreprocessingPipeline

        public CompletePreprocessingPipeline()
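
The document-frequency rule behind wordDfThreshold and phraseDfThreshold can be illustrated with a minimal, standalone sketch. It does not use the Carrot2 API: the class DfThresholdSketch and the method retainedWords are hypothetical names introduced only for this illustration, and documents are assumed to be already tokenized into lists of words. The sketch counts each word once per document and keeps only words whose count reaches the threshold. The pipeline's internal implementation may differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Carrot2.
public class DfThresholdSketch {

    /** Returns the words that appear in at least dfThreshold documents. */
    static Set<String> retainedWords(List<List<String>> documents, int dfThreshold) {
        Map<String, Integer> documentFrequency = new HashMap<>();
        for (List<String> document : documents) {
            // De-duplicate within a document so each word counts once per document.
            for (String word : new HashSet<>(document)) {
                documentFrequency.merge(word, 1, Integer::sum);
            }
        }

        // Keep words at or above the threshold; words below it are ignored.
        Set<String> retained = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : documentFrequency.entrySet()) {
            if (entry.getValue() >= dfThreshold) {
                retained.add(entry.getKey());
            }
        }
        return retained;
    }

    public static void main(String[] args) {
        List<List<String>> documents = List.of(
            List.of("data", "mining", "data"),
            List.of("data", "clustering"),
            List.of("text", "mining"));

        // "data" and "mining" each occur in two documents; the rest occur in one.
        System.out.println(retainedWords(documents, 2)); // prints [data, mining]
    }
}
```

With a threshold of 1 every word is retained. Raising the threshold removes rare words first, which is the effect the two threshold attributes describe for words and phrases respectively.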