Class PreprocessedDocumentScanner

java.lang.Object
org.carrot2.text.preprocessing.PreprocessedDocumentScanner

public class PreprocessedDocumentScanner
extends Object
Iterates over tokenized documents in PreprocessingContext.
  • Field Details

    • ON_DOCUMENT_SEPARATOR

      public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_DOCUMENT_SEPARATOR
      Predicate for splitting on document separator.
    • ON_FIELD_SEPARATOR

      public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_FIELD_SEPARATOR
      Predicate for splitting on field separator.
    • ON_SENTENCE_SEPARATOR

      public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_SENTENCE_SEPARATOR
      Predicate for splitting on sentence separator.
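
As a rough illustration of how a separator predicate can locate split points in an array of token type codes, here is a minimal, library-independent sketch. The `ShortPredicate` interface below is a local stand-in declared only for this example (it is not the HPPC type), and both the `SEP` code and the `split` helper are invented names. Skipping empty runs is a choice made in this sketch, not documented behavior of the class.

```java
import java.util.ArrayList;
import java.util.List;

final class SeparatorPredicateSketch {
  /** Local stand-in for a predicate over short values (hypothetical, not the HPPC interface). */
  interface ShortPredicate {
    boolean apply(short value);
  }

  /** Invented separator code, used only in this sketch. */
  static final short SEP = -1;

  /** Builds a predicate that matches exactly one value, in the spirit of equalTo(short). */
  static ShortPredicate equalTo(short t) {
    return value -> value == t;
  }

  /** Returns {start, length} pairs for the non-empty runs between matched positions. */
  static List<int[]> split(short[] types, ShortPredicate separator) {
    List<int[]> ranges = new ArrayList<>();
    int start = 0;
    for (int i = 0; i <= types.length; i++) {
      // The end of the array closes the last run, just like a separator would.
      if (i == types.length || separator.apply(types[i])) {
        if (i > start) {
          ranges.add(new int[] {start, i - start});
        }
        start = i + 1;
      }
    }
    return ranges;
  }

  public static void main(String[] args) {
    short[] types = {1, 1, SEP, 1, 1, 1};
    for (int[] r : split(types, equalTo(SEP))) {
      System.out.println("start=" + r[0] + " length=" + r[1]);
    }
  }
}
```

Each range uses the same (start, length) convention as the callback methods of this class.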
  • Constructor Details

  • Method Details

    • equalTo

      public static final com.carrotsearch.hppc.predicates.ShortPredicate equalTo(short t)
      Returns a new ShortPredicate that evaluates to true when its argument equals the given value t.
    • iterate

      public final void iterate(PreprocessingContext context)
      Iterates over all documents, fields and sentences in PreprocessingContext.allTokens.
    • document

      protected void document(PreprocessingContext context, int start, int length)
      Invoked for each document. Splits further into fields.
    • field

      protected void field(PreprocessingContext context, int start, int length)
      Invoked for each document's field. Splits further into sentences.
    • sentence

      protected void sentence(PreprocessingContext context, int start, int length)
      Invoked for each sentence within a document's field.
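
The callback chain described above (iterate, then document, then field, then sentence) can be sketched without the library as follows. This is a simplified model under stated assumptions, not the Carrot2 implementation: the type codes are invented, the token types live in a plain `short[]` instead of a `PreprocessingContext`, and the class names `TokenScannerSketch` and `Counter` are hypothetical.

```java
class TokenScannerSketch {
  // Invented type codes for this sketch; the real codes are defined by the library.
  static final short WORD = 0, SENTENCE_SEP = 1, FIELD_SEP = 2, DOCUMENT_SEP = 3;

  private final short[] types;

  TokenScannerSketch(short[] types) {
    this.types = types;
  }

  /** Entry point: splits the whole array into documents. */
  final void iterate() {
    split(0, types.length, DOCUMENT_SEP, 0);
  }

  /** Called once per document; the default splits further into fields. */
  protected void document(int start, int length) {
    split(start, length, FIELD_SEP, 1);
  }

  /** Called once per field; the default splits further into sentences. */
  protected void field(int start, int length) {
    split(start, length, SENTENCE_SEP, 2);
  }

  /** Called once per sentence; does nothing by default. */
  protected void sentence(int start, int length) {}

  /** Walks [start, start + length) and dispatches each non-empty run to the given level. */
  private void split(int start, int length, short separator, int level) {
    int end = start + length;
    int runStart = start;
    for (int i = start; i <= end; i++) {
      if (i == end || types[i] == separator) {
        if (i > runStart) {
          dispatch(level, runStart, i - runStart);
        }
        runStart = i + 1;
      }
    }
  }

  private void dispatch(int level, int start, int length) {
    switch (level) {
      case 0: document(start, length); break;
      case 1: field(start, length); break;
      default: sentence(start, length); break;
    }
  }

  /** Example subclass that counts what the scanner visits. */
  static final class Counter extends TokenScannerSketch {
    int documents, fields, sentences;

    Counter(short[] types) {
      super(types);
    }

    @Override
    protected void document(int start, int length) {
      documents++;
      super.document(start, length); // keep descending into fields
    }

    @Override
    protected void field(int start, int length) {
      fields++;
      super.field(start, length); // keep descending into sentences
    }

    @Override
    protected void sentence(int start, int length) {
      sentences++;
    }
  }

  public static void main(String[] args) {
    short[] types = {
      WORD, WORD, SENTENCE_SEP, WORD, FIELD_SEP, WORD, DOCUMENT_SEP, WORD, WORD
    };
    Counter counter = new Counter(types);
    counter.iterate();
    System.out.println("documents=" + counter.documents
        + " fields=" + counter.fields
        + " sentences=" + counter.sentences);
  }
}
```

In this model, an override of `document` or `field` that does not call the superclass method stops the descent at that level, which mirrors the "splits further" wording in the method descriptions. A subclass interested only in sentences would override `sentence` alone.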