Package org.carrot2.text.preprocessing
Class PreprocessedDocumentScanner
java.lang.Object
org.carrot2.text.preprocessing.PreprocessedDocumentScanner
public class PreprocessedDocumentScanner extends Object
Iterates over tokenized documents in PreprocessingContext.
Field Summary
Fields
Modifier and Type, Field, Description
static com.carrotsearch.hppc.predicates.ShortPredicate ON_DOCUMENT_SEPARATOR
    Predicate for splitting on document separator.
static com.carrotsearch.hppc.predicates.ShortPredicate ON_FIELD_SEPARATOR
    Predicate for splitting on field separator.
static com.carrotsearch.hppc.predicates.ShortPredicate ON_SENTENCE_SEPARATOR
    Predicate for splitting on sentence separator.
Constructor Summary
Constructors
Constructor, Description
PreprocessedDocumentScanner()
Method Summary
Modifier and Type, Method, Description
protected void document(PreprocessingContext context, int start, int length)
    Invoked for each document.
static com.carrotsearch.hppc.predicates.ShortPredicate equalTo(short t)
    Return a new ShortPredicate returning true if the argument equals a given value.
protected void field(PreprocessingContext context, int start, int length)
    Invoked for each document's field.
void iterate(PreprocessingContext context)
    Iterate over all documents, fields and sentences in PreprocessingContext.allTokens.
protected void sentence(PreprocessingContext context, int start, int length)
    Invoked for each document's sentence.
Field Details
ON_DOCUMENT_SEPARATOR
public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_DOCUMENT_SEPARATOR
Predicate for splitting on document separator.
ON_FIELD_SEPARATOR
public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_FIELD_SEPARATOR
Predicate for splitting on field separator.
ON_SENTENCE_SEPARATOR
public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_SENTENCE_SEPARATOR
Predicate for splitting on sentence separator.
Constructor Details
PreprocessedDocumentScanner
public PreprocessedDocumentScanner()
Method Details
equalTo
public static final com.carrotsearch.hppc.predicates.ShortPredicate equalTo(short t)
Return a new ShortPredicate returning true if the argument equals a given value.
iterate
Iterate over all documents, fields and sentences in PreprocessingContext.allTokens.
document
Invoked for each document. Splits further into fields.
field
Invoked for each document's field. Splits further into sentences.
sentence
Invoked for each document's sentence.
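The nesting described above (documents split into fields, fields split into sentences) can be illustrated with a small stand-alone sketch. It does not use any Carrot2 or HPPC classes and is not the library's implementation: the class name ScannerSketch, the separator codes, the plain short[] standing in for the token data, and the local ShortPredicate interface are all assumptions made for illustration. Whether the real scanner reports empty ranges is not stated on this page; the sketch skips them.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone illustration of nested range splitting; not Carrot2 code.
class ScannerSketch {
    // Hypothetical separator codes, chosen only for this example.
    static final short DOCUMENT_SEP = -1;
    static final short FIELD_SEP = -2;
    static final short SENTENCE_SEP = -3;

    // Local stand-in for a predicate over short values.
    interface ShortPredicate { boolean apply(short value); }

    // Callback receiving a (start, length) range.
    interface RangeCallback { void accept(int start, int length); }

    // Mirrors the idea of equalTo(short): true when the argument equals t.
    static ShortPredicate equalTo(short t) {
        return value -> value == t;
    }

    // Records the order in which callbacks fire.
    final List<String> events = new ArrayList<>();

    // Entry point: split the whole array into documents.
    void iterate(short[] types) {
        split(types, 0, types.length, equalTo(DOCUMENT_SEP),
              (s, l) -> document(types, s, l));
    }

    // Invoked for each document; splits further into fields.
    protected void document(short[] types, int start, int length) {
        events.add("document " + start + " " + length);
        split(types, start, length, equalTo(FIELD_SEP),
              (s, l) -> field(types, s, l));
    }

    // Invoked for each field; splits further into sentences.
    protected void field(short[] types, int start, int length) {
        events.add("field " + start + " " + length);
        split(types, start, length, equalTo(SENTENCE_SEP),
              (s, l) -> sentence(types, s, l));
    }

    // Invoked for each sentence; the innermost level.
    protected void sentence(short[] types, int start, int length) {
        events.add("sentence " + start + " " + length);
    }

    // Reports each non-empty run between separators inside [start, start + length).
    static void split(short[] types, int start, int length,
                      ShortPredicate separator, RangeCallback callback) {
        int end = start + length;
        int from = start;
        for (int i = start; i < end; i++) {
            if (separator.apply(types[i])) {
                if (i > from) callback.accept(from, i - from);
                from = i + 1;
            }
        }
        if (end > from) callback.accept(from, end - from);
    }

    public static void main(String[] args) {
        // 0 marks an ordinary token in this sketch.
        short[] types = {0, 0, SENTENCE_SEP, 0, FIELD_SEP, 0, 0, DOCUMENT_SEP, 0};
        ScannerSketch scanner = new ScannerSketch();
        scanner.iterate(types);
        scanner.events.forEach(System.out::println);
    }
}
```

In this sketch a subclass would override document, field or sentence and call the superclass method to keep descending, which is one plausible way to use protected callbacks of this shape.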