Package org.carrot2.text.preprocessing
Class PreprocessedDocumentScanner
java.lang.Object
    org.carrot2.text.preprocessing.PreprocessedDocumentScanner
-
public class PreprocessedDocumentScanner extends Object
Iterates over tokenized documents in PreprocessingContext.
-
Field Summary
Fields
Modifier and Type | Field | Description
static com.carrotsearch.hppc.predicates.ShortPredicate | ON_DOCUMENT_SEPARATOR | Predicate for splitting on document separator.
static com.carrotsearch.hppc.predicates.ShortPredicate | ON_FIELD_SEPARATOR | Predicate for splitting on field separator.
static com.carrotsearch.hppc.predicates.ShortPredicate | ON_SENTENCE_SEPARATOR | Predicate for splitting on sentence separator.
-
Constructor Summary
Constructors
Constructor | Description
PreprocessedDocumentScanner() |
-
Method Summary
Methods
Modifier and Type | Method | Description
protected void | document(PreprocessingContext context, int start, int length) | Invoked for each document.
static com.carrotsearch.hppc.predicates.ShortPredicate | equalTo(short t) | Return a new ShortPredicate returning true if the argument equals a given value.
protected void | field(PreprocessingContext context, int start, int length) | Invoked for each document's field.
void | iterate(PreprocessingContext context) | Iterate over all documents, fields and sentences in PreprocessingContext.allTokens.
protected void | sentence(PreprocessingContext context, int start, int length) | Invoked for each document's sentence.
-
Field Detail
-
ON_DOCUMENT_SEPARATOR
public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_DOCUMENT_SEPARATOR
Predicate for splitting on document separator.
-
ON_FIELD_SEPARATOR
public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_FIELD_SEPARATOR
Predicate for splitting on field separator.
-
ON_SENTENCE_SEPARATOR
public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_SENTENCE_SEPARATOR
Predicate for splitting on sentence separator.
-
Method Detail
-
equalTo
public static final com.carrotsearch.hppc.predicates.ShortPredicate equalTo(short t)
Returns a new ShortPredicate that returns true if its argument equals the given value.
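A minimal sketch of the idea behind equalTo, not the library's implementation. To compile without HPPC on the classpath, it declares a local ShortPredicate interface as a stand-in for com.carrotsearch.hppc.predicates.ShortPredicate. The class name EqualToSketch is hypothetical.

```java
final class EqualToSketch {
  /** Local stand-in for a short-valued predicate; an assumption, not the HPPC type. */
  interface ShortPredicate {
    boolean apply(short value);
  }

  /** Returns a predicate that is true only for arguments equal to t. */
  static ShortPredicate equalTo(final short t) {
    return value -> value == t;
  }

  public static void main(String[] args) {
    ShortPredicate isSeven = equalTo((short) 7);
    System.out.println(isSeven.apply((short) 7)); // prints "true"
    System.out.println(isSeven.apply((short) 8)); // prints "false"
  }
}
```

Capturing the value in a predicate object lets the same splitting routine be reused for different separator codes.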
-
iterate
public final void iterate(PreprocessingContext context)
Iterates over all documents, fields and sentences in PreprocessingContext.allTokens.
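The (start, length) ranges that the callbacks receive can be pictured as the runs of a flat token-type array that lie between separator entries. The sketch below illustrates one level of such splitting over a bare short[]. The SEPARATOR value, the SplitSketch class, and the choice to skip empty runs are assumptions made for the example, not details taken from the library.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitSketch {
  /** Hypothetical separator marker, chosen only for this sketch. */
  static final short SEPARATOR = -1;

  /** Returns {start, length} pairs for each non-empty run between separators. */
  static List<int[]> split(short[] types, int start, int length) {
    List<int[]> ranges = new ArrayList<>();
    int end = start + length;
    int runStart = start;
    for (int i = start; i < end; i++) {
      if (types[i] == SEPARATOR) {
        if (i > runStart) {
          ranges.add(new int[] {runStart, i - runStart});
        }
        runStart = i + 1; // next run begins after the separator
      }
    }
    if (end > runStart) {
      ranges.add(new int[] {runStart, end - runStart}); // trailing run
    }
    return ranges;
  }

  public static void main(String[] args) {
    short[] types = {1, 1, SEPARATOR, 1, 1, 1, SEPARATOR, 1};
    for (int[] r : split(types, 0, types.length)) {
      System.out.println("start=" + r[0] + " length=" + r[1]);
    }
    // prints:
    // start=0 length=2
    // start=3 length=3
    // start=7 length=1
  }
}
```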
-
document
protected void document(PreprocessingContext context, int start, int length)
Invoked for each document. Splits further into fields.
-
field
protected void field(PreprocessingContext context, int start, int length)
Invoked for each document's field. Splits further into sentences.
-
sentence
protected void sentence(PreprocessingContext context, int start, int length)
Invoked for each document's sentence.
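The document, field and sentence hooks form a chain: each default hook splits its range and hands the pieces to the next one, so a subclass can override only the level it cares about. The following is a self-contained sketch of that pattern. It works on a bare short[] rather than a PreprocessingContext, uses hypothetical separator codes compared by equality, and skips empty ranges. None of these details are confirmed by the library. ScannerSketch, SentenceCollector and ScannerSketchDemo are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in scanner over a bare short[]; not the Carrot2 class. */
class ScannerSketch {
  // Hypothetical separator codes, chosen only for this sketch.
  static final short DOC_SEP = -1, FIELD_SEP = -2, SENT_SEP = -3;

  /** Entry point: splits the whole array into documents. */
  final void iterate(short[] types) {
    split(types, 0, types.length, DOC_SEP, 0);
  }

  /** Default behaviour: split the document into fields. */
  protected void document(short[] types, int start, int length) {
    split(types, start, length, FIELD_SEP, 1);
  }

  /** Default behaviour: split the field into sentences. */
  protected void field(short[] types, int start, int length) {
    split(types, start, length, SENT_SEP, 2);
  }

  /** Leaf hook: nothing to split further. */
  protected void sentence(short[] types, int start, int length) {}

  private void split(short[] types, int start, int length, short sep, int level) {
    int end = start + length;
    int runStart = start;
    for (int i = start; i <= end; i++) {
      // The end of the range closes the last run, like a separator would.
      if (i == end || types[i] == sep) {
        if (i > runStart) {
          dispatch(types, runStart, i - runStart, level);
        }
        runStart = i + 1;
      }
    }
  }

  private void dispatch(short[] types, int start, int length, int level) {
    switch (level) {
      case 0: document(types, start, length); break;
      case 1: field(types, start, length); break;
      default: sentence(types, start, length);
    }
  }
}

/** Overrides only the leaf hook and records each sentence range as "start+length". */
class SentenceCollector extends ScannerSketch {
  final List<String> seen = new ArrayList<>();

  @Override
  protected void sentence(short[] types, int start, int length) {
    seen.add(start + "+" + length);
  }
}

class ScannerSketchDemo {
  public static void main(String[] args) {
    short[] types = {
      1, 1, ScannerSketch.SENT_SEP, 1, ScannerSketch.FIELD_SEP, 1, 1, ScannerSketch.DOC_SEP, 1
    };
    SentenceCollector collector = new SentenceCollector();
    collector.iterate(types);
    System.out.println(collector.seen); // prints "[0+2, 3+1, 5+2, 8+1]"
  }
}
```

Overriding document or field without delegating to the superclass would stop the descent at that level in this sketch, which is why the collector overrides only the leaf hook.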
-