Package org.carrot2.text.preprocessing
Class PreprocessedDocumentScanner
java.lang.Object
org.carrot2.text.preprocessing.PreprocessedDocumentScanner
public class PreprocessedDocumentScanner extends Object
Iterates over tokenized documents in PreprocessingContext.
Field Summary
Fields (Modifier and Type, Field: Description)
static com.carrotsearch.hppc.predicates.ShortPredicate ON_DOCUMENT_SEPARATOR: Predicate for splitting on document separator.
static com.carrotsearch.hppc.predicates.ShortPredicate ON_FIELD_SEPARATOR: Predicate for splitting on field separator.
static com.carrotsearch.hppc.predicates.ShortPredicate ON_SENTENCE_SEPARATOR: Predicate for splitting on sentence separator.
Constructor Summary
Constructors (Constructor: Description)
PreprocessedDocumentScanner()
Method Summary
Methods (Modifier and Type, Method: Description)
protected void document(PreprocessingContext context, int start, int length): Invoked for each document.
static com.carrotsearch.hppc.predicates.ShortPredicate equalTo(short t): Return a new ShortPredicate returning true if the argument equals a given value.
protected void field(PreprocessingContext context, int start, int length): Invoked for each document's field.
void iterate(PreprocessingContext context): Iterate over all documents, fields and sentences in PreprocessingContext.allTokens.
protected void sentence(PreprocessingContext context, int start, int length): Invoked for each document's sentence.
Field Details
ON_DOCUMENT_SEPARATOR
public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_DOCUMENT_SEPARATOR
Predicate for splitting on document separator.
ON_FIELD_SEPARATOR
public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_FIELD_SEPARATOR
Predicate for splitting on field separator.
ON_SENTENCE_SEPARATOR
public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_SENTENCE_SEPARATOR
Predicate for splitting on sentence separator.
Constructor Details
PreprocessedDocumentScanner
public PreprocessedDocumentScanner()
Method Details
equalTo
public static final com.carrotsearch.hppc.predicates.ShortPredicate equalTo(short t)
Return a new ShortPredicate returning true if the argument equals a given value.
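To illustrate the contract described for equalTo, here is a minimal, self-contained sketch. It is not the library's source: the nested ShortPredicate interface is a stand-in for com.carrotsearch.hppc.predicates.ShortPredicate (assumed here to expose a single apply(short) method), and the class name EqualToSketch and the token type value 7 are hypothetical.

```java
/** Sketch only: a predicate factory with the same shape as the documented equalTo(short). */
final class EqualToSketch {
  /** Stand-in for the HPPC ShortPredicate type (assumption: a single apply(short) method). */
  interface ShortPredicate {
    boolean apply(short value);
  }

  /** Returns a predicate that is true only when its argument equals t. */
  static ShortPredicate equalTo(short t) {
    return value -> value == t;
  }

  public static void main(String[] args) {
    short separator = 7; // hypothetical token type code, chosen for illustration
    ShortPredicate isSeparator = equalTo(separator);
    System.out.println(isSeparator.apply((short) 7));
    System.out.println(isSeparator.apply((short) 8));
  }
}
```

A factory of this kind lets each separator constant be expressed as a reusable predicate instead of repeating the comparison at every call site.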
iterate
Iterate over all documents, fields and sentences in PreprocessingContext.allTokens.
document
Invoked for each document. Splits further into fields.
field
Invoked for each document's field. Splits further into sentences.
sentence
Invoked for each document's sentence.
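The nesting described above (documents split into fields, fields split into sentences) can be sketched without the library on a plain short[] of token types. Everything in this sketch is an assumption made for illustration: the class ScannerSketch, the separator codes 1, 2 and 3, the two small interfaces, and the choice to report each range without its separators. The real class operates on PreprocessingContext and its behavior may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: mimics the documented callback nesting on a plain short[] of token types. */
class ScannerSketch {
  // Hypothetical separator codes; the real values are defined by the library.
  static final short DOC_SEP = 1, FIELD_SEP = 2, SENT_SEP = 3;

  /** Stand-in for the HPPC ShortPredicate type. */
  interface ShortPredicate {
    boolean apply(short value);
  }

  /** Receives one separator-free token range. */
  interface RangeConsumer {
    void accept(int start, int length);
  }

  /** Records every callback so the nesting order is visible. */
  final List<String> events = new ArrayList<>();

  /** Entry point: splits the whole array into documents. */
  void iterate(short[] types) {
    split(types, 0, types.length, v -> v == DOC_SEP, (s, l) -> document(types, s, l));
  }

  /** Called per document; splits further into fields. */
  protected void document(short[] types, int start, int length) {
    events.add("document " + start + "+" + length);
    split(types, start, length, v -> v == FIELD_SEP, (s, l) -> field(types, s, l));
  }

  /** Called per field; splits further into sentences. */
  protected void field(short[] types, int start, int length) {
    events.add("field " + start + "+" + length);
    split(types, start, length, v -> v == SENT_SEP, (s, l) -> sentence(types, s, l));
  }

  /** Called per sentence; a subclass would override this to do its work. */
  protected void sentence(short[] types, int start, int length) {
    events.add("sentence " + start + "+" + length);
  }

  /** Reports each non-empty run between separators within [start, start + length). */
  static void split(short[] types, int start, int length,
                    ShortPredicate separator, RangeConsumer consumer) {
    int runStart = start;
    int end = start + length;
    for (int i = start; i < end; i++) {
      if (separator.apply(types[i])) {
        if (runStart < i) consumer.accept(runStart, i - runStart);
        runStart = i + 1;
      }
    }
    if (runStart < end) consumer.accept(runStart, end - runStart);
  }

  public static void main(String[] args) {
    short[] types = {0, 0, SENT_SEP, 0, FIELD_SEP, 0, DOC_SEP, 0};
    ScannerSketch scanner = new ScannerSketch();
    scanner.iterate(types);
    scanner.events.forEach(System.out::println);
  }
}
```

In this sketch each level receives a start offset and a length, mirroring the (start, length) parameters of the documented callbacks. A subclass that only needs sentence boundaries would override sentence and leave document and field untouched, so the splitting still cascades.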