Package org.carrot2.text.preprocessing
Class PreprocessedDocumentScanner
java.lang.Object
    org.carrot2.text.preprocessing.PreprocessedDocumentScanner
-
public class PreprocessedDocumentScanner extends Object
Iterates over tokenized documents in PreprocessingContext.
-
Field Summary
static com.carrotsearch.hppc.predicates.ShortPredicate ON_DOCUMENT_SEPARATOR
    Predicate for splitting on document separator.
static com.carrotsearch.hppc.predicates.ShortPredicate ON_FIELD_SEPARATOR
    Predicate for splitting on field separator.
static com.carrotsearch.hppc.predicates.ShortPredicate ON_SENTENCE_SEPARATOR
    Predicate for splitting on sentence separator.
-
Constructor Summary
PreprocessedDocumentScanner()
-
Method Summary
protected void document(PreprocessingContext context, int start, int length)
    Invoked for each document.
static com.carrotsearch.hppc.predicates.ShortPredicate equalTo(short t)
    Return a new ShortPredicate returning true if the argument equals a given value.
protected void field(PreprocessingContext context, int start, int length)
    Invoked for each document's field.
void iterate(PreprocessingContext context)
    Iterate over all documents, fields and sentences in PreprocessingContext.allTokens.
protected void sentence(PreprocessingContext context, int start, int length)
    Invoked for each document's sentence.
-
Field Detail
-
ON_DOCUMENT_SEPARATOR
public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_DOCUMENT_SEPARATOR
Predicate for splitting on document separator.
-
ON_FIELD_SEPARATOR
public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_FIELD_SEPARATOR
Predicate for splitting on field separator.
-
ON_SENTENCE_SEPARATOR
public static final com.carrotsearch.hppc.predicates.ShortPredicate ON_SENTENCE_SEPARATOR
Predicate for splitting on sentence separator.
-
Method Detail
-
equalTo
public static final com.carrotsearch.hppc.predicates.ShortPredicate equalTo(short t)
Return a new ShortPredicate returning true if the argument equals a given value.
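The factory described above can be sketched without HPPC on the classpath. This is a minimal illustration only: the nested ShortPredicateSketch interface is a hypothetical stand-in for com.carrotsearch.hppc.predicates.ShortPredicate, and EqualToSketch is not part of the library.

```java
class EqualToSketch {
    // Hypothetical stand-in for HPPC's ShortPredicate, declared here so the
    // sketch compiles on its own.
    interface ShortPredicateSketch {
        boolean apply(short value);
    }

    // Returns a predicate that is true only when its argument equals t.
    static ShortPredicateSketch equalTo(final short t) {
        return value -> value == t;
    }

    public static void main(String[] args) {
        ShortPredicateSketch isSeven = equalTo((short) 7);
        System.out.println(isSeven.apply((short) 7)); // prints "true"
        System.out.println(isSeven.apply((short) 8)); // prints "false"
    }
}
```

A predicate built this way captures the value once, so the same instance can be reused for every token tested during a scan.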
-
iterate
public final void iterate(PreprocessingContext context)
Iterate over all documents, fields and sentences in PreprocessingContext.allTokens.
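The nested traversal (documents, then fields, then sentences) might look roughly like the sketch below. It works on a plain array of token type codes rather than on a PreprocessingContext. The numeric codes, the class name TokenScanSketch, and the choice to skip empty ranges are assumptions made for illustration, not the library's actual values or behavior.

```java
class TokenScanSketch {
    // Hypothetical token type codes, chosen only for this sketch.
    static final short TERM = 0;
    static final short SENTENCE_SEP = 1;
    static final short FIELD_SEP = 2;
    static final short DOCUMENT_SEP = 3;

    // Separator and label for each nesting level, outermost first.
    private static final short[] SEPARATORS = {DOCUMENT_SEP, FIELD_SEP, SENTENCE_SEP};
    private static final String[] NAMES = {"document", "field", "sentence"};

    // Scans the token types and returns the visited ranges as "name[start,length]" entries.
    static String scan(short[] types) {
        StringBuilder log = new StringBuilder();
        split(types, 0, types.length, 0, log);
        return log.toString().trim();
    }

    // Splits [start, start + length) on the separator for the given level.
    // Assumption of this sketch: empty ranges are skipped.
    private static void split(short[] types, int start, int length, int level, StringBuilder log) {
        int end = start + length;
        int rangeStart = start;
        for (int i = start; i <= end; i++) {
            if (i == end || types[i] == SEPARATORS[level]) {
                if (i > rangeStart) {
                    visit(types, rangeStart, i - rangeStart, level, log);
                }
                rangeStart = i + 1;
            }
        }
    }

    // Records the range, then splits it further unless it is already a sentence.
    private static void visit(short[] types, int start, int length, int level, StringBuilder log) {
        log.append(NAMES[level]).append('[').append(start).append(',').append(length).append("] ");
        if (level + 1 < SEPARATORS.length) {
            split(types, start, length, level + 1, log);
        }
    }

    public static void main(String[] args) {
        short[] types = {TERM, TERM, SENTENCE_SEP, TERM, FIELD_SEP, TERM, DOCUMENT_SEP, TERM};
        System.out.println(scan(types));
    }
}
```

Each range is reported as a start index and a length, matching the (start, length) parameters of document, field and sentence.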
-
document
protected void document(PreprocessingContext context, int start, int length)
Invoked for each document. Splits further into fields.
-
field
protected void field(PreprocessingContext context, int start, int length)
Invoked for each document's field. Splits further into sentences.
-
sentence
protected void sentence(PreprocessingContext context, int start, int length)
Invoked for each document's sentence.
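Because sentence is protected, a subclass can presumably override it to collect per-sentence data. The sketch below shows that hook pattern with a simplified stand-in base class that splits on sentences only. ScannerStandIn, SentenceStats and the separator code are hypothetical names and values, not part of the library.

```java
class SentenceHookSketch {
    // Hypothetical sentence separator code, chosen only for this sketch.
    static final short SENTENCE_SEP = 1;

    // Simplified stand-in for the scanner: it splits on sentences only.
    static class ScannerStandIn {
        public final void iterate(short[] types) {
            int rangeStart = 0;
            for (int i = 0; i <= types.length; i++) {
                if (i == types.length || types[i] == SENTENCE_SEP) {
                    if (i > rangeStart) {
                        sentence(types, rangeStart, i - rangeStart);
                    }
                    rangeStart = i + 1;
                }
            }
        }

        // Hook invoked once per non-empty sentence; does nothing by default.
        protected void sentence(short[] types, int start, int length) {
        }
    }

    // Subclass that records the sentence count and the longest sentence seen.
    static class SentenceStats extends ScannerStandIn {
        public int count;
        public int longest;

        @Override
        protected void sentence(short[] types, int start, int length) {
            count++;
            longest = Math.max(longest, length);
        }
    }

    public static void main(String[] args) {
        SentenceStats stats = new SentenceStats();
        stats.iterate(new short[] {0, 0, 0, SENTENCE_SEP, 0, SENTENCE_SEP, 0, 0});
        System.out.println(stats.count + " sentences, longest " + stats.longest);
    }
}
```

With the real class, the equivalent override would take a PreprocessingContext as its first argument, as in the signature above.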
-