Package org.carrot2.language
Class ExtendedWhitespaceTokenizer
java.lang.Object
    org.carrot2.language.ExtendedWhitespaceTokenizer
Field Summary
Fields inherited from interface org.carrot2.language.Tokenizer
TF_COMMON_WORD, TF_QUERY_WORD, TF_SEPARATOR_DOCUMENT, TF_SEPARATOR_FIELD, TF_SEPARATOR_SENTENCE, TF_TERMINATOR, TT_ACRONYM, TT_BARE_URL, TT_EMAIL, TT_EOF, TT_FILE, TT_FULL_URL, TT_HYPHTERM, TT_NUMERIC, TT_PUNCTUATION, TT_TERM, TYPE_MASK
Constructor Summary
Constructor                      Description
ExtendedWhitespaceTokenizer()
Method Summary
Modifier and Type   Method                                  Description
short               nextToken()                             Returns the next token from the input stream.
void                reset(Reader input)                     Reset this tokenizer to start parsing another stream.
void                setTermBuffer(MutableCharArray array)   Sets the current token image to the provided buffer.
Method Detail
reset
public void reset(Reader input)
Reset this tokenizer to start parsing another stream.
nextToken
public short nextToken() throws IOException
Description copied from interface: Tokenizer
Returns the next token from the input stream.
Specified by:
nextToken in interface Tokenizer
Returns:
the type of the token as defined by Tokenizer.TT_TERM and other constants, or Tokenizer.TT_EOF when the end of the data stream has been reached.
Throws:
IOException
See Also:
TokenTypeUtils
setTermBuffer
public void setTermBuffer(MutableCharArray array)
Description copied from interface: Tokenizer
Sets the current token image to the provided buffer.
Specified by:
setTermBuffer in interface Tokenizer
Parameters:
array - buffer in which the current token's image should be stored
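Example (illustrative sketch)
The three methods above suggest a pull-style loop: reset the tokenizer on a Reader, call nextToken() until it reports end of input, and copy out each token's image with setTermBuffer. The sketch below shows that loop shape only. It does not use the Carrot2 classes: SketchTokenizer and ToyWhitespaceTokenizer are hypothetical stand-ins, a StringBuilder stands in for MutableCharArray, the constant values are illustrative, and it assumes setTermBuffer is called after each nextToken(), which is how the parameter description reads.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TokenizerLoopSketch {
  /** Hypothetical stand-in mirroring the three documented methods; not the Carrot2 interface. */
  interface SketchTokenizer {
    short TT_TERM = 1; // illustrative value only
    short TT_EOF = -1; // illustrative value only

    void reset(Reader input);

    short nextToken() throws IOException;

    /** Stores the current token's image in the given buffer (StringBuilder replaces MutableCharArray). */
    void setTermBuffer(StringBuilder buffer);
  }

  /** Toy implementation that splits on whitespace only; far simpler than the real class. */
  static final class ToyWhitespaceTokenizer implements SketchTokenizer {
    private Reader input;
    private final StringBuilder image = new StringBuilder();

    @Override
    public void reset(Reader input) {
      this.input = input;
      image.setLength(0);
    }

    @Override
    public short nextToken() throws IOException {
      image.setLength(0);
      int c;
      // Skip leading whitespace.
      do {
        c = input.read();
      } while (c != -1 && Character.isWhitespace(c));
      if (c == -1) {
        return TT_EOF;
      }
      // Accumulate characters until the next whitespace or end of input.
      do {
        image.append((char) c);
        c = input.read();
      } while (c != -1 && !Character.isWhitespace(c));
      return TT_TERM;
    }

    @Override
    public void setTermBuffer(StringBuilder buffer) {
      buffer.setLength(0);
      buffer.append(image);
    }
  }

  /** Drains the tokenizer over the given text and returns the token images in order. */
  static List<String> tokenize(SketchTokenizer tokenizer, String text) {
    StringBuilder image = new StringBuilder(); // reused for every token
    List<String> tokens = new ArrayList<>();
    tokenizer.reset(new StringReader(text));
    try {
      while (tokenizer.nextToken() != SketchTokenizer.TT_EOF) {
        tokenizer.setTermBuffer(image);
        tokens.add(image.toString());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return tokens;
  }

  public static void main(String[] args) {
    SketchTokenizer tokenizer = new ToyWhitespaceTokenizer();
    System.out.println(tokenize(tokenizer, "data mining  tools"));
    // The same instance can be reset to parse another stream.
    System.out.println(tokenize(tokenizer, "second stream"));
  }
}
```

Reusing one buffer and one tokenizer instance across streams avoids per-token allocation, which is presumably why the documented interface exposes reset and a caller-supplied buffer rather than returning new strings.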