Class ExtendedWhitespaceTokenizer

  • All Implemented Interfaces:
    Tokenizer

    public final class ExtendedWhitespaceTokenizer
    extends Object
    implements Tokenizer
    A tokenizer that splits input characters on whitespace but can also extract more complex tokens, such as URLs, e-mail addresses, and sentence delimiters.
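    The page does not say how URLs, e-mail addresses, or sentence delimiters are recognized. Below is a minimal, hypothetical sketch of the general idea. It assumes deliberately simple regular expressions and a fixed set of token categories. The names WhitespaceTokenizerSketch, Type, and Token are illustrative only and are not part of this API. It also requires Java 16+ for the record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT the ExtendedWhitespaceTokenizer implementation.
// Token categories and regular expressions are illustrative assumptions.
class WhitespaceTokenizerSketch {
    enum Type { TERM, URL, EMAIL, SENTENCE_DELIMITER }

    record Token(String image, Type type) {}

    // Simplistic patterns; a production tokenizer would be much stricter.
    private static final Pattern URL = Pattern.compile("(?i)(https?://|www\\.)\\S+");
    private static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        // Step 1: split on whitespace.
        for (String chunk : text.trim().split("\\s+")) {
            if (chunk.isEmpty()) continue; // only happens for blank input
            // Step 2: peel off one trailing sentence delimiter, if present.
            String delimiter = null;
            char last = chunk.charAt(chunk.length() - 1);
            if (last == '.' || last == '!' || last == '?') {
                delimiter = String.valueOf(last);
                chunk = chunk.substring(0, chunk.length() - 1);
            }
            // Step 3: classify what remains of the chunk.
            if (!chunk.isEmpty()) {
                Type type = URL.matcher(chunk).matches() ? Type.URL
                        : EMAIL.matcher(chunk).matches() ? Type.EMAIL
                        : Type.TERM;
                tokens.add(new Token(chunk, type));
            }
            if (delimiter != null) {
                tokens.add(new Token(delimiter, Type.SENTENCE_DELIMITER));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("Mail jan@example.com. Visit https://example.org!")) {
            System.out.println(t.type() + "\t" + t.image());
        }
    }
}
```

    Splitting on whitespace first keeps classification local to each chunk. The trade-off is that an abbreviation such as "e.g." is read here as a term followed by a sentence delimiter, which a real tokenizer would need to handle separately.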
    • Constructor Detail

      • ExtendedWhitespaceTokenizer

        public ExtendedWhitespaceTokenizer()
    • Method Detail

      • reset

        public void reset(Reader input)
        Reset this tokenizer to start parsing another stream.
        Specified by:
        reset in interface Tokenizer
        Parameters:
        input - the input to tokenize. The reader will not be closed by the tokenizer when the end of stream is reached.
      • setTermBuffer

        public void setTermBuffer(MutableCharArray array)
        Description copied from interface: Tokenizer
        Sets the current token image to the provided buffer.
        Specified by:
        setTermBuffer in interface Tokenizer
        Parameters:
        array - buffer in which the current token's image should be stored
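    The two methods above describe a reuse contract. A single tokenizer instance is pointed at successive streams with reset, and the caller remains responsible for closing each reader. The current token's image is written into a buffer that the caller supplies once through setTermBuffer. Below is a minimal, hypothetical sketch of that contract. SketchTokenizer and TermBuffer are illustrative stand-ins and are not ExtendedWhitespaceTokenizer or MutableCharArray. The next() method is an assumed name for advancing to the next token.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

// Hypothetical sketch of the reset/setTermBuffer contract; the class names and
// next() are illustrative assumptions, not the real Tokenizer API.
class SketchTokenizer {
    // Caller-owned, reusable holder for the current token's image.
    static final class TermBuffer {
        char[] chars = new char[16];
        int length;

        @Override public String toString() {
            return new String(chars, 0, length);
        }
    }

    private Reader input;
    private TermBuffer term;

    // Start parsing another stream. The tokenizer never closes the reader.
    void reset(Reader input) {
        this.input = input;
    }

    // Each call to next() overwrites this buffer with the current token.
    void setTermBuffer(TermBuffer term) {
        this.term = term;
    }

    // Reads the next whitespace-separated token into the term buffer;
    // returns false at end of stream.
    boolean next() throws IOException {
        term.length = 0;
        int c;
        while ((c = input.read()) != -1) {
            if (Character.isWhitespace(c)) {
                if (term.length > 0) return true; // token complete
                continue;                         // skip leading whitespace
            }
            if (term.length == term.chars.length) {
                term.chars = Arrays.copyOf(term.chars, term.length * 2); // grow
            }
            term.chars[term.length++] = (char) c;
        }
        return term.length > 0; // last token may end at end of stream
    }

    public static void main(String[] args) throws IOException {
        SketchTokenizer tokenizer = new SketchTokenizer();
        TermBuffer term = new TermBuffer(); // allocated once, refilled per token
        tokenizer.setTermBuffer(term);
        for (String doc : new String[] {"first document", "second one"}) {
            // The caller owns the reader, so the caller closes it.
            try (Reader reader = new StringReader(doc)) {
                tokenizer.reset(reader);
                while (tokenizer.next()) {
                    System.out.println(term); // copy via toString() to keep a token
                }
            }
        }
    }
}
```

    Reusing one buffer avoids allocating a new string for every token. The consequence is that the buffer's contents are valid only until the next advance, so a caller that needs to keep a token must copy it.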