Class SuffixTree

java.lang.Object
org.carrot2.text.suffixtree.SuffixTree

public final class SuffixTree
extends Object
Builds a suffix tree (or generalized suffix tree) on a sequence of any integers (or objects that can be represented as unique integers). A direct implementation of Esko Ukkonen's algorithm, but optimized for Java to use primitive data types instead of objects (or boxed types).
See Also:
"E. Ukkonen, On-line construction of suffix trees, Algorithmica, 1995, volume 14, number 3, pages 249-260."
  • Field Details

  • Constructor Details

  • Method Details

    • getTransitionsCount

      public final int getTransitionsCount()
      Returns:
      Return the number of transitions (edges) in the tree.
    • getStatesCount

      public final int getStatesCount()
      Returns:
      Return the number of states in the tree.
    • containsSuffix

      public boolean containsSuffix​(Sequence seq)
      Returns:
      true if this suffix tree has a path from the root state to a leaf state corresponding to a given sequence of objects. This indicates the input sequence had a suffix identical to sequence.
    • visit

      public final void visit​(SuffixTree.IVisitor visitor)
      Walks the states and edges of the suffix tree, depth-first.
    • visitState

      public final void visitState​(int state, SuffixTree.IVisitor visitor)
      Start visiting from a given state.
    • getRootState

      public int getRootState()
      For procedural traversals (not visitors).
    • isLeaf

      public final boolean isLeaf​(int state)
      Check if state is a leaf (has no outgoing edges).
    • firstEdge

      public final int firstEdge​(int state)
      Returns the index of the first edge from a given state or NO_EDGE if a given state has no edges. Does not perform any sanity check on the input state.
    • nextEdge

      public final int nextEdge​(int edge)
      Returns the index of the next edge (sibling) or NO_EDGE if edge is the last edge in its state.
    • findEdge

      public final int findEdge​(int state, int symbol)
      Find a transition from state state, labeled with a given symbol. NO_EDGE is returned if there is no such edge.
    • getToState

      public int getToState​(int edge)
      Returns the target state for a given edge.
    • getStartIndex

      public int getStartIndex​(int edge)
      Returns the edge label's start index (inclusive).
    • getEndIndex

      public int getEndIndex​(int edge)
      Returns the edge label's end index (inclusive).