Package org.carrot2.text.suffixtree
Class SuffixTree
- java.lang.Object
-
- org.carrot2.text.suffixtree.SuffixTree
-
public final class SuffixTree extends Object
Builds a suffix tree (or generalized suffix tree) on a sequence of any integers (or objects that can be represented as unique integers). A direct implementation of Esko Ukkonen's algorithm, but optimized for Java to use primitive data types instead of objects (or boxed types).- See Also:
- "E. Ukkonen, On-line construction of suffix trees, Algorithmica, 1995, volume 14, number 3, pages 249-260."
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static interface
SuffixTree.IProgressCallback
Progress callback is invoked when iterating forward through the input sequence elements.static interface
SuffixTree.IStateCallback
A callback invoked when new states are added to the tree.static interface
SuffixTree.IVisitor
Visitor interface for traversals.static class
SuffixTree.VisitorAdapter
Empty implementation recursively walking the entire suffix tree.
-
Field Summary
Fields Modifier and Type Field Description static int
NO_EDGE
Marker for the state's last edge intransitions
.
-
Constructor Summary
Constructors Constructor Description SuffixTree(Sequence sequence, SuffixTree.IStateCallback newStateCallback, SuffixTree.IProgressCallback progressCallback)
Build a suffix tree for a given input sequence of symbols.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description boolean
containsSuffix(Sequence seq)
int
findEdge(int state, int symbol)
Find a transition from statestate
, labeled with a given symbol.int
firstEdge(int state)
Returns the index of the first edge from a given state orNO_EDGE
if a given state has no edges.int
getEndIndex(int edge)
Returns the edge label's end index (inclusive).int
getRootState()
For procedural traversals (not visitors).int
getStartIndex(int edge)
Returns the edge label's start index (inclusive).int
getStatesCount()
int
getToState(int edge)
Returns the target state for a given edge.int
getTransitionsCount()
boolean
isLeaf(int state)
Check ifstate
is a leaf (has no outgoing edges).int
nextEdge(int edge)
Returns the index of the next edge (sibling) orNO_EDGE
ifedge
is the last edge in its state.void
visit(SuffixTree.IVisitor visitor)
Walks the states and edges of the suffix tree, depth-first.void
visitState(int state, SuffixTree.IVisitor visitor)
Start visiting from a given state.
-
-
-
Field Detail
-
NO_EDGE
public static final int NO_EDGE
Marker for the state's last edge intransitions
.- See Also:
- Constant Field Values
-
-
Constructor Detail
-
SuffixTree
public SuffixTree(Sequence sequence, SuffixTree.IStateCallback newStateCallback, SuffixTree.IProgressCallback progressCallback)
Build a suffix tree for a given input sequence of symbols.
-
-
Method Detail
-
getTransitionsCount
public final int getTransitionsCount()
- Returns:
- Return the number of transitions (edges) in the tree.
-
getStatesCount
public final int getStatesCount()
- Returns:
- Return the number of states in the tree.
-
containsSuffix
public boolean containsSuffix(Sequence seq)
- Returns:
true
if this suffix tree has a path from the root state to a leaf state corresponding to a given sequence of objects. This indicates the input sequence had a suffix identical tosequence
.
-
visit
public final void visit(SuffixTree.IVisitor visitor)
Walks the states and edges of the suffix tree, depth-first.
-
visitState
public final void visitState(int state, SuffixTree.IVisitor visitor)
Start visiting from a given state.
-
getRootState
public int getRootState()
For procedural traversals (not visitors).
-
isLeaf
public final boolean isLeaf(int state)
Check ifstate
is a leaf (has no outgoing edges).
-
firstEdge
public final int firstEdge(int state)
Returns the index of the first edge from a given state orNO_EDGE
if a given state has no edges. Does not perform any sanity check on the input state.
-
nextEdge
public final int nextEdge(int edge)
Returns the index of the next edge (sibling) orNO_EDGE
ifedge
is the last edge in its state.
-
findEdge
public final int findEdge(int state, int symbol)
Find a transition from statestate
, labeled with a given symbol.NO_EDGE
is returned if there is no such edge.
-
getToState
public int getToState(int edge)
Returns the target state for a given edge.
-
getStartIndex
public int getStartIndex(int edge)
Returns the edge label's start index (inclusive).
-
getEndIndex
public int getEndIndex(int edge)
Returns the edge label's end index (inclusive).
-
-