Package org.biojava.bio.seq.io
Class WordTokenization
java.lang.Object
org.biojava.utils.Unchangeable
org.biojava.bio.seq.io.WordTokenization
- All Implemented Interfaces:
Serializable, Annotatable, SymbolTokenization, Changeable
- Direct Known Subclasses:
CrossProductTokenization, DoubleTokenization, IntegerTokenization, NameTokenization, SubIntegerTokenization
public abstract class WordTokenization
extends Unchangeable
implements SymbolTokenization, Serializable
Base class for tokenizations which accept whitespace-separated
'words'. Splits at whitespace, except where the whitespace is enclosed by
double quotes ("), parentheses (), or square brackets [].
- Since:
- 1.2
- Author:
- Thomas Down, Greg Cox, Keith James
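The splitting rule in the class description (split at whitespace unless it is enclosed in double quotes, parentheses, or square brackets) can be illustrated with a small standalone sketch. This is not BioJava's `splitString` implementation: the class name `WordSplitSketch`, the choice to keep the delimiters in each word, and the use of `IllegalArgumentException` for unbalanced input are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; hypothetical class, not part of BioJava.
class WordSplitSketch {

    // Splits str at whitespace that lies outside "...", (...), and [...].
    // Simplification: ( ) and [ ] share one depth counter, so mismatched
    // pairs such as "(a]" are not detected.
    static List<String> split(String str) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // nesting depth of ( ) and [ ]
        boolean inQuotes = false; // true while inside a double-quoted run

        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes && (c == '(' || c == '[')) {
                depth++;
            } else if (!inQuotes && (c == ')' || c == ']')) {
                depth--;
                if (depth < 0) {
                    throw new IllegalArgumentException("Unbalanced closing bracket at index " + i);
                }
            }

            if (Character.isWhitespace(c) && depth == 0 && !inQuotes) {
                // Unprotected whitespace ends the current word, if any.
                if (current.length() > 0) {
                    words.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }

        if (inQuotes || depth != 0) {
            throw new IllegalArgumentException("Unterminated quote or bracket");
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints: [a, (b c), [d e], "f g", h]
        System.out.println(split("a  (b c) [d e] \"f g\" h"));
    }
}
```

In this sketch, runs of unprotected whitespace collapse, so no empty words are produced.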
Nested Class Summary
Nested classes/interfaces inherited from interface org.biojava.bio.Annotatable
Annotatable.AnnotationForwarder
Nested classes/interfaces inherited from interface org.biojava.bio.seq.io.SymbolTokenization
SymbolTokenization.TokenType
Field Summary
Fields inherited from interface org.biojava.bio.Annotatable
ANNOTATION
Fields inherited from interface org.biojava.bio.seq.io.SymbolTokenization
CHARACTER, FIXEDWIDTH, SEPARATED, UNKNOWN
Constructor Summary
Constructors
Method Summary
Modifier and Type, Method: Description
getAlphabet: The alphabet to which this tokenization applies.
getAnnotation: Should return the associated annotation object.
getTokenType: Determine the style of tokenization represented by this object.
parseStream(SeqIOListener siol): Return an object which can parse an arbitrary character stream into symbols.
protected Symbol[] parseString
protected List splitString(String str)
String tokenizeSymbolList(SymbolList sl): Return a string representation of a list of symbols.
Methods inherited from class org.biojava.utils.Unchangeable
addChangeListener, addChangeListener, addForwarder, getForwarders, getListeners, isUnchanging, removeChangeListener, removeChangeListener, removeForwarder
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.biojava.utils.Changeable
addChangeListener, addChangeListener, isUnchanging, removeChangeListener, removeChangeListener
Methods inherited from interface org.biojava.bio.seq.io.SymbolTokenization
parseToken, tokenizeSymbol
Constructor Details
WordTokenization
Method Details
getAlphabet
Description copied from interface: SymbolTokenization
The alphabet to which this tokenization applies.
- Specified by:
getAlphabet in interface SymbolTokenization
getTokenType
Description copied from interface: SymbolTokenization
Determine the style of tokenization represented by this object.
- Specified by:
getTokenType in interface SymbolTokenization
getAnnotation
Description copied from interface: Annotatable
Should return the associated annotation object.
- Specified by:
getAnnotation in interface Annotatable
- Returns:
- an Annotation object, never null
tokenizeSymbolList
public String tokenizeSymbolList(SymbolList sl) throws IllegalSymbolException, IllegalAlphabetException
Description copied from interface: SymbolTokenization
Return a string representation of a list of symbols.
- Specified by:
tokenizeSymbolList in interface SymbolTokenization
- Parameters:
sl - A SymbolList
- Throws:
IllegalAlphabetException - if alphabets don't match
IllegalSymbolException
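As a rough illustration of producing a string from a list of symbols in a word-based scheme, the sketch below emits one word per symbol, separated by single spaces. Everything here is an assumption for illustration: the class `JoinWordsSketch` is hypothetical, plain strings stand in for the per-symbol tokens, and the single-space separator is not stated on this page. The actual output of `tokenizeSymbolList` depends on the concrete subclass.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; hypothetical class, not part of BioJava.
class JoinWordsSketch {

    // Each String stands in for the token of one symbol.
    // Assumption: words are joined with exactly one space.
    static String join(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(token);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: 1 22 333
        System.out.println(join(Arrays.asList("1", "22", "333")));
    }
}
```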
parseStream
Description copied from interface: SymbolTokenization
Return an object which can parse an arbitrary character stream into symbols.
- Specified by:
parseStream in interface SymbolTokenization
- Parameters:
siol - The listener which gets notified of parsed symbols.
splitString
- Throws:
IllegalSymbolException
parseString
- Throws:
IllegalSymbolException