Class EmblCDROMIndexReader
- Direct Known Subclasses:
AcnumHitReader, AcnumTrgReader, DivisionLkpReader, EntryNamIdxReader
EmblCDROMIndexReader is an abstract class whose
concrete subclasses read EMBL CD-ROM format indices from an
underlying InputStream. This format is used by the
EMBOSS package for database indexing (see programs dbiblast,
dbifasta, dbiflat and dbigcg). Indexing produces four binary files
with a simple format:
- division.lkp : master index
- entrynam.idx : sequence ID index
- acnum.trg : accession number index
- acnum.hit : accession number auxiliary index
Internally, EMBOSS checks for big-endian architectures and
switches the byte order to little-endian. This means trouble if you
try to read the files using DataInputStream, which always reads
multi-byte values in big-endian order, but at least the binaries are
consistent across architectures. This class carries out the necessary
conversion.
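The difference between the two byte orders can be illustrated with the JDK alone. The sketch below is not the conversion code this class uses; the helper class `LittleEndianDemo` is hypothetical. It decodes the same 4 bytes with `java.nio.ByteBuffer` set to little-endian order, with `DataInputStream.readInt()` (always big-endian), and with `Integer.reverseBytes` applied to the big-endian result.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper contrasting the two byte orders; not part of BioJava.
final class LittleEndianDemo {
    // Read 4 bytes starting at offset as a little-endian int (least significant byte first).
    static int readIntLE(byte[] b, int offset) {
        return ByteBuffer.wrap(b, offset, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // DataInputStream.readInt() always interprets the bytes as big-endian.
    static int readIntBE(byte[] b) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(b))) {
            return in.readInt();
        } catch (IOException e) {
            // e.g. EOFException when fewer than 4 bytes are available
            throw new UncheckedIOException(e);
        }
    }

    // Reading big-endian and then reversing the bytes yields the little-endian value.
    static int readIntViaDataInput(byte[] b) {
        return Integer.reverseBytes(readIntBE(b));
    }
}
```

For example, the bytes `01 00 00 00` decode to 1 as little-endian but to 16777216 when read directly with `DataInputStream`.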
The EMBL CD-ROM format stores the date in 4 bytes. The first byte is unused, leaving one byte for the year (!), one for the month and one for the day.
For further information see the EMBOSS documentation, or for a full description, the source code of the dbi programs and the Ajax library.
- Since:
- 1.2
- Author:
- Keith James
Field Summary
| Modifier and Type | Field | Description |
|---|---|---|
| protected InputStream | input | |
| protected org.biojava.bio.seq.db.emblcd.RecordParser | recParser | |
| protected StringBuffer | sb | |
Constructor Summary
| Constructor | Description |
|---|---|
| EmblCDROMIndexReader(InputStream input) | Creates a new EmblCDROMIndexReader instance. |
Method Summary
| Modifier and Type | Method | Description |
|---|---|---|
| void | close() | Closes the underlying InputStream. |
| String | readDBDate() | Reads the date from the index header. |
| String | readDBName() | Returns the database name from the index header. |
| String | readDBRelease() | Returns the database release from the index header. |
| long | readFileLength() | Returns the file length in bytes (stored in the file header by the indexing program). |
| byte[] | readRawRecord() | Returns the raw bytes of a single record from the index. |
| abstract Object[] | readRecord() | Returns an array of objects parsed from a single record. |
| long | readRecordCount() | Returns the number of records in the file. |
| int | readRecordLength() | Returns the record length in bytes. |
Field Details
-
input
-
sb
-
recParser
-
Constructor Details
-
EmblCDROMIndexReader
Creates a new EmblCDROMIndexReader instance. A BufferedInputStream is probably the most suitable kind of stream to pass in.
- Parameters:
- input - an InputStream.
- Throws:
- IOException - if an error occurs.
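The buffering advice can be sketched with a small helper that wraps a file stream before it is handed to a reader. The class `IndexStreams` and its method are hypothetical; in practice the returned stream would be passed to one of the concrete subclasses listed above.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; the resulting stream would be passed to a concrete reader subclass.
final class IndexStreams {
    // Wrap the file stream in a BufferedInputStream so that the many small
    // reads made while parsing headers and records are served from a buffer
    // rather than each going to the underlying file stream.
    static InputStream openBuffered(Path indexFile) throws IOException {
        return new BufferedInputStream(Files.newInputStream(indexFile));
    }
}
```

A caller would typically open the stream in a try-with-resources statement so that it is closed even if parsing fails; calling close() on the reader has the same effect, since it closes the underlying InputStream.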
-
Method Details
-
readFileLength
readFileLength returns the file length in bytes (stored in the file header by the indexing program). This may be called more than once, as the value is cached.
- Returns:
- a long.
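Because an InputStream is read forwards, a header value that has already been consumed cannot simply be read again, which is presumably why these values are cached. The sketch below shows one way to parse a header field lazily and cache it. It assumes, without confirmation from this page, that the file length is an unsigned 4-byte little-endian value at offset 0 of an already-read header; the class `CachedHeaderSketch` is hypothetical, and a long comfortably holds an unsigned 32-bit value.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of a lazily parsed, cached header value; not BioJava's code.
final class CachedHeaderSketch {
    private final byte[] header;   // raw header bytes, assumed to be already read
    private long fileLength = -1;  // -1 means "not parsed yet"
    private int parses = 0;        // counts real parses, to make the caching visible

    CachedHeaderSketch(byte[] header) {
        this.header = header;
    }

    // ASSUMPTION: the file length is an unsigned 4-byte little-endian value at offset 0.
    long readFileLength() {
        if (fileLength < 0) {
            int raw = ByteBuffer.wrap(header, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
            fileLength = Integer.toUnsignedLong(raw); // no sign extension above 2^31 - 1
            parses++;
        }
        return fileLength;
    }

    int parseCount() {
        return parses;
    }
}
```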
-
readRecordCount
readRecordCount returns the number of records in the file. This may be called more than once, as the value is cached.
- Returns:
- a long.
-
readRecordLength
readRecordLength returns the record length in bytes. This may be called more than once, as the value is cached.
- Returns:
- an int.
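If the record length were stored as a 2-byte little-endian unsigned value (an assumption; this page does not give the field width), it could be decoded by hand as below. An unsigned 16-bit value does not fit in a Java short, so an int is a natural return type. The helper `UnsignedLE` is hypothetical.

```java
// Hypothetical helper; the 2-byte width of the record-length field is an assumption.
final class UnsignedLE {
    // Combine two bytes, least significant first, into an unsigned value in 0..65535.
    // Masking with 0xFF prevents Java's signed bytes from being sign-extended.
    static int readUnsignedShortLE(byte[] b, int offset) {
        return (b[offset] & 0xFF) | ((b[offset + 1] & 0xFF) << 8);
    }
}
```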
-
readDBName
readDBName returns the database name from the index header. This may be called more than once, as the value is cached.
- Returns:
- a String.
-
readDBRelease
readDBRelease returns the database release from the index header. This may be called more than once, as the value is cached.
- Returns:
- a String.
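Text fields in a binary header are commonly stored as fixed-width byte runs. Assuming the database name and release are ASCII padded with spaces or NUL bytes (the field widths and padding are not given on this page), a field could be decoded as follows; the helper `FixedWidthText` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the padding characters and ASCII encoding are assumptions.
final class FixedWidthText {
    // Decode a fixed-width ASCII field, dropping trailing NUL or space padding.
    static String decode(byte[] b, int offset, int width) {
        int end = offset + width;
        while (end > offset && (b[end - 1] == 0 || b[end - 1] == ' ')) {
            end--;
        }
        return new String(b, offset, end - offset, StandardCharsets.US_ASCII);
    }
}
```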
-
readDBDate
readDBDate reads the date from the index header. The date is stored in 4 bytes: 0, unused; 1, year; 2, month; 3, day. With a 1-byte year it's not of much use, and I'm not sure that the EMBOSS programs set the value correctly anyway.
- Returns:
- a String.
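The byte layout described for readDBDate can be decoded as below. The output format `year/month/day` is an illustrative choice and not necessarily the String that readDBDate returns, and the year byte is printed raw because this page does not say how it should be interpreted. The class `DbDate` is hypothetical.

```java
import java.util.Locale;

// Hypothetical decoder for the 4-byte date described above; not BioJava's code.
final class DbDate {
    // Layout per the readDBDate description: byte 0 unused, 1 year, 2 month, 3 day.
    static String format(byte[] date) {
        if (date.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + date.length);
        }
        int year = date[1] & 0xFF;  // a single unsigned byte, so at most 255
        int month = date[2] & 0xFF;
        int day = date[3] & 0xFF;
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%d/%02d/%02d", year, month, day);
    }
}
```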
-
readRecord
readRecord returns an array of objects parsed from a single record. Its content depends on the type of index file. Concrete subclasses must provide an implementation of this method.
- Returns:
- an Object[] array.
- Throws:
- IOException - if an error occurs.
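The split between a base class that reads raw records and subclasses that interpret them can be sketched as follows. Both classes are hypothetical, and the record layout used by the subclass (an 8-byte space-padded ID followed by a 4-byte little-endian integer) is invented for illustration; it does not describe any real EMBOSS index file. DataInputStream is safe here because readFully copies bytes verbatim; byte order only matters once multi-byte values are interpreted.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical base class: reads fixed-length raw records, leaves parsing to subclasses.
abstract class RecordReaderSketch {
    private final DataInputStream in;
    private final int recordLength;

    RecordReaderSketch(InputStream input, int recordLength) {
        this.in = new DataInputStream(input);
        this.recordLength = recordLength;
    }

    // readFully copies bytes verbatim (byte order is irrelevant here) and
    // throws EOFException if the stream ends before a whole record is read.
    byte[] readRawRecord() throws IOException {
        byte[] record = new byte[recordLength];
        in.readFully(record);
        return record;
    }

    abstract Object[] readRecord() throws IOException;
}

// Hypothetical layout: 8-byte space-padded ID, then a 4-byte little-endian int.
final class IdOffsetReaderSketch extends RecordReaderSketch {
    IdOffsetReaderSketch(InputStream input) {
        super(input, 12);
    }

    @Override
    Object[] readRecord() throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(readRawRecord()).order(ByteOrder.LITTLE_ENDIAN);
        byte[] id = new byte[8];
        buf.get(id);
        String name = new String(id, StandardCharsets.US_ASCII).trim();
        int value = buf.getInt();
        return new Object[] {name, value};
    }
}
```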
-
readRawRecord
readRawRecord returns the raw bytes of a single record from the index.
- Returns:
- a byte[] array.
- Throws:
- IOException - if an error occurs.
-
close
close closes the underlying InputStream.
- Throws:
- IOException - if an error occurs.
-