Package picard.sam.markduplicates.util
Class DiskBasedReadEndsForMarkDuplicatesMap
java.lang.Object
picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap
All Implemented Interfaces:
ReadEndsForMarkDuplicatesMap
public class DiskBasedReadEndsForMarkDuplicatesMap
extends Object
implements ReadEndsForMarkDuplicatesMap
Disk-based implementation of ReadEndsForMarkDuplicatesMap. A subdirectory of the system tmpdir is created to store
files, one for each reference sequence. The reference sequence that is currently being queried (i.e. the
sequence for which remove() has been most recently called) is stored in RAM. ReadEnds for all other sequences
are stored on disk.
When put() is called for a sequence that is the current one in RAM, the ReadEnds object is merely put into the
in-memory map. If put() is called for a sequence ID that is not the current RAM one, the ReadEnds object is
appended to the file for that sequence, creating the file if necessary.
When remove() is called for a sequence that is the current one in RAM, remove() is called on the in-memory map.
If remove() is called for a sequence other than the current RAM sequence, then the current RAM sequence is written
to disk, the new sequence is read from disk into the in-memory map, and the file for the new sequence is deleted.
If things work properly, and reads are processed in genomic order, records will be written for mates that are in
a later sequence. When the mate is reached in the input SAM file, the file that was written will be deleted.
This should result in all temporary files being deleted by the time all the reads are processed. The temp
directory is marked to be deleted on exit so everything should get cleaned up.
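The put()/remove() behaviour described above can be sketched with a toy model. This is a minimal sketch under stated assumptions, not Picard's implementation: ToySpillMap, its ".tmp" file naming, and the hasFileFor() helper are hypothetical, values are plain Strings rather than ReadEnds objects, and the maxOpenFiles limit is not modelled (each append opens and closes its file).

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical): one temp file per sequence, one sequence held in RAM. */
class ToySpillMap {
    private final Path tmpDir;
    private final Map<Integer, Integer> recordsOnDisk = new HashMap<>();
    private final Map<String, String> ramMap = new HashMap<>();
    private int ramSequence = -1; // -1 means no sequence has been loaded yet

    ToySpillMap() {
        try {
            tmpDir = Files.createTempDirectory("toy-read-ends");
            tmpDir.toFile().deleteOnExit();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** In-memory put for the RAM sequence; otherwise append to that sequence's file. */
    void put(int sequence, String key, String value) {
        if (sequence == ramSequence) {
            ramMap.put(key, value);
        } else {
            append(sequence, key, value);
        }
    }

    /** Makes the requested sequence the RAM sequence if needed, then removes the key. */
    String remove(int sequence, String key) {
        if (sequence != ramSequence) {
            swapTo(sequence);
        }
        return ramMap.remove(key);
    }

    int sizeInRam() {
        return ramMap.size();
    }

    int size() {
        int onDisk = 0;
        for (int count : recordsOnDisk.values()) {
            onDisk += count;
        }
        return onDisk + ramMap.size();
    }

    boolean hasFileFor(int sequence) {
        return Files.exists(fileFor(sequence));
    }

    private Path fileFor(int sequence) {
        return tmpDir.resolve(sequence + ".tmp");
    }

    private void append(int sequence, String key, String value) {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(
                fileFor(sequence), StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
            out.writeUTF(key);
            out.writeUTF(value);
            recordsOnDisk.merge(sequence, 1, Integer::sum);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void swapTo(int sequence) {
        // Write the current RAM sequence to disk before replacing it.
        for (Map.Entry<String, String> entry : ramMap.entrySet()) {
            append(ramSequence, entry.getKey(), entry.getValue());
        }
        ramMap.clear();
        ramSequence = sequence;
        Integer count = recordsOnDisk.remove(sequence);
        if (count == null) {
            return; // nothing was ever written for this sequence
        }
        Path file = fileFor(sequence);
        try {
            try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
                for (int i = 0; i < count; i++) {
                    String key = in.readUTF();
                    ramMap.put(key, in.readUTF());
                }
            }
            Files.delete(file); // the file is deleted once its records are in RAM
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this model, a record put for a later sequence lives in that sequence's file until remove() is first called for it, at which point the file is read into RAM and deleted.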
-
Constructor Summary
Constructors
Constructor    Description
DiskBasedReadEndsForMarkDuplicatesMap(int maxOpenFiles, ReadEndsForMarkDuplicatesCodec readEndsForMarkDuplicatesCodec)
-
Method Summary
Modifier and Type    Method    Description
void    put(int mateSequenceIndex, String key, ReadEndsForMarkDuplicates readEnds)    Store the element in the map with the given key.
ReadEndsForMarkDuplicates    remove(int mateSequenceIndex, String key)    Remove element with given key from the map.
int    size()
int    sizeInRam()
-
Constructor Details
-
DiskBasedReadEndsForMarkDuplicatesMap
public DiskBasedReadEndsForMarkDuplicatesMap(int maxOpenFiles, ReadEndsForMarkDuplicatesCodec readEndsForMarkDuplicatesCodec)
-
-
Method Details
-
remove
public ReadEndsForMarkDuplicates remove(int mateSequenceIndex, String key)
Description copied from interface: ReadEndsForMarkDuplicatesMap
Remove element with given key from the map. Because an implementation may be disk-based, the object returned may not be the same object that was put into the map.
Specified by:
remove in interface ReadEndsForMarkDuplicatesMap
Parameters:
mateSequenceIndex - must agree with the value used when the object was put into the map
key - typically, concatenation of read group ID and read name
Returns:
null if the key is not found, otherwise the object removed.
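The caveat that the returned object may not be the same object that was put follows from the round trip through disk. A minimal sketch, assuming a hypothetical ToyEnds record and ToyEndsCodec (stand-ins for illustration only, not the real ReadEndsForMarkDuplicates or ReadEndsForMarkDuplicatesCodec): encoding and decoding yields an equal but distinct instance, so callers should compare by value and not by reference.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a read-ends record; the two fields are illustrative only. */
final class ToyEnds {
    final int sequenceIndex;
    final int coordinate;

    ToyEnds(int sequenceIndex, int coordinate) {
        this.sequenceIndex = sequenceIndex;
        this.coordinate = coordinate;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ToyEnds)) {
            return false;
        }
        ToyEnds that = (ToyEnds) other;
        return sequenceIndex == that.sequenceIndex && coordinate == that.coordinate;
    }

    @Override
    public int hashCode() {
        return 31 * sequenceIndex + coordinate;
    }
}

/** Hypothetical codec: writes the two fields as ints and reads them back. */
final class ToyEndsCodec {
    static byte[] encode(ToyEnds ends) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(ends.sequenceIndex);
            out.writeInt(ends.coordinate);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static ToyEnds decode(byte[] encoded) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            // Fields are read in the same order they were written.
            return new ToyEnds(in.readInt(), in.readInt());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, ToyEndsCodec.decode(ToyEndsCodec.encode(original)) is equal to original under equals() but is a different object under ==.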
-
put
public void put(int mateSequenceIndex, String key, ReadEndsForMarkDuplicates readEnds)
Description copied from interface: ReadEndsForMarkDuplicatesMap
Store the element in the map with the given key. It is assumed that the element is not already present in the map.
Specified by:
put in interface ReadEndsForMarkDuplicatesMap
Parameters:
mateSequenceIndex - used to optimize storage and retrieval. The same value must be used when trying to remove this element. It is not valid to store the same key with two different mateSequenceIndexes.
key - typically, concatenation of read group ID and read name
readEnds - the object to be stored
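The contract between put() and remove() can be illustrated with a small in-memory sketch. ToyIndexedMap and makeKey() are hypothetical, the ":" separator in the key is an assumption, and values are plain Strings; the point is only that an element is found again when remove() is given the same mateSequenceIndex that was used in put().

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model: one inner map per mate sequence index. */
class ToyIndexedMap {
    private final Map<Integer, Map<String, String>> bySequence = new HashMap<>();

    /** Builds a key from read group ID and read name; the ":" separator is an assumption. */
    static String makeKey(String readGroupId, String readName) {
        return readGroupId + ":" + readName;
    }

    void put(int mateSequenceIndex, String key, String value) {
        bySequence.computeIfAbsent(mateSequenceIndex, index -> new HashMap<>()).put(key, value);
    }

    /** Returns null when the key is absent from the map for this sequence index. */
    String remove(int mateSequenceIndex, String key) {
        Map<String, String> forSequence = bySequence.get(mateSequenceIndex);
        return forSequence == null ? null : forSequence.remove(key);
    }
}
```

In this model, an element stored with index 5 is not found by remove(4, key), which is why the same value must be supplied to both calls.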
-
size
public int size()
Specified by:
size in interface ReadEndsForMarkDuplicatesMap
Returns:
number of elements stored in map
-
sizeInRam
public int sizeInRam()
Specified by:
sizeInRam in interface ReadEndsForMarkDuplicatesMap
Returns:
number of elements stored in RAM. Always <= size()
-