Package picard.illumina
Class SortedBasecallsConverter<CLUSTER_OUTPUT_RECORD>
java.lang.Object
picard.illumina.BasecallsConverter<CLUSTER_OUTPUT_RECORD>
picard.illumina.SortedBasecallsConverter<CLUSTER_OUTPUT_RECORD>
public class SortedBasecallsConverter<CLUSTER_OUTPUT_RECORD>
extends BasecallsConverter<CLUSTER_OUTPUT_RECORD>
SortedBasecallsConverter utilizes an underlying IlluminaDataProvider to convert parsed and decoded sequencing data
from standard Illumina formats to specific output records (FASTQ records/SAM records). This data is processed
on a tile-by-tile basis and sorted based on an output record comparator.
The underlying IlluminaDataProvider applies several optional transformations, which can include EAMSS filtering, non-PF read filtering, and quality score recoding using a BclQualityEvaluationStrategy.
The converter can also limit the scope of data that is converted from the data provider by setting the tile to start on (firstTile) and the total number of tiles to process (tileLimit).
Additionally, BasecallsConverter can optionally demultiplex reads by writing barcode-specific reads to their associated writers.
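The effect of firstTile and tileLimit described above can be illustrated with a minimal sketch. It is not Picard's implementation: the class name TileWindowSketch, the method selectTiles, and the choice to throw when firstTile is absent are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of tile windowing; names and error handling are assumptions. */
final class TileWindowSketch {

    /**
     * Returns the tiles to process: start at firstTile when it is non-null,
     * and keep at most tileLimit tiles when tileLimit is non-null.
     */
    static List<Integer> selectTiles(List<Integer> sortedTiles, Integer firstTile, Integer tileLimit) {
        int start = 0;
        if (firstTile != null) {
            start = sortedTiles.indexOf(firstTile);
            if (start < 0) {
                // Assumption: an unknown starting tile is treated as an error.
                throw new IllegalArgumentException("firstTile not found: " + firstTile);
            }
        }
        int end = sortedTiles.size();
        if (tileLimit != null) {
            end = Math.min(end, start + tileLimit);
        }
        return new ArrayList<>(sortedTiles.subList(start, end));
    }

    public static void main(String[] args) {
        List<Integer> tiles = List.of(1101, 1102, 1103, 1104);
        System.out.println(selectTiles(tiles, 1102, 2));      // prints [1102, 1103]
        System.out.println(selectTiles(tiles, null, null));   // prints [1101, 1102, 1103, 1104]
    }
}
```

Passing null for both arguments leaves the tile list unchanged, which matches the "if non-null" wording of the two constructor parameters.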
-
Nested Class Summary
Nested classes/interfaces inherited from class picard.illumina.BasecallsConverter
BasecallsConverter.ClusterDataConverter<OUTPUT_RECORD>, BasecallsConverter.ConvertedClusterDataWriter<OUTPUT_RECORD>
-
Field Summary
Fields inherited from class picard.illumina.BasecallsConverter
barcodeExtractor, barcodeRecordWriterMap, converter, DATA_TYPES_WITH_BARCODE, DATA_TYPES_WITHOUT_BARCODE, demultiplex, ignoreUnexpectedBarcodes, includeNonPfReads, laneFactories, TILE_NUMBER_COMPARATOR, tiles, writerPool
-
Constructor Summary
Modifier: protected
Constructor: SortedBasecallsConverter(File basecallsDir, File barcodesDir, int[] lanes, ReadStructure readStructure, Map<String, ? extends htsjdk.io.Writer<CLUSTER_OUTPUT_RECORD>> barcodeRecordWriterMap, boolean demultiplex, int maxReadsInRamPerTile, List<File> tmpDirs, int numThreads, Integer firstTile, Integer tileLimit, Comparator<CLUSTER_OUTPUT_RECORD> outputRecordComparator, htsjdk.samtools.util.SortingCollection.Codec<CLUSTER_OUTPUT_RECORD> codecPrototype, Class<CLUSTER_OUTPUT_RECORD> outputRecordClass, BclQualityEvaluationStrategy bclQualityEvaluationStrategy, boolean ignoreUnexpectedBarcodes, boolean applyEamssFiltering, boolean includeNonPfReads, htsjdk.io.AsyncWriterPool writerPool, BarcodeExtractor barcodeExtractor)
Description: Constructs a new SortedBasecallsConverter.
-
Method Summary
Modifier and Type, Method, Description:
protected void awaitTileProcessingCompletion
void processTilesAndWritePerSampleOutputs(Set<String> barcodes)
Set up tile processing and record writing threads for this converter.
Methods inherited from class picard.illumina.BasecallsConverter
closeWriters, getDataTypesFromReadStructure, getLaneFactories, getTiledFiles, interruptAndShutdownExecutors, maybeDemultiplex, setConverter, setTileLimits, updateMetrics
-
Field Details
-
log
protected static final htsjdk.samtools.util.Log log
-
-
Constructor Details
-
SortedBasecallsConverter
protected SortedBasecallsConverter(File basecallsDir, File barcodesDir, int[] lanes, ReadStructure readStructure, Map<String, ? extends htsjdk.io.Writer<CLUSTER_OUTPUT_RECORD>> barcodeRecordWriterMap, boolean demultiplex, int maxReadsInRamPerTile, List<File> tmpDirs, int numThreads, Integer firstTile, Integer tileLimit, Comparator<CLUSTER_OUTPUT_RECORD> outputRecordComparator, htsjdk.samtools.util.SortingCollection.Codec<CLUSTER_OUTPUT_RECORD> codecPrototype, Class<CLUSTER_OUTPUT_RECORD> outputRecordClass, BclQualityEvaluationStrategy bclQualityEvaluationStrategy, boolean ignoreUnexpectedBarcodes, boolean applyEamssFiltering, boolean includeNonPfReads, htsjdk.io.AsyncWriterPool writerPool, BarcodeExtractor barcodeExtractor)
Constructs a new SortedBasecallsConverter.
Parameters:
basecallsDir - Where to read basecalls from.
barcodesDir - Where to read barcodes from (optional; use basecallsDir if not specified).
lanes - What lanes to process.
readStructure - How to interpret each cluster.
barcodeRecordWriterMap - Map from barcode to CLUSTER_OUTPUT_RECORD writer. If demultiplex is false, must contain one writer stored with key=null.
demultiplex - If true, output is split by barcode, otherwise all are written to the same output stream.
maxReadsInRamPerTile - Configures number of reads each tile will store in RAM before spilling to disk.
tmpDirs - For SortingCollection spilling.
numThreads - Controls number of threads.
firstTile - (For debugging) If non-null, start processing at this tile.
tileLimit - (For debugging) If non-null, process no more than this many tiles.
outputRecordComparator - For sorting output records within a single tile.
codecPrototype - For spilling output records to disk.
outputRecordClass - Class needed to create SortingCollections.
bclQualityEvaluationStrategy - The basecall quality evaluation strategy that is applied to decoded base calls.
ignoreUnexpectedBarcodes - If true, will ignore reads whose called barcode is not found in barcodeRecordWriterMap.
applyEamssFiltering - If true, apply EAMSS filtering if parsing BCLs for bases and quality scores.
includeNonPfReads - If true, will include ALL reads (including those which do not have PF set). This option does nothing for instruments that output CBCLs (NovaSeq).
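The interplay of barcodeRecordWriterMap, demultiplex and ignoreUnexpectedBarcodes can be sketched as follows. This is an illustration only: WriterMapSketch and writerFor are hypothetical names, a List<String> stands in for an htsjdk.io.Writer, and throwing on an unexpected barcode when ignoreUnexpectedBarcodes is false is an assumption rather than documented behavior. Note that the single-writer case needs a map type that accepts a null key, such as HashMap; Map.of does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; a List<String> stands in for a record writer. */
final class WriterMapSketch {

    /**
     * Picks the writer for one record. Without demultiplexing every record
     * goes to the writer stored under the null key.
     * Returns null when the record should be skipped.
     */
    static List<String> writerFor(Map<String, List<String>> writers, boolean demultiplex,
                                  String calledBarcode, boolean ignoreUnexpectedBarcodes) {
        String key = demultiplex ? calledBarcode : null;
        List<String> writer = writers.get(key);
        if (writer == null && !ignoreUnexpectedBarcodes) {
            // Assumption: an unexpected barcode is an error unless explicitly ignored.
            throw new IllegalStateException("No writer for barcode " + key);
        }
        return writer;
    }

    public static void main(String[] args) {
        // demultiplex == false: exactly one writer, stored with key == null.
        Map<String, List<String>> single = new HashMap<>();
        single.put(null, new ArrayList<>());
        writerFor(single, false, "ACGT", false).add("read1");
        System.out.println(single.get(null)); // prints [read1]

        // demultiplex == true: one writer per expected barcode.
        Map<String, List<String>> perBarcode = new HashMap<>();
        perBarcode.put("ACGT", new ArrayList<>());
        perBarcode.put("TTGA", new ArrayList<>());
        writerFor(perBarcode, true, "ACGT", false).add("read2");
        System.out.println(perBarcode.get("ACGT"));                    // prints [read2]
        System.out.println(writerFor(perBarcode, true, "GGGG", true)); // prints null
    }
}
```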
-
-
Method Details
-
processTilesAndWritePerSampleOutputs
Set up tile processing and record writing threads for this converter. This creates a tile processing thread pool of size `numThreads`. The tile processing threads notify the completed work checking thread when they are done processing a tile. The completed work checking thread will then dispatch the record writing for tiles in order.
Specified by:
processTilesAndWritePerSampleOutputs in class BasecallsConverter<CLUSTER_OUTPUT_RECORD>
Parameters:
barcodes - The barcodes used for demultiplexing. When there is no demultiplexing done this should be a Set containing a single null value.
Throws:
IOException
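The pattern of sorting tiles in parallel while emitting them in tile order can be sketched as below. This is a simplified stand-in, not the converter's actual code: instead of a separate completed work checking thread, the sketch simply waits on each tile's Future in tile order. The names OrderedTileProcessingSketch and processTiles are hypothetical, and String records stand in for CLUSTER_OUTPUT_RECORD.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: parallel per-tile sorting, output emitted in tile order. */
final class OrderedTileProcessingSketch {

    static List<String> processTiles(Map<Integer, List<String>> recordsByTile,
                                     Comparator<String> outputRecordComparator,
                                     int numThreads) {
        ExecutorService tilePool = Executors.newFixedThreadPool(numThreads);
        try {
            // Submit one sorting task per tile, in ascending tile order.
            List<Future<List<String>>> pending = new ArrayList<>();
            for (Integer tile : new TreeSet<>(recordsByTile.keySet())) {
                List<String> records = recordsByTile.get(tile);
                pending.add(tilePool.submit(() -> {
                    List<String> sorted = new ArrayList<>(records);
                    sorted.sort(outputRecordComparator);
                    return sorted;
                }));
            }
            // Collect results in submission order, so a tile is never written
            // before the tiles that precede it, whichever task finishes first.
            List<String> output = new ArrayList<>();
            for (Future<List<String>> tileResult : pending) {
                output.addAll(tileResult.get());
            }
            return output;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tiles", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Tile processing failed", e.getCause());
        } finally {
            tilePool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> byTile = new HashMap<>();
        byTile.put(1102, List.of("b", "a"));
        byTile.put(1101, List.of("d", "c"));
        System.out.println(processTiles(byTile, Comparator.naturalOrder(), 2)); // prints [c, d, a, b]
    }
}
```

Records are sorted only within a tile, so the combined output is ordered by tile first and by the comparator second, which is why "c, d" (tile 1101) precedes "a, b" (tile 1102) in the example.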
-
awaitTileProcessingCompletion
Throws:
IOException
-