Class ExtractBarcodesProgram

java.lang.Object
picard.cmdline.CommandLineProgram
picard.illumina.ExtractBarcodesProgram
Direct Known Subclasses:
ExtractIlluminaBarcodes, IlluminaBasecallsToFastq, IlluminaBasecallsToSam

public abstract class ExtractBarcodesProgram extends CommandLineProgram
  • Field Details

    • DISTANCE_MODE

      @Argument(doc="The distance metric that should be used to compare the barcode-reads and the provided barcodes for finding the best and second-best assignments.") public DistanceMetric DISTANCE_MODE
    • MAX_MISMATCHES

      @Argument(doc="Maximum mismatches for a barcode to be considered a match.") public int MAX_MISMATCHES
    • MIN_MISMATCH_DELTA

      @Argument(doc="Minimum difference between number of mismatches in the best and second best barcodes for a barcode to be considered a match.") public int MIN_MISMATCH_DELTA
    • MAX_NO_CALLS

      @Argument(doc="Maximum allowable number of no-calls in a barcode read before it is considered unmatchable.") public int MAX_NO_CALLS
    • MINIMUM_BASE_QUALITY

      @Argument(shortName="Q", doc="Minimum base quality. Any barcode bases falling below this quality will be considered a mismatch even if the bases match.") public int MINIMUM_BASE_QUALITY
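As a rough illustration of how the four thresholds above could interact, the following sketch picks the best barcode for a single barcode read. It is not Picard's implementation: the class and method names are hypothetical, mismatches are counted position by position (only one possible DISTANCE_MODE), and skipping no-call positions when counting mismatches is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not part of Picard.
final class BarcodeMatchSketch {

    // Assumption: 'N' and '.' both denote a no-call.
    static boolean isNoCall(char base) {
        return base == 'N' || base == '.';
    }

    static int countNoCalls(String read) {
        int noCalls = 0;
        for (int i = 0; i < read.length(); i++) {
            if (isNoCall(read.charAt(i))) noCalls++;
        }
        return noCalls;
    }

    // Counts mismatches position by position. A base whose quality is below
    // minBaseQuality counts as a mismatch even if it equals the barcode base.
    // Assumes read and quals are at least as long as the barcode.
    static int countMismatches(String read, int[] quals, String barcode, int minBaseQuality) {
        int mismatches = 0;
        for (int i = 0; i < barcode.length(); i++) {
            char base = read.charAt(i);
            if (isNoCall(base)) continue; // assumption: no-calls are tallied separately
            if (base != barcode.charAt(i) || quals[i] < minBaseQuality) mismatches++;
        }
        return mismatches;
    }

    // Returns the matched barcode, or null if the read is considered unmatched.
    static String bestMatch(String read, int[] quals, List<String> barcodes,
                            int maxMismatches, int minMismatchDelta,
                            int maxNoCalls, int minBaseQuality) {
        if (countNoCalls(read) > maxNoCalls) return null;

        String best = null;
        int bestCount = Integer.MAX_VALUE;
        int secondCount = Integer.MAX_VALUE;
        for (String barcode : barcodes) {
            int count = countMismatches(read, quals, barcode, minBaseQuality);
            if (count < bestCount) {
                secondCount = bestCount;
                bestCount = count;
                best = barcode;
            } else if (count < secondCount) {
                secondCount = count;
            }
        }
        if (best == null) return null;

        boolean closeEnough = bestCount <= maxMismatches;
        // With a single candidate barcode there is no second-best to compare against.
        boolean unambiguous = secondCount == Integer.MAX_VALUE
                || secondCount - bestCount >= minMismatchDelta;
        return closeEnough && unambiguous ? best : null;
    }

    public static void main(String[] args) {
        List<String> barcodes = Arrays.asList("ACGT", "TGCA");
        int[] quals = {30, 30, 30, 30};
        System.out.println(bestMatch("ACGA", quals, barcodes, 1, 1, 2, 0));
    }
}
```

In this sketch the read `ACGA` has one mismatch against `ACGT` and three against `TGCA`, so with a limit of one mismatch and a required delta of one it is assigned to `ACGT`. Lowering the mismatch limit to zero leaves the read unmatched.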
    • MINIMUM_QUALITY

      @Argument(doc="The minimum quality (after transforming 0s to 1s) expected from reads. If qualities are lower than this value, an error is thrown. The default of 2 is what Illumina's spec describes as the minimum, but in practice lower values have been observed.") public int MINIMUM_QUALITY
    • LANE

      @Argument(doc="Lane number. This can be specified multiple times. Reads with the same index in multiple lanes will be added to the same output file.", shortName="L") public List<Integer> LANE
    • READ_STRUCTURE

      @Argument(doc="A description of the logical structure of clusters in an Illumina Run, i.e. a description of the structure IlluminaBasecallsToSam assumes the data to be in. It should consist of integer/character pairs describing the number of cycles and the type of those cycles (B for Sample Barcode, M for molecular barcode, T for Template, and S for skip). E.g. If the input data consists of 80 base clusters and we provide a read structure of \"28T8M8B8S28T\" then the sequence may be split up into four reads:\n* read one with 28 cycles (bases) of template\n* read two with 8 cycles (bases) of molecular barcode (ex. unique molecular barcode)\n* read three with 8 cycles (bases) of sample barcode\n* 8 cycles (bases) skipped.\n* read four with 28 cycles (bases) of template\nThe skipped cycles would NOT be included in an output SAM/BAM file or in read groups therein.", shortName="RS") public String READ_STRUCTURE
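The integer/character pairs described above can be illustrated with a small parser. This is a sketch only: `ReadStructureSketch` and `Segment` are hypothetical names, not Picard's `ReadStructure` class, and the parser accepts only the four cycle types listed in the description (B, M, T, S).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not Picard's ReadStructure implementation.
final class ReadStructureSketch {

    static final class Segment {
        final int cycles;  // number of cycles (bases) in this segment
        final char type;   // B, M, T, or S

        Segment(int cycles, char type) {
            this.cycles = cycles;
            this.type = type;
        }
    }

    private static final Pattern PAIR = Pattern.compile("(\\d+)([BMTS])");

    // Splits a string such as "28T8M8B8S28T" into (cycles, type) segments.
    static List<Segment> parse(String readStructure) {
        List<Segment> segments = new ArrayList<>();
        Matcher matcher = PAIR.matcher(readStructure);
        int end = 0;
        while (matcher.find()) {
            if (matcher.start() != end) {
                throw new IllegalArgumentException("Unexpected text at position " + end);
            }
            segments.add(new Segment(Integer.parseInt(matcher.group(1)),
                                     matcher.group(2).charAt(0)));
            end = matcher.end();
        }
        if (segments.isEmpty() || end != readStructure.length()) {
            throw new IllegalArgumentException("Invalid read structure: " + readStructure);
        }
        return segments;
    }

    static int totalCycles(List<Segment> segments) {
        int total = 0;
        for (Segment segment : segments) total += segment.cycles;
        return total;
    }

    public static void main(String[] args) {
        List<Segment> segments = parse("28T8M8B8S28T");
        for (Segment segment : segments) {
            System.out.println(segment.cycles + " cycles of type " + segment.type);
        }
        System.out.println("total cycles: " + totalCycles(segments));
    }
}
```

For the example structure `28T8M8B8S28T`, the sketch yields five segments whose cycle counts sum to 80, which matches the 80-base clusters in the description.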
    • COMPRESS_OUTPUTS

      @Argument(shortName="GZIP", doc="Compress output FASTQ files using gzip and append a .gz extension to the file names.") public boolean COMPRESS_OUTPUTS
    • BASECALLS_DIR

      @Argument(doc="The Illumina basecalls directory. ", shortName="B") public File BASECALLS_DIR
    • METRICS_FILE

      @Argument(doc="Per-barcode and per-lane metrics written to this file.", shortName="M", optional=true) public File METRICS_FILE
    • INPUT_PARAMS_FILE

      @Argument(doc="The input file that defines parameters for the program. This is the BARCODE_FILE for `ExtractIlluminaBarcodes` or the MULTIPLEX_PARAMS or LIBRARY_PARAMS file for `IlluminaBasecallsToFastq` or `IlluminaBasecallsToSam`", optional=true) public File INPUT_PARAMS_FILE
    • BARCODE_COLUMN

      public static final String BARCODE_COLUMN
      Column header for the first barcode sequence (preferred).
    • BARCODE_SEQUENCE_COLUMN

      public static final String BARCODE_SEQUENCE_COLUMN
    • BARCODE_NAME_COLUMN

      public static final String BARCODE_NAME_COLUMN
      Column header for the barcode name.
    • LIBRARY_NAME_COLUMN

      public static final String LIBRARY_NAME_COLUMN
      Column header for the library name.
    • BARCODE_PREFIXES

      public static final Set<String> BARCODE_PREFIXES
    • barcodeToMetrics

      protected Map<String,BarcodeMetric> barcodeToMetrics
    • bclQualityEvaluationStrategy

      protected final BclQualityEvaluationStrategy bclQualityEvaluationStrategy
    • noMatchMetric

      protected BarcodeMetric noMatchMetric
    • inputReadStructure

      protected ReadStructure inputReadStructure
      The read structure of the actual Illumina run, i.e. the read structure of the input data.
  • Constructor Details

    • ExtractBarcodesProgram

      public ExtractBarcodesProgram()
  • Method Details

    • createBarcodeExtractor

      protected BarcodeExtractor createBarcodeExtractor()
    • customCommandLineValidation

      protected String[] customCommandLineValidation()
      Parses all barcodes from the input files and validates that all barcodes are the same length and unique.
      Overrides:
      customCommandLineValidation in class CommandLineProgram
      Returns:
      null if the command line is valid. If the command line is invalid, returns an array of error messages to be written to the appropriate place.
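The return contract (null when valid, otherwise an array of messages) combined with the same-length and uniqueness checks might look like the sketch below. The class name, method name, and message wording are hypothetical and are not taken from Picard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not Picard's validation code.
final class BarcodeValidationSketch {

    // Returns null if all barcodes share one length and are unique,
    // otherwise an array with one message per problem found.
    static String[] validate(List<String> barcodes) {
        List<String> errors = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        int expectedLength = barcodes.isEmpty() ? 0 : barcodes.get(0).length();

        for (String barcode : barcodes) {
            if (barcode.length() != expectedLength) {
                errors.add("Barcode " + barcode + " has length " + barcode.length()
                        + ", expected " + expectedLength);
            }
            if (!seen.add(barcode)) {
                errors.add("Barcode " + barcode + " is specified more than once");
            }
        }
        return errors.isEmpty() ? null : errors.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] errors = validate(Arrays.asList("ACGT", "ACGT", "ACG"));
        if (errors != null) {
            for (String error : errors) System.out.println(error);
        }
    }
}
```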
    • collectErrorMessages

      protected String[] collectErrorMessages(List<String> messages, String[] superErrors)
    • outputMetrics

      protected void outputMetrics()
    • finalizeMetrics

      public static void finalizeMetrics(Map<String,BarcodeMetric> barcodeToMetrics, BarcodeMetric noMatchMetric)
    • parseInputFile

      protected static htsjdk.samtools.util.Tuple<Map<String,BarcodeMetric>,List<String>> parseInputFile(File inputFile, ReadStructure readStructure)
      Parses any one of the following types of files: the ExtractIlluminaBarcodes BARCODE_FILE, the IlluminaBasecallsToFastq MULTIPLEX_PARAMS file, or the IlluminaBasecallsToSam LIBRARY_PARAMS file. This validates the file format and populates a Map of barcodes to metrics.
      Parameters:
      inputFile - The input file that is being parsed
      readStructure - The read structure for the reads of the run