Class CollectAlignmentSummaryMetrics


@DocumentedFeature public class CollectAlignmentSummaryMetrics extends SinglePassSamProgram
A command-line tool that reads a BAM file and produces standard alignment metrics applicable to any alignment. Metrics include, but are not limited to:
  • Total number of reads (total, period, no exclusions)
  • Total number of PF reads (PF == does not fail vendor check flag)
  • Number of PF noise reads (does not fail vendor check and has noise attr set)
  • Total aligned PF reads (any PF read that has a sequence and position)
  • High quality aligned PF reads (high quality == mapping quality >= 20)
  • High quality aligned PF bases (actual aligned bases, calculate off alignment blocks)
  • High quality aligned PF Q20 bases (subset of above where base quality >= 20)
  • Median mismatches in HQ aligned PF reads (how many aligned bases != ref on average)
  • Reads aligned in pairs (vs. reads aligned with mate unaligned/not present)
  • Read length (how to handle mixed lengths?)
  • Bad Cycles - how many machine cycles yielded combined no-call and mismatch rates of >= 80%
  • Strand balance - reads mapped to positive strand / total mapped reads
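To make a few of the definitions above concrete, here is a minimal sketch of how such counters could be tallied. It is not the tool's implementation: the `Read` class and the `AlignmentTallySketch` name are hypothetical stand-ins for htsjdk's SAMRecord, and the sketch assumes strand balance is computed over PF aligned reads only.

```java
// Illustrative sketch only; names here are hypothetical, not Picard or htsjdk API.
final class AlignmentTallySketch {

    /** Hypothetical, simplified read; the real tool inspects htsjdk SAMRecord objects. */
    static final class Read {
        final boolean failsVendorCheck;
        final boolean mapped;
        final int mappingQuality;
        final boolean negativeStrand;

        Read(boolean failsVendorCheck, boolean mapped, int mappingQuality, boolean negativeStrand) {
            this.failsVendorCheck = failsVendorCheck;
            this.mapped = mapped;
            this.mappingQuality = mappingQuality;
            this.negativeStrand = negativeStrand;
        }
    }

    // "high quality == mapping quality >= 20"
    static final int HQ_MAPQ_THRESHOLD = 20;

    long totalReads;          // every read, no exclusions
    long pfReads;             // reads that do not fail the vendor check
    long pfAlignedReads;      // PF reads that are mapped
    long pfHqAlignedReads;    // PF aligned reads with mapping quality >= 20
    long positiveStrandReads; // PF aligned reads on the positive strand

    void add(Read r) {
        totalReads++;
        if (r.failsVendorCheck) {
            return; // not PF, so it only contributes to the total
        }
        pfReads++;
        if (!r.mapped) {
            return;
        }
        pfAlignedReads++;
        if (r.mappingQuality >= HQ_MAPQ_THRESHOLD) {
            pfHqAlignedReads++;
        }
        if (!r.negativeStrand) {
            positiveStrandReads++;
        }
    }

    /** Positive-strand reads divided by mapped reads (assumption: PF reads only). */
    double strandBalance() {
        return pfAlignedReads == 0 ? 0.0 : (double) positiveStrandReads / pfAlignedReads;
    }
}
```

Each counter is nested inside the previous filter, which mirrors how the list narrows from all reads to PF reads to aligned and high-quality reads.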
Metrics are written for the first read of a pair, the second read, and combined for the pair. Chimeras are identified if any of the following criteria are met:
  • the insert size is larger than MAX_INSERT_SIZE
  • the ends of a pair map to different contigs
  • the paired-end orientation is different from the expected orientation
  • the read contains an SA tag (chimeric alignment)
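The four chimera criteria can be read as a single logical OR. The sketch below illustrates that reading under stated assumptions: `ChimeraSketch`, its `Orientation` enum, and all parameter names are hypothetical stand-ins rather than the tool's code, and the insert size is compared by absolute value because a signed template length can be negative.

```java
import java.util.Set;

// Illustrative sketch only; not the Picard implementation.
final class ChimeraSketch {

    /** Hypothetical stand-in for htsjdk.samtools.SamPairUtil.PairOrientation. */
    enum Orientation { FR, RF, TANDEM }

    /**
     * Returns true if any one of the four documented criteria holds.
     * All parameters are simplified, hypothetical inputs.
     */
    static boolean isChimeric(String contig,
                              String mateContig,
                              int insertSize,
                              Orientation orientation,
                              boolean hasSaTag,
                              int maxInsertSize,
                              Set<Orientation> expectedOrientations) {
        boolean insertTooLarge = Math.abs(insertSize) > maxInsertSize;
        boolean differentContigs = !contig.equals(mateContig);
        boolean unexpectedOrientation = !expectedOrientations.contains(orientation);
        return insertTooLarge || differentContigs || unexpectedOrientation || hasSaTag;
    }
}
```

Because the criteria are independent, a pair on the same contig with a normal insert size is still flagged if its orientation is not in the expected set or if the read carries an SA tag.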
  • Field Details

    • HISTOGRAM_FILE

      @Argument(shortName="H", doc="If Provided, file to write read-length chart pdf.", optional=true) public File HISTOGRAM_FILE
    • MAX_INSERT_SIZE

      @Argument(doc="Paired-end reads above this insert size will be considered chimeric along with inter-chromosomal pairs.") public int MAX_INSERT_SIZE
    • EXPECTED_PAIR_ORIENTATIONS

      @Argument(doc="Paired-end reads that do not have this expected orientation will be considered chimeric.") public Set<htsjdk.samtools.SamPairUtil.PairOrientation> EXPECTED_PAIR_ORIENTATIONS
    • ADAPTER_SEQUENCE

      @Argument(doc="List of adapter sequences to use when processing the alignment metrics.") public List<String> ADAPTER_SEQUENCE
    • METRIC_ACCUMULATION_LEVEL

      @Argument(shortName="LEVEL", doc="The level(s) at which to accumulate metrics.") public Set<MetricAccumulationLevel> METRIC_ACCUMULATION_LEVEL
    • IS_BISULFITE_SEQUENCED

      @Argument(shortName="BS", doc="Whether the SAM or BAM file consists of bisulfite sequenced reads.") public boolean IS_BISULFITE_SEQUENCED
    • COLLECT_ALIGNMENT_INFORMATION

      @Argument(doc="A flag to disable the collection of actual alignment information. If false, tool will only count READS, PF_READS, and NOISE_READS. (For backwards compatibility).") public boolean COLLECT_ALIGNMENT_INFORMATION
  • Constructor Details

    • CollectAlignmentSummaryMetrics

      public CollectAlignmentSummaryMetrics()
  • Method Details

    • customCommandLineValidation

      protected String[] customCommandLineValidation()
      Description copied from class: CommandLineProgram
      Put any custom command-line validation in an override of this method. clp is initialized at this point and can be used to print usage and access argv. Any options set by command-line parser can be validated.
      Overrides:
      customCommandLineValidation in class CommandLineProgram
      Returns:
      null if command line is valid. If command line is invalid, returns an array of error messages to be written to the appropriate place.
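The return contract (null when valid, an array of messages otherwise) can be illustrated with a small sketch. The class name, the field, and the specific rule checked here are hypothetical and chosen only for illustration; the source does not state what this override actually validates.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; the validation rule below is an assumption.
final class ValidationSketch {

    // Hypothetical stand-in for a parsed command-line argument.
    int maxInsertSize;

    ValidationSketch(int maxInsertSize) {
        this.maxInsertSize = maxInsertSize;
    }

    /** Returns null if everything is valid, otherwise one message per problem. */
    String[] customCommandLineValidation() {
        List<String> errors = new ArrayList<>();
        if (maxInsertSize < 0) {
            // Hypothetical rule, used only to show the contract.
            errors.add("MAX_INSERT_SIZE must not be negative.");
        }
        return errors.isEmpty() ? null : errors.toArray(new String[0]);
    }
}
```

Collecting messages in a list and converting at the end lets several problems be reported at once while still returning null in the valid case.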
    • setup

      protected void setup(htsjdk.samtools.SAMFileHeader header, File samFile)
      Description copied from class: SinglePassSamProgram
      Should be implemented by subclasses to do one-time initialization work.
      Specified by:
      setup in class SinglePassSamProgram
    • acceptRead

      protected void acceptRead(htsjdk.samtools.SAMRecord rec, htsjdk.samtools.reference.ReferenceSequence ref)
      Description copied from class: SinglePassSamProgram
      Should be implemented by subclasses to accept SAMRecords one at a time. If the read has a reference sequence and a reference sequence file was supplied to the program it will be passed as 'ref'. Otherwise 'ref' may be null.
      Specified by:
      acceptRead in class SinglePassSamProgram
    • finish

      protected void finish()
      Description copied from class: SinglePassSamProgram
      Should be implemented by subclasses to do one-time finalization work.
      Specified by:
      finish in class SinglePassSamProgram
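The setup, acceptRead, and finish hooks described above form a template-method lifecycle: one-time initialization, one call per record, then one-time finalization. The sketch below shows that pattern in isolation. `SinglePassSketch` and `ReadCountingSketch` are hypothetical names, and the signatures are simplified (the real hooks also receive the file header, the input file, and an optional reference sequence).

```java
// Illustrative sketch only; simplified, hypothetical versions of the hooks.
abstract class SinglePassSketch<R> {

    /** One-time initialization before any record is seen. */
    protected abstract void setup();

    /** Called once per record, in input order. */
    protected abstract void acceptRead(R read);

    /** One-time finalization after the last record. */
    protected abstract void finish();

    /** Drives the lifecycle over a single pass of the input. */
    final void run(Iterable<R> reads) {
        setup();
        for (R read : reads) {
            acceptRead(read);
        }
        finish();
    }
}

/** Minimal subclass that counts records and builds a summary at the end. */
final class ReadCountingSketch extends SinglePassSketch<String> {
    long readsSeen;
    String summary;

    @Override
    protected void setup() {
        readsSeen = 0;
        summary = null;
    }

    @Override
    protected void acceptRead(String read) {
        readsSeen++;
    }

    @Override
    protected void finish() {
        summary = "reads=" + readsSeen;
    }
}
```

A subclass only fills in the three hooks; the base class owns the iteration, which is what lets the input be read exactly once.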
    • makeReferenceArgumentCollection

      protected ReferenceArgumentCollection makeReferenceArgumentCollection()
      Overrides:
      makeReferenceArgumentCollection in class CommandLineProgram