Package picard.sam

Class DuplicationMetrics

java.lang.Object
htsjdk.samtools.metrics.MetricBase
picard.analysis.MergeableMetricBase
picard.sam.DuplicationMetrics
Direct Known Subclasses:
FlowBasedDuplicationMetrics

@DocumentedFeature(groupName="Metrics", summary="Metrics") public class DuplicationMetrics extends MergeableMetricBase
Metrics that are calculated during the process of marking duplicates within a stream of SAMRecords.
  • Field Details

    • LIBRARY

      public String LIBRARY
      The library on which the duplicate marking was performed.
    • UNPAIRED_READS_EXAMINED

      public long UNPAIRED_READS_EXAMINED
      The number of mapped reads examined which did not have a mapped mate pair, either because the read is unpaired, or the read is paired to an unmapped mate.
    • READ_PAIRS_EXAMINED

      public long READ_PAIRS_EXAMINED
      The number of mapped read pairs examined. (Primary, non-supplemental)
    • SECONDARY_OR_SUPPLEMENTARY_RDS

      public long SECONDARY_OR_SUPPLEMENTARY_RDS
      The number of reads that were either secondary or supplementary.
    • UNMAPPED_READS

      public long UNMAPPED_READS
      The total number of unmapped reads examined. (Primary, non-supplemental)
    • UNPAIRED_READ_DUPLICATES

      public long UNPAIRED_READ_DUPLICATES
      The number of fragments that were marked as duplicates.
    • READ_PAIR_DUPLICATES

      public long READ_PAIR_DUPLICATES
      The number of read pairs that were marked as duplicates.
    • READ_PAIR_OPTICAL_DUPLICATES

      public long READ_PAIR_OPTICAL_DUPLICATES
      The number of read pair duplicates that were caused by optical duplication. Value is always < READ_PAIR_DUPLICATES, which counts all duplicates regardless of source.
    • PERCENT_DUPLICATION

      public Double PERCENT_DUPLICATION
      The fraction of mapped sequence that is marked as duplicate.
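As a rough illustration of how such a fraction can be derived from the count fields above, the sketch below counts each read pair as two reads. The class and method names are hypothetical, and the way the counts are combined is an assumption made for illustration rather than a quotation of Picard's source.

```java
/** Hypothetical helper, not part of Picard: duplicate fraction from the count fields. */
final class PercentDuplicationSketch {
    static double percentDuplication(long unpairedReadsExamined, long readPairsExamined,
                                     long unpairedReadDuplicates, long readPairDuplicates) {
        // Assumption: a read pair contributes two reads to both totals.
        long examined = unpairedReadsExamined + 2 * readPairsExamined;
        long duplicates = unpairedReadDuplicates + 2 * readPairDuplicates;
        // Guard against 0/0 when no mapped reads were examined.
        return examined == 0 ? 0.0 : (double) duplicates / examined;
    }
}
```

For example, 100 unpaired reads (10 duplicates) and 450 read pairs (45 duplicate pairs) give 100 duplicate reads out of 1000 examined under this assumption.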
    • ESTIMATED_LIBRARY_SIZE

      public Long ESTIMATED_LIBRARY_SIZE
      The estimated number of unique molecules in the library based on PE duplication.
  • Constructor Details

    • DuplicationMetrics

      public DuplicationMetrics()
  • Method Details

    • calculateDerivedFields

      public void calculateDerivedFields()
      Fills in the ESTIMATED_LIBRARY_SIZE based on the paired read data examined where possible and the PERCENT_DUPLICATION.
      Overrides:
      calculateDerivedFields in class MergeableMetricBase
    • calculateDerivedMetrics

      @Deprecated public void calculateDerivedMetrics()
      Deprecated.
      Fills in the ESTIMATED_LIBRARY_SIZE based on the paired read data examined where possible and the PERCENT_DUPLICATION.

      Deprecated, use calculateDerivedFields() instead.

    • estimateLibrarySize

      public static Long estimateLibrarySize(long readPairs, long uniqueReadPairs)
      Estimates the size of a library based on the number of paired end molecules observed and the number of unique pairs observed.

      Based on the Lander-Waterman equation, which states:
        C/X = 1 - exp( -N/X )
      where
        X = number of distinct molecules in the library
        N = number of read pairs
        C = number of distinct fragments observed in read pairs
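The equation has no closed-form solution for X, so it has to be solved numerically. The following is a minimal sketch that finds the root of f(X) = C/X - 1 + exp(-N/X) by bisection. The class name, the bracketing strategy, and the iteration count are assumptions chosen for illustration; this is not presented as Picard's implementation.

```java
/** Hypothetical sketch: solve C/X = 1 - exp(-N/X) for X by bisection. */
final class LibrarySizeSketch {
    /** Positive while x is below the library size, negative above it. */
    static double f(double x, double c, double n) {
        return c / x - 1.0 + Math.exp(-n / x);
    }

    /** Returns the estimated library size, or null when no duplicates were observed. */
    static Long estimate(long readPairs, long uniqueReadPairs) {
        if (readPairs <= 0 || uniqueReadPairs <= 0 || uniqueReadPairs >= readPairs) {
            return null; // without duplicates there is nothing to estimate from
        }
        double c = uniqueReadPairs;
        double n = readPairs;
        double lo = c;         // f(c) = exp(-n/c) > 0, so the root lies above c
        double hi = 100.0 * c; // grow the upper bound until f changes sign
        while (f(hi, c, n) > 0) {
            hi *= 10.0;
        }
        for (int i = 0; i < 100; i++) {
            double mid = (lo + hi) / 2.0;
            if (f(mid, c, n) > 0) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return (long) ((lo + hi) / 2.0);
    }
}
```

Calling `LibrarySizeSketch.estimate(readPairs, uniqueReadPairs)` with counts simulated from a known library size should recover a value close to that size, up to the rounding of the observed unique count.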

    • estimateRoi

      public static double estimateRoi(long estimatedLibrarySize, double x, long pairs, long uniquePairs)
      Estimates the ROI (return on investment) that one would see if a library were sequenced to x times the observed coverage.
      Parameters:
      estimatedLibrarySize - the estimated number of molecules in the library
      x - the multiple of sequencing to be simulated (i.e. how many X sequencing)
      pairs - the number of pairs observed in the actual sequencing
      uniquePairs - the number of unique pairs observed in the actual sequencing
      Returns:
      a number z <= x such that, had you sequenced pairs*x pairs, you would expect to observe uniquePairs*z unique pairs.
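One way to arrive at such a value, assuming the Lander-Waterman model used for the library size estimate, is to compute the expected number of unique pairs at x-fold sequencing and divide by the unique pairs actually observed. The class and method names below are hypothetical, and the formula is a sketch under that assumption, not a confirmed copy of the library's code.

```java
/** Hypothetical sketch of an ROI estimate under the Lander-Waterman model. */
final class RoiSketch {
    static double roi(long estimatedLibrarySize, double x, long pairs, long uniquePairs) {
        // Expected distinct molecules seen after sequencing x * pairs read pairs.
        double expectedUnique =
                estimatedLibrarySize * (1.0 - Math.exp(-(x * pairs) / estimatedLibrarySize));
        // Express the yield as a multiple of what was actually observed.
        return expectedUnique / uniquePairs;
    }
}
```

With x = 1 and inputs consistent with the model the result is close to 1. For larger x it grows more slowly than x, because a growing share of the additional reads are duplicates.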
    • calculateRoiHistogram

      public htsjdk.samtools.util.Histogram<Double> calculateRoiHistogram()
      Calculates a histogram using the estimateRoi method to estimate the effective yield of x-fold sequencing for x=1..10.
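The shape of such a table can be sketched without htsjdk by storing the ROI for each multiple in a sorted map. The sketch uses a plain `TreeMap` in place of `Histogram<Double>`, and the class name, method name, and inlined Lander-Waterman formula are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: ROI values for 1x through 10x sequencing. */
final class RoiTableSketch {
    static Map<Double, Double> roiTable(long librarySize, long pairs, long uniquePairs) {
        Map<Double, Double> table = new TreeMap<>();
        for (double x = 1; x <= 10; x++) {
            // Assumed model: expected unique pairs at x-fold sequencing.
            double expectedUnique = librarySize * (1.0 - Math.exp(-(x * pairs) / librarySize));
            table.put(x, expectedUnique / uniquePairs);
        }
        return table;
    }
}
```

Each key is a sequencing multiple and each value is the estimated yield relative to the observed unique pairs, so the values increase with x but flatten as the library approaches saturation.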
    • main

      public static void main(String[] args)
    • addDuplicateReadToMetrics

      public void addDuplicateReadToMetrics(htsjdk.samtools.SAMRecord rec)
      Adds a duplicate read to the metrics.
    • addReadToLibraryMetrics

      public void addReadToLibraryMetrics(htsjdk.samtools.SAMRecord rec)
      Adds a read to the metrics.