Class AVLTreeDigest


public class AVLTreeDigest extends AbstractTDigest
  • Constructor Details

    • AVLTreeDigest

      public AVLTreeDigest(double compression)
      A histogram structure that will record a sketch of a distribution.
      Parameters:
      compression - How should accuracy be traded for size? A value of N here gives quantile errors almost always less than 3/N, with considerably smaller errors expected for extreme quantiles. Conversely, you should expect to track about 5N centroids for this accuracy.
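As a quick illustration of the trade-off described for compression, the sketch below turns a compression value N into the two rules of thumb quoted above: a quantile error bound of about 3/N and roughly 5N retained centroids. The class CompressionRuleOfThumb and its methods are hypothetical. They only restate the documented figures and are not part of the t-digest API.

```java
/** Hypothetical helper restating the documented rules of thumb for a compression value N. */
final class CompressionRuleOfThumb {
    private CompressionRuleOfThumb() {}

    /** Quantile error is documented as "almost always less than 3/N". */
    static double approxMaxQuantileError(double compression) {
        return 3.0 / compression;
    }

    /** The digest is documented to track "about 5N" centroids. */
    static long approxCentroidCount(double compression) {
        return Math.round(5.0 * compression);
    }
}
```

With compression = 100 this gives an error bound of 0.03 and about 500 centroids. Doubling compression halves the bound and doubles the expected centroid count.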
  • Method Details

    • recordAllData

      public TDigest recordAllData()
      Description copied from class: AbstractTDigest
      Configures the digest so that every centroid records all of the data assigned to it. Intended for testing only.
      Overrides:
      recordAllData in class AbstractTDigest
      Returns:
      This TDigest so that configurations can be done in fluent style.
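The fluent style mentioned above means the configuration call returns this, so it can be chained with further calls. The minimal sketch below assumes a hypothetical RecordingCentroid that keeps every raw value it absorbs once recording is switched on. It illustrates the idea and is not the library's own centroid class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical centroid that can optionally keep every raw value it absorbs. */
final class RecordingCentroid {
    private double mean;
    private long count;
    private List<Double> data; // stays null until recording is enabled

    /** Fluent configuration: enables recording and returns this so calls can be chained. */
    RecordingCentroid recordAllData() {
        if (data == null) {
            data = new ArrayList<>();
        }
        return this;
    }

    /** Absorbs value x with weight w, updating the weighted running mean. */
    RecordingCentroid add(double x, int w) {
        count += w;
        mean += w * (x - mean) / count;
        if (data != null) {
            data.add(x);
        }
        return this;
    }

    double mean() { return mean; }

    long count() { return count; }

    /** Raw values seen since recording was enabled, or an empty list if it never was. */
    List<Double> data() {
        return data == null ? Collections.<Double>emptyList() : Collections.unmodifiableList(data);
    }
}
```

For example, new RecordingCentroid().recordAllData().add(1.0, 1).add(3.0, 1) ends with mean 2.0, count 2 and recorded data [1.0, 3.0]. Keeping every raw value defeats the purpose of a compact sketch, which is why this mode is meant for testing.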
    • add

      public void add(double x, int w)
      Description copied from class: TDigest
      Adds a sample to a histogram.
      Specified by:
      add in class TDigest
      Parameters:
      x - The value to add.
      w - The weight of this point.
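To make the add operation concrete, here is a greatly simplified digest. It keeps centroids as {mean, count} pairs in a sorted list, finds the centroid nearest to x with a linear scan, and lets that centroid absorb the weighted point only if its count stays within 4 * n * q * (1 - q) / compression, where n is the total weight and q is the centroid's estimated quantile. Otherwise x starts a new centroid. That size limit is an assumption borrowed from the original t-digest description, and NaiveDigest is a hypothetical name. AVLTreeDigest itself presumably keeps its centroids in a balanced tree rather than a list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified digest: a sorted list of {mean, count} pairs searched linearly. */
final class NaiveDigest {
    private final double compression;
    private final List<double[]> centroids = new ArrayList<>(); // {mean, count}, ascending by mean
    private long totalWeight;

    NaiveDigest(double compression) {
        this.compression = compression;
    }

    void add(double x, int w) {
        totalWeight += w;
        // Find the centroid nearest to x and the total weight of all centroids before it.
        int nearest = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        double weightBeforeNearest = 0;
        double runningWeight = 0;
        for (int i = 0; i < centroids.size(); i++) {
            double[] c = centroids.get(i);
            double distance = Math.abs(c[0] - x);
            if (distance < bestDistance) {
                bestDistance = distance;
                nearest = i;
                weightBeforeNearest = runningWeight;
            }
            runningWeight += c[1];
        }
        if (nearest >= 0) {
            double[] c = centroids.get(nearest);
            // Estimated quantile of the nearest centroid's midpoint.
            double q = (weightBeforeNearest + c[1] / 2) / totalWeight;
            // Assumed size limit: small near the tails, largest near the median.
            double limit = 4 * totalWeight * q * (1 - q) / compression;
            if (c[1] + w <= limit) {
                c[1] += w;
                c[0] += w * (x - c[0]) / c[1]; // weighted running mean
                return;
            }
        }
        // No centroid can absorb x: insert a new one, keeping the list sorted by mean.
        int pos = 0;
        while (pos < centroids.size() && centroids.get(pos)[0] < x) {
            pos++;
        }
        centroids.add(pos, new double[] {x, w});
    }

    long size() { return totalWeight; }

    int centroidCount() { return centroids.size(); }

    List<double[]> centroids() { return centroids; }
}
```

Because the limit shrinks toward q = 0 and q = 1, centroids near the tails stay small, which is consistent with the smaller errors expected for extreme quantiles.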
    • add

      public void add(double x, int w, List<Double> data)
    • compress

      public void compress()
      Description copied from class: TDigest
      Re-examines a t-digest to determine whether some centroids are redundant. If your data arrive in a pathological order (sorted data, for example), this may be a good idea. Even if not, it may save 20% or so in space.

      The cost is roughly the same as adding as many data points as there are centroids. The number of centroids is typically < 10 * compression, but could be as high as 100 * compression.

      This is a destructive operation that is not thread-safe.

      Specified by:
      compress in class TDigest
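One way to see how merging redundant centroids saves space is a single greedy pass over the centroids in ascending order of mean: neighbors are merged while the combined count stays within a quantile-dependent limit. The sketch below is that simplified merge pass, not necessarily what AVLTreeDigest.compress() does internally. The class name CompressSketch and the limit 4 * n * q * (1 - q) / compression are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical greedy merge pass over {mean, count} pairs sorted by mean. */
final class CompressSketch {
    private CompressSketch() {}

    static List<double[]> compress(List<double[]> sorted, double compression) {
        double total = 0;
        for (double[] c : sorted) {
            total += c[1];
        }
        List<double[]> out = new ArrayList<>();
        double weightSoFar = 0; // weight already emitted into out
        double[] current = null;
        for (double[] c : sorted) {
            if (current == null) {
                current = c.clone(); // copy so the input list is not mutated
                continue;
            }
            double merged = current[1] + c[1];
            double q = (weightSoFar + merged / 2) / total; // quantile of the merged midpoint
            double limit = 4 * total * q * (1 - q) / compression;
            if (merged <= limit) {
                current[0] += c[1] * (c[0] - current[0]) / merged; // weighted mean of the pair
                current[1] = merged;
            } else {
                out.add(current);
                weightSoFar += current[1];
                current = c.clone();
            }
        }
        if (current != null) {
            out.add(current);
        }
        return out;
    }
}
```

Merging preserves the total weight and the overall weighted mean, while centroids near the tails, where the limit is small, tend to stay small.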
    • compress

      public void compress(GroupTree other)
      Specified by:
      compress in class AbstractTDigest
    • size

      public long size()
      Returns the number of samples represented in this histogram. If you want to know how many centroids are being used, call centroidCount().
      Specified by:
      size in class TDigest
      Returns:
      the number of samples that have been added.
    • cdf

      public double cdf(double x)
      Description copied from class: TDigest
      Returns the fraction of all points added which are <= x.
      Specified by:
      cdf in class TDigest
      Parameters:
      x - the value at which the CDF should be evaluated
      Returns:
      the approximate fraction of all samples that were less than or equal to x.
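A CDF estimate can be read off the centroids by interpolation. The sketch below assumes each centroid's weight is centered on its mean and interpolates the cumulative weight linearly between neighboring means. The class CdfSketch and its array-based inputs are hypothetical, and the real implementation may treat the tails more carefully than this sketch, which simply returns 0 below the smallest mean and 1 at or above the largest.

```java
/** Hypothetical CDF estimate over centroids given as parallel arrays sorted by mean. */
final class CdfSketch {
    private CdfSketch() {}

    static double cdf(double x, double[] means, double[] counts) {
        int n = means.length;
        if (n == 0) {
            throw new IllegalArgumentException("need at least one centroid");
        }
        double total = 0;
        for (double c : counts) {
            total += c;
        }
        if (x < means[0]) {
            return 0;
        }
        if (x >= means[n - 1]) {
            return 1;
        }
        double before = 0; // weight of all centroids to the left of centroid i
        for (int i = 0; i < n - 1; i++) {
            double left = before + counts[i] / 2;                  // cumulative weight at means[i]
            double right = before + counts[i] + counts[i + 1] / 2; // cumulative weight at means[i + 1]
            if (x < means[i + 1]) {
                double t = (x - means[i]) / (means[i + 1] - means[i]);
                return (left + t * (right - left)) / total;
            }
            before += counts[i];
        }
        return 1; // not reached: x >= means[n - 1] was handled above
    }
}
```

With three unit-weight centroids at 1, 2 and 3, cdf(2.0) is 1.5 / 3 = 0.5 and cdf(1.5) is 1.0 / 3.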
    • quantile

      public double quantile(double q)
      Description copied from class: TDigest
      Returns an estimate of the cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.
      Specified by:
      quantile in class TDigest
      Parameters:
      q - The quantile desired. Must be in the range [0,1].
      Returns:
      The minimum value x such that we estimate the proportion of samples <= x to be q.
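Quantile estimation can run the same kind of interpolation in reverse: turn q into a target weight q * n, find the pair of neighboring centroids whose midpoint cumulative weights bracket it, and interpolate between their means. QuantileSketch is a hypothetical helper over means and counts sorted by mean, not the library's code, and its handling of the tails is deliberately crude.

```java
/** Hypothetical quantile estimate over centroids given as parallel arrays sorted by mean. */
final class QuantileSketch {
    private QuantileSketch() {}

    static double quantile(double q, double[] means, double[] counts) {
        if (q < 0 || q > 1) {
            throw new IllegalArgumentException("q must be in [0,1]");
        }
        int n = means.length;
        if (n == 0) {
            throw new IllegalArgumentException("need at least one centroid");
        }
        double total = 0;
        for (double c : counts) {
            total += c;
        }
        double target = q * total;
        if (target <= counts[0] / 2) {
            return means[0]; // at or below the first centroid's midpoint
        }
        double before = 0; // weight of all centroids to the left of centroid i
        for (int i = 0; i < n - 1; i++) {
            double left = before + counts[i] / 2;
            double right = before + counts[i] + counts[i + 1] / 2;
            if (target < right) {
                double t = (target - left) / (right - left);
                return means[i] + t * (means[i + 1] - means[i]);
            }
            before += counts[i];
        }
        return means[n - 1]; // beyond the last centroid's midpoint
    }
}
```

For three unit-weight centroids at 1, 2 and 3, quantile(0.5) returns 2.0 and quantile(0.25) returns 1.25.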
    • centroidCount

      public int centroidCount()
      Description copied from class: TDigest
      The number of centroids currently in the TDigest.
      Specified by:
      centroidCount in class TDigest
      Returns:
      The number of centroids
    • centroids

      public Iterable<? extends Centroid> centroids()
      Description copied from class: TDigest
      An iterable that lets you go through the centroids in ascending order by mean. Centroids returned will not be re-used, but may or may not share storage with this TDigest.
      Specified by:
      centroids in class TDigest
      Returns:
      The centroids in the form of an Iterable.
    • compression

      public double compression()
      Description copied from class: TDigest
      Returns the current compression factor.
      Specified by:
      compression in class TDigest
      Returns:
      The compression factor originally used to set up the TDigest.
    • byteSize

      public int byteSize()
      Returns an upper bound on the number of bytes that will be required to represent this histogram.
      Specified by:
      byteSize in class TDigest
      Returns:
      The number of bytes required.
    • smallByteSize

      public int smallByteSize()
      Returns an upper bound on the number of bytes that will be required to represent this histogram in the tighter representation.
      Specified by:
      smallByteSize in class TDigest
      Returns:
      The number of bytes required.
    • asBytes

      public void asBytes(ByteBuffer buf)
      Outputs a histogram as bytes using a particularly cheesy encoding.
      Specified by:
      asBytes in class TDigest
      Parameters:
      buf - The byte buffer into which the TDigest should be serialized.
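To illustrate what a simple, uncompressed encoding and its byte-size bound might look like, the sketch below assumes a hypothetical layout: the compression as a double, the centroid count as an int, then each centroid's mean (double) and count (int). This layout and the class VerboseEncodingSketch are assumptions, not the library's actual wire format. The sketch only shows how a fixed-width encoding makes the size easy to bound in advance.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical fixed-width layout (an assumption, not the library's format):
 * double compression, int centroidCount, then per centroid: double mean, int count.
 */
final class VerboseEncodingSketch {
    static final int HEADER_BYTES = 8 + 4;
    static final int BYTES_PER_CENTROID = 8 + 4;

    private VerboseEncodingSketch() {}

    /** Exact size for this layout, and therefore also an upper bound. */
    static int byteSize(int centroidCount) {
        return HEADER_BYTES + BYTES_PER_CENTROID * centroidCount;
    }

    static void asBytes(ByteBuffer buf, double compression, double[] means, int[] counts) {
        buf.putDouble(compression);
        buf.putInt(means.length);
        for (int i = 0; i < means.length; i++) {
            buf.putDouble(means[i]);
            buf.putInt(counts[i]);
        }
    }
}
```

Three centroids need 12 + 3 * 12 = 48 bytes, so a buffer of byteSize(3) is always large enough.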
    • asSmallBytes

      public void asSmallBytes(ByteBuffer buf)
      Description copied from class: TDigest
      Serialize this TDigest into a byte buffer. Some simple compression is used such as using variable byte representation to store the centroid weights and using delta-encoding on the centroid means so that floats can be reasonably used to store the centroid means.
      Specified by:
      asSmallBytes in class TDigest
      Parameters:
      buf - The byte buffer into which the TDigest should be serialized.
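The two tricks described above can be sketched as follows: centroid counts are written as variable-length integers (7 bits per byte, with the high bit marking that more bytes follow), and each mean is stored as a float difference from the previous one. The layout and the class SmallEncodingSketch are hypothetical. The encoder tracks the value the decoder will reconstruct so that float rounding errors do not accumulate across centroids.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical compact layout: int centroidCount, then each mean as a float delta from the
 * previously decoded mean, then each count as an unsigned variable-length integer.
 */
final class SmallEncodingSketch {
    private SmallEncodingSketch() {}

    static final class Decoded {
        final double[] means;
        final int[] counts;

        Decoded(double[] means, int[] counts) {
            this.means = means;
            this.counts = counts;
        }
    }

    /** Writes n in 7-bit groups, low bits first; a set high bit means "more bytes follow". */
    static void putVarInt(ByteBuffer buf, int n) {
        while ((n & ~0x7F) != 0) {
            buf.put((byte) ((n & 0x7F) | 0x80));
            n >>>= 7;
        }
        buf.put((byte) n);
    }

    static int getVarInt(ByteBuffer buf) {
        int result = 0;
        for (int shift = 0; ; shift += 7) {
            int b = buf.get() & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
    }

    static void encode(ByteBuffer buf, double[] means, int[] counts) {
        buf.putInt(means.length);
        double decoded = 0; // mirrors the reader's reconstruction
        for (double mean : means) {
            float delta = (float) (mean - decoded);
            buf.putFloat(delta);
            decoded += delta;
        }
        for (int count : counts) {
            putVarInt(buf, count);
        }
    }

    static Decoded decode(ByteBuffer buf) {
        int n = buf.getInt();
        double[] means = new double[n];
        double decoded = 0;
        for (int i = 0; i < n; i++) {
            decoded += buf.getFloat();
            means[i] = decoded;
        }
        int[] counts = new int[n];
        for (int i = 0; i < n; i++) {
            counts[i] = getVarInt(buf);
        }
        return new Decoded(means, counts);
    }
}
```

Counts below 128 take a single byte instead of four, and because each float stores only a gap between means, its rounding error scales with that gap rather than with the absolute value of the mean.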
    • fromBytes

      public static AVLTreeDigest fromBytes(ByteBuffer buf)
      Reads a histogram from a byte buffer.
      Returns:
      The new histogram structure
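A fromBytes-style factory reads fields back in exactly the order they were written. The sketch below assumes a hypothetical fixed-width layout (compression as a double, the centroid count as an int, then a double mean and an int count per centroid) and rejects a negative count instead of trying to allocate an array for it. VerboseDecodingSketch and the layout are assumptions, not the library's actual format.

```java
import java.nio.ByteBuffer;

/** Hypothetical reader for: double compression, int n, then n x (double mean, int count). */
final class VerboseDecodingSketch {
    final double compression;
    final double[] means;
    final int[] counts;

    private VerboseDecodingSketch(double compression, double[] means, int[] counts) {
        this.compression = compression;
        this.means = means;
        this.counts = counts;
    }

    /** Static factory mirroring the shape of fromBytes(ByteBuffer). */
    static VerboseDecodingSketch fromBytes(ByteBuffer buf) {
        double compression = buf.getDouble();
        int n = buf.getInt();
        if (n < 0) {
            throw new IllegalArgumentException("negative centroid count: " + n);
        }
        double[] means = new double[n];
        int[] counts = new int[n];
        for (int i = 0; i < n; i++) {
            means[i] = buf.getDouble();
            counts[i] = buf.getInt();
        }
        return new VerboseDecodingSketch(compression, means, counts);
    }
}
```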