Class ArrayDigest


public class ArrayDigest extends AbstractTDigest
Array based implementation of a TDigest.

This implementation is essentially a one-level b-tree in which nodes are collected into pages typically with 32 values per page. Commonly, an ArrayDigest contains 500-3000 centroids. With 32 values per page, we have about 32 values per page and about 30 pages which seems to give a nice balance for speed. Sizes from 4 to 100 are plausible, however.

  • Field Details

  • Constructor Details

    • ArrayDigest

      public ArrayDigest(int pageSize, double compression)
  • Method Details

    • add

      public void add(double x, int w)
      Description copied from class: TDigest
      Adds a sample to a histogram.
      Specified by:
      add in class TDigest
      Parameters:
      x - The value to add.
      w - The weight of this point.
    • headSum

      public long headSum(com.tdunning.math.stats.ArrayDigest.Index limit)
    • mean

      public double mean(com.tdunning.math.stats.ArrayDigest.Index index)
    • count

      public int count(com.tdunning.math.stats.ArrayDigest.Index index)
    • compress

      public void compress()
      Description copied from class: TDigest
      Re-examines a t-digest to determine whether some centroids are redundant. If your data are perversely ordered, this may be a good idea. Even if not, this may save 20% or so in space.

      The cost is roughly the same as adding as many data points as there are centroids. This is typically < 10 * compression, but could be as high as 100 * compression.

      This is a destructive operation that is not thread-safe.

      Specified by:
      compress in class TDigest
    • compress

      public void compress(GroupTree other)
      Specified by:
      compress in class AbstractTDigest
    • size

      public long size()
      Description copied from class: TDigest
      Returns the number of points that have been added to this TDigest.
      Specified by:
      size in class TDigest
      Returns:
      The sum of the weights on all centroids.
    • cdf

      public double cdf(double x)
      Description copied from class: TDigest
      Returns the fraction of all points added which are <= x.
      Specified by:
      cdf in class TDigest
    • quantile

      public double quantile(double q)
      Description copied from class: TDigest
      Returns an estimate of the cutoff such that a specified fraction of the data added to this TDigest would be less than or equal to the cutoff.
      Specified by:
      quantile in class TDigest
      Parameters:
      q - The desired fraction
      Returns:
      The value x such that cdf(x) == q
    • centroidCount

      public int centroidCount()
      Description copied from class: TDigest
      The number of centroids currently in the TDigest.
      Specified by:
      centroidCount in class TDigest
      Returns:
      The number of centroids
    • centroids

      public Iterable<? extends Centroid> centroids()
      Description copied from class: TDigest
      An iterable that lets you go through the centroids in ascending order by mean. Centroids returned will not be re-used, but may or may not share storage with this TDigest.
      Specified by:
      centroids in class TDigest
      Returns:
      The centroids in the form of an Iterable.
    • allAfter

      public Iterator<com.tdunning.math.stats.ArrayDigest.Index> allAfter(double x)
    • floor

      public com.tdunning.math.stats.ArrayDigest.Index floor(double x)
      Returns a cursor pointing to the first element <= x. Exposed only for testing.
      Parameters:
      x - The value used to find the cursor.
      Returns:
      The cursor.
    • ceiling

      public com.tdunning.math.stats.ArrayDigest.Index ceiling(double x)
    • allBefore

      public Iterator<com.tdunning.math.stats.ArrayDigest.Index> allBefore(double x)
      Returns an iterator which will give each element <= to x in non-increasing order.
      Parameters:
      x - The upper bound of all returned elements
      Returns:
      An iterator that returns elements in non-increasing order.
    • increment

      public com.tdunning.math.stats.ArrayDigest.Index increment(com.tdunning.math.stats.ArrayDigest.Index x, int delta)
    • compression

      public double compression()
      Description copied from class: TDigest
      Returns the current compression factor.
      Specified by:
      compression in class TDigest
      Returns:
      The compression factor originally used to set up the TDigest.
    • byteSize

      public int byteSize()
      Returns an upper bound on the number bytes that will be required to represent this histogram.
      Specified by:
      byteSize in class TDigest
      Returns:
      The number of bytes required.
    • smallByteSize

      public int smallByteSize()
      Returns an upper bound on the number of bytes that will be required to represent this histogram in the tighter representation.
      Specified by:
      smallByteSize in class TDigest
      Returns:
      The number of bytes required.
    • asBytes

      public void asBytes(ByteBuffer buf)
      Outputs a histogram as bytes using a particularly cheesy encoding.
      Specified by:
      asBytes in class TDigest
      Parameters:
      buf - The byte buffer into which the TDigest should be serialized.
    • asSmallBytes

      public void asSmallBytes(ByteBuffer buf)
      Description copied from class: TDigest
      Serialize this TDigest into a byte buffer. Some simple compression is used such as using variable byte representation to store the centroid weights and using delta-encoding on the centroid means so that floats can be reasonably used to store the centroid means.
      Specified by:
      asSmallBytes in class TDigest
      Parameters:
      buf - The byte buffer into which the TDigest should be serialized.
    • fromBytes

      public static ArrayDigest fromBytes(ByteBuffer buf)
      Reads a histogram from a byte buffer
      Returns:
      The new histogram structure