Class AverageLinkage

java.lang.Object
ch.usi.inf.sape.hac.agglomeration.AverageLinkage
All Implemented Interfaces:
AgglomerationMethod

public final class AverageLinkage extends Object implements AgglomerationMethod
The "average", "group average", "unweighted average", or "Unweighted Pair Group Method using Arithmetic averages (UPGMA)" method is a graph-based approach. The distance between two clusters is calculated as the average of the distances between all pairs of objects in opposite clusters. This method tends to produce small clusters of outliers, but does not deform the cluster space. [The data analysis handbook. By Ildiko E. Frank, Roberto Todeschini]

The general form of the Lance-Williams matrix-update formula:

  d[(i,j),k] = ai*d[i,k] + aj*d[j,k] + b*d[i,j] + g*|d[i,k]-d[j,k]|

For the "group average" method:

  ai = ci/(ci+cj)
  aj = cj/(ci+cj)
  b = 0
  g = 0

Thus:

  d[(i,j),k] = ci/(ci+cj)*d[i,k] + cj/(ci+cj)*d[j,k]
             = ( ci*d[i,k] + cj*d[j,k] ) / (ci+cj)
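The Lance-Williams update for the "group average" method can be sketched in a few lines of Java. This is a minimal illustration of the formula as stated above, not the library's source: the class name GroupAverageSketch and the helper methods lanceWilliams and groupAverage are hypothetical names introduced for this example.

```java
// Hypothetical sketch: not part of ch.usi.inf.sape.hac.
final class GroupAverageSketch {

    // General Lance-Williams update:
    // d[(i,j),k] = ai*d[i,k] + aj*d[j,k] + b*d[i,j] + g*|d[i,k]-d[j,k]|
    static double lanceWilliams(double dik, double djk, double dij,
                                double ai, double aj, double b, double g) {
        return ai * dik + aj * djk + b * dij + g * Math.abs(dik - djk);
    }

    // Group average (UPGMA) coefficients: ai = ci/(ci+cj), aj = cj/(ci+cj), b = 0, g = 0.
    static double groupAverage(double dik, double djk, double dij, int ci, int cj) {
        // Cast to double first so the ratios are not truncated by integer division.
        double ai = (double) ci / (ci + cj);
        double aj = (double) cj / (ci + cj);
        return lanceWilliams(dik, djk, dij, ai, aj, 0.0, 0.0);
    }

    public static void main(String[] args) {
        // Cluster i has 1 object, cluster j has 2; d[i,k] = 2, d[j,k] = 5, d[i,j] = 1.
        // Expected value is (1*2 + 2*5) / 3, i.e. approximately 4.
        System.out.println(groupAverage(2.0, 5.0, 1.0, 1, 2));
    }
}
```

Because b = 0 and g = 0, the value of d[i,j] has no effect on the result; the update is simply the size-weighted mean of d[i,k] and d[j,k].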
  • Constructor Details

    • AverageLinkage

      public AverageLinkage()
  • Method Details

    • computeDissimilarity

      public double computeDissimilarity(double dik, double djk, double dij, int ci, int cj, int ck)
      Description copied from interface: AgglomerationMethod
      Compute the dissimilarity between the newly formed cluster (i,j) and the existing cluster k.
      Specified by:
      computeDissimilarity in interface AgglomerationMethod
      Parameters:
      dik - dissimilarity between clusters i and k
      djk - dissimilarity between clusters j and k
      dij - dissimilarity between clusters i and j
      ci - cardinality of cluster i
      cj - cardinality of cluster j
      ck - cardinality of cluster k
      Returns:
      dissimilarity between cluster (i,j) and cluster k.
    • toString

      public String toString()
      Overrides:
      toString in class Object
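As a worked check of the computeDissimilarity parameters, the sketch below compares the size-weighted update from the class description with a brute-force average over all cross-cluster pairs. It is an illustration under stated assumptions: objects are points on a line with absolute difference as the dissimilarity, and the class AverageLinkageCheck and its methods are hypothetical names that do not belong to the library.

```java
// Hypothetical sketch: not part of ch.usi.inf.sape.hac.
final class AverageLinkageCheck {

    // Average of |x - y| over all pairs with x in cluster a and y in cluster b.
    static double averageDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (double x : a) {
            for (double y : b) {
                sum += Math.abs(x - y);
            }
        }
        return sum / (a.length * b.length);
    }

    // Group-average update: ( ci*d[i,k] + cj*d[j,k] ) / (ci+cj).
    static double update(double dik, double djk, int ci, int cj) {
        return (ci * dik + cj * djk) / (ci + cj);
    }

    // Merge two clusters into one array.
    static double[] merge(double[] a, double[] b) {
        double[] out = new double[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    public static void main(String[] args) {
        double[] i = {0.0};
        double[] j = {1.0, 3.0};
        double[] k = {10.0, 12.0};

        double dik = averageDistance(i, k); // mean of 10 and 12
        double djk = averageDistance(j, k); // mean of 9, 11, 7, 9

        double viaUpdate = update(dik, djk, i.length, j.length);
        double viaBruteForce = averageDistance(merge(i, j), k);

        // Both values should agree up to floating-point rounding.
        System.out.println(viaUpdate + " vs " + viaBruteForce);
    }
}
```

In the formula from the class description, only dik, djk, ci and cj appear; dij and ck are part of the interface signature but have zero weight for the group-average coefficients.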