Class ThresholdSelector

All Implemented Interfaces:
Serializable, Cloneable, CapabilitiesHandler, Drawable, OptionHandler, Randomizable, RevisionHandler

public class ThresholdSelector extends RandomizableSingleClassifierEnhancer implements OptionHandler, Drawable
A metaclassifier that selects a mid-point threshold on the probability output by a Classifier. The mid-point threshold is set so that a given performance measure is optimized; by default this is the F-measure. Performance is measured on the training data, on a hold-out set, or using cross-validation. In addition, the probabilities returned by the base learner can have their range expanded so that the output probabilities will reside between 0 and 1 (this is useful if the scheme normally produces probabilities in a very narrow range).
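
As a rough illustration of the idea described above, the following self-contained sketch picks a mid-point threshold that maximizes the F-measure on a handful of scored examples. It does not use Weka: the class `ThresholdSketch`, its method names, and the data are hypothetical, and the actual ThresholdSelector may differ in details such as candidate generation and tie-breaking.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the Weka implementation. */
final class ThresholdSketch {

    /** F-measure obtained when "positive" is predicted for prob >= threshold. */
    static double fMeasure(double[] probs, boolean[] positive, double threshold) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < probs.length; i++) {
            boolean predicted = probs[i] >= threshold;
            if (predicted && positive[i]) tp++;
            else if (predicted) fp++;
            else if (positive[i]) fn++;
        }
        return tp == 0 ? 0.0 : 2.0 * tp / (2.0 * tp + fp + fn);
    }

    /** Tries the mid-points between adjacent distinct sorted probabilities. */
    static double bestThreshold(double[] probs, boolean[] positive) {
        double[] sorted = probs.clone();
        Arrays.sort(sorted);
        double best = 0.5;      // assumed fallback when there is no candidate
        double bestF = -1.0;
        for (int i = 0; i + 1 < sorted.length; i++) {
            if (sorted[i] == sorted[i + 1]) continue;   // no gap, no mid-point
            double candidate = (sorted[i] + sorted[i + 1]) / 2.0;
            double f = fMeasure(probs, positive, candidate);
            if (f > bestF) {    // the first best candidate wins ties
                bestF = f;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] probs = {0.125, 0.25, 0.5, 0.625, 0.875};
        boolean[] positive = {false, false, true, true, true};
        System.out.println("threshold = " + bestThreshold(probs, positive));
    }
}
```

In this toy data set the positives and negatives are perfectly separable, so the selected mid-point falls between the highest-scored negative and the lowest-scored positive.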

Valid options are:

 -C <integer>
  The class for which threshold is determined. Valid values are:
  1, 2 (for first and second classes, respectively), 3 (for whichever
  class is least frequent), 4 (for whichever class value is most
  frequent), and 5 (for the first class named any of "yes", "pos(itive)",
  "1", or method 3 if no matches). (default 5).
 -X <number of folds>
  Number of folds used for cross validation. If just a
  hold-out set is used, this determines the size of the hold-out set
  (default 3).
 -R <integer>
  Sets whether confidence range correction is applied. This
  can be used to ensure the confidences range from 0 to 1.
  Use 0 for no range correction, 1 for correction based on
  the min/max values seen during threshold selection
  (default 0).
 -E <integer>
  Sets the evaluation mode. Use 0 for
  evaluation using cross-validation,
  1 for evaluation using hold-out set,
  and 2 for evaluation on the
  training data (default 1).
 -M [FMEASURE|ACCURACY|TRUE_POS|TRUE_NEG|TP_RATE|PRECISION|RECALL]
  Measure used for evaluation (default is FMEASURE).
 
 -manual <real>
  Set a manual threshold to use. This option overrides
  automatic selection and options pertaining to
  automatic selection will be ignored.
  (default -1, i.e. do not use a manual threshold).
 -S <num>
  Random number seed.
  (default 1)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
 -W
  Full name of base classifier.
  (default: weka.classifiers.functions.Logistic)
 
 Options specific to classifier weka.classifiers.functions.Logistic:
 
 -D
  Turn on debugging output.
 -R <ridge>
  Set the ridge in the log-likelihood.
 -M <number>
  Set the maximum number of iterations (default -1, until convergence).
Options after -- are passed to the designated sub-classifier.
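
The option list above can be assembled as a plain string array, with `--` separating the meta-classifier's own flags from those of the base classifier. The sketch below is self-contained and does not call Weka; the class `OptionsSketch` is hypothetical and the ridge value is only an illustrative number. An array of this shape is the kind of input that setOptions(String[]) is documented to accept.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; builds an option array from the flags listed above. */
final class OptionsSketch {

    static String[] exampleOptions() {
        List<String> opts = new ArrayList<>();
        opts.add("-C"); opts.add("5");          // designated class: match by name
        opts.add("-X"); opts.add("3");          // number of folds
        opts.add("-E"); opts.add("1");          // evaluate on a hold-out set
        opts.add("-R"); opts.add("0");          // no range correction
        opts.add("-M"); opts.add("FMEASURE");   // measure to optimize
        opts.add("-W"); opts.add("weka.classifiers.functions.Logistic");
        opts.add("--");                         // the rest goes to the base classifier
        opts.add("-R"); opts.add("1.0E-8");     // Logistic ridge (illustrative value)
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", exampleOptions()));
    }
}
```

Note that `-R` appears twice with different meanings: before `--` it selects the range correction mode, after `--` it sets the ridge of the base classifier.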

Version:
$Revision: 1.43 $
Author:
Eibe Frank (eibe@cs.waikato.ac.nz)
  • Field Details

    • RANGE_NONE

      public static final int RANGE_NONE
      no range correction
    • RANGE_BOUNDS

      public static final int RANGE_BOUNDS
      Correct based on min/max observed
    • TAGS_RANGE

      public static final Tag[] TAGS_RANGE
      Type of correction applied to threshold range
    • EVAL_TRAINING_SET

      public static final int EVAL_TRAINING_SET
      entire training set
    • EVAL_TUNED_SPLIT

      public static final int EVAL_TUNED_SPLIT
      single tuned fold
    • EVAL_CROSS_VALIDATION

      public static final int EVAL_CROSS_VALIDATION
      n-fold cross-validation
    • TAGS_EVAL

      public static final Tag[] TAGS_EVAL
      The evaluation modes
    • OPTIMIZE_0

      public static final int OPTIMIZE_0
      first class value
    • OPTIMIZE_1

      public static final int OPTIMIZE_1
      second class value
    • OPTIMIZE_LFREQ

      public static final int OPTIMIZE_LFREQ
      least frequent class value
    • OPTIMIZE_MFREQ

      public static final int OPTIMIZE_MFREQ
      most frequent class value
    • OPTIMIZE_POS_NAME

      public static final int OPTIMIZE_POS_NAME
      class value name, either 'yes' or 'pos(itive)'
    • TAGS_OPTIMIZE

      public static final Tag[] TAGS_OPTIMIZE
      How to determine which class value to optimize for
    • FMEASURE

      public static final int FMEASURE
      F-measure
    • ACCURACY

      public static final int ACCURACY
      accuracy
    • TRUE_POS

      public static final int TRUE_POS
      true-positive
    • TRUE_NEG

      public static final int TRUE_NEG
      true-negative
    • TP_RATE

      public static final int TP_RATE
      true-positive rate
    • PRECISION

      public static final int PRECISION
      precision
    • RECALL

      public static final int RECALL
      recall
    • TAGS_MEASURE

      public static final Tag[] TAGS_MEASURE
      the measure to use
  • Constructor Details

    • ThresholdSelector

      public ThresholdSelector()
      Constructor.
  • Method Details

    • measureTipText

      public String measureTipText()
      Tooltip for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setMeasure

      public void setMeasure(SelectedTag newMeasure)
      Sets the measure used for determining the threshold.
      Parameters:
      newMeasure - Tag representing measure to be used
    • getMeasure

      public SelectedTag getMeasure()
      Gets the measure used for determining the threshold.
      Returns:
      Tag representing measure used
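
For orientation, the measures named by the constants above can be computed from confusion-matrix counts roughly as follows. This is a sketch using the standard textbook definitions, not code taken from Weka: the class `MeasureSketch`, its enum, and the convention of returning 0 for an empty denominator are assumptions.

```java
/** Illustrative sketch only; standard definitions, not the Weka implementation. */
final class MeasureSketch {

    /** Hypothetical enum mirroring the names accepted by the -M option. */
    enum Measure { FMEASURE, ACCURACY, TRUE_POS, TRUE_NEG, TP_RATE, PRECISION, RECALL }

    /** Division that yields 0 when the denominator is 0 (an assumed convention). */
    static double ratio(double num, double den) {
        return den == 0 ? 0.0 : num / den;
    }

    /** Scores a confusion matrix given as tp, fp, tn, fn counts. */
    static double score(Measure m, int tp, int fp, int tn, int fn) {
        switch (m) {
            case ACCURACY:  return ratio(tp + tn, tp + fp + tn + fn);
            case TRUE_POS:  return tp;                      // raw count
            case TRUE_NEG:  return tn;                      // raw count
            case PRECISION: return ratio(tp, tp + fp);
            case TP_RATE:                                   // same formula as recall
            case RECALL:    return ratio(tp, tp + fn);
            case FMEASURE:
            default:        return ratio(2.0 * tp, 2.0 * tp + fp + fn);
        }
    }

    public static void main(String[] args) {
        for (Measure m : Measure.values()) {
            System.out.println(m + " = " + score(m, 3, 1, 4, 2));
        }
    }
}
```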
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class RandomizableSingleClassifierEnhancer
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -C <integer>
        The class for which threshold is determined. Valid values are:
        1, 2 (for first and second classes, respectively), 3 (for whichever
        class is least frequent), 4 (for whichever class value is most
        frequent), and 5 (for the first class named any of "yes", "pos(itive)",
        "1", or method 3 if no matches). (default 5).
       -X <number of folds>
        Number of folds used for cross validation. If just a
        hold-out set is used, this determines the size of the hold-out set
        (default 3).
       -R <integer>
        Sets whether confidence range correction is applied. This
        can be used to ensure the confidences range from 0 to 1.
        Use 0 for no range correction, 1 for correction based on
        the min/max values seen during threshold selection
        (default 0).
       -E <integer>
        Sets the evaluation mode. Use 0 for
        evaluation using cross-validation,
        1 for evaluation using hold-out set,
        and 2 for evaluation on the
        training data (default 1).
       -M [FMEASURE|ACCURACY|TRUE_POS|TRUE_NEG|TP_RATE|PRECISION|RECALL]
        Measure used for evaluation (default is FMEASURE).
       
       -manual <real>
        Set a manual threshold to use. This option overrides
        automatic selection and options pertaining to
        automatic selection will be ignored.
        (default -1, i.e. do not use a manual threshold).
       -S <num>
        Random number seed.
        (default 1)
       -D
        If set, classifier is run in debug mode and
        may output additional info to the console
       -W
        Full name of base classifier.
        (default: weka.classifiers.functions.Logistic)
       
       Options specific to classifier weka.classifiers.functions.Logistic:
       
       -D
        Turn on debugging output.
       -R <ridge>
        Set the ridge in the log-likelihood.
       -M <number>
        Set the maximum number of iterations (default -1, until convergence).
      Options after -- are passed to the designated sub-classifier.

      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class RandomizableSingleClassifierEnhancer
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current settings of the Classifier.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class RandomizableSingleClassifierEnhancer
      Returns:
      an array of strings suitable for passing to setOptions
    • getCapabilities

      public Capabilities getCapabilities()
      Returns default capabilities of the classifier.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class SingleClassifierEnhancer
      Returns:
      the capabilities of this classifier
    • buildClassifier

      public void buildClassifier(Instances instances) throws Exception
      Generates the classifier.
      Specified by:
      buildClassifier in class Classifier
      Parameters:
      instances - set of instances serving as training data
      Throws:
      Exception - if the classifier has not been generated successfully
    • distributionForInstance

      public double[] distributionForInstance(Instance instance) throws Exception
      Calculates the class membership probabilities for the given test instance.
      Overrides:
      distributionForInstance in class Classifier
      Parameters:
      instance - the instance to be classified
      Returns:
      predicted class probability distribution
      Throws:
      Exception - if instance could not be classified successfully
    • globalInfo

      public String globalInfo()
      Returns:
      a description of the classifier suitable for displaying in the explorer/experimenter gui
    • designatedClassTipText

      public String designatedClassTipText()
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getDesignatedClass

      public SelectedTag getDesignatedClass()
      Gets the method to determine which class value to optimize. Will be one of OPTIMIZE_0, OPTIMIZE_1, OPTIMIZE_LFREQ, OPTIMIZE_MFREQ, OPTIMIZE_POS_NAME.
      Returns:
      the class selection mode.
    • setDesignatedClass

      public void setDesignatedClass(SelectedTag newMethod)
      Sets the method to determine which class value to optimize. Will be one of OPTIMIZE_0, OPTIMIZE_1, OPTIMIZE_LFREQ, OPTIMIZE_MFREQ, OPTIMIZE_POS_NAME.
      Parameters:
      newMethod - the new class selection mode.
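
The five class selection modes can be pictured with the following self-contained sketch, which resolves a mode to a class index from class names and class counts. It is not Weka code: the class `DesignatedClassSketch` and its enum are hypothetical, and the exact name matching (case-insensitive equality with "yes", "pos", "positive", or "1") is an assumption based on the option description.

```java
import java.util.Locale;

/** Illustrative sketch only; not the Weka implementation. */
final class DesignatedClassSketch {

    /** Hypothetical enum mirroring the five selection modes. */
    enum Mode { FIRST, SECOND, LEAST_FREQUENT, MOST_FREQUENT, POSITIVE_NAME }

    /** Returns the index of the class value to optimize for. */
    static int pick(Mode mode, String[] names, int[] counts) {
        switch (mode) {
            case FIRST:         return 0;
            case SECOND:        return 1;
            case MOST_FREQUENT: return extreme(counts, true);
            case POSITIVE_NAME:
                for (int i = 0; i < names.length; i++) {
                    String n = names[i].toLowerCase(Locale.ROOT);
                    if (n.equals("yes") || n.equals("pos")
                            || n.equals("positive") || n.equals("1")) {
                        return i;                   // first matching name wins
                    }
                }
                return extreme(counts, false);      // no match: least frequent
            case LEAST_FREQUENT:
            default:            return extreme(counts, false);
        }
    }

    /** Index of the largest (or smallest) count; the first one wins ties. */
    static int extreme(int[] counts, boolean largest) {
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            boolean better = largest ? counts[i] > counts[best] : counts[i] < counts[best];
            if (better) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        String[] names = {"neg", "pos"};
        int[] counts = {90, 10};
        System.out.println(pick(Mode.POSITIVE_NAME, names, counts));
    }
}
```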
    • evaluationModeTipText

      public String evaluationModeTipText()
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setEvaluationMode

      public void setEvaluationMode(SelectedTag newMethod)
      Sets the evaluation mode used. Will be one of EVAL_TRAINING_SET, EVAL_TUNED_SPLIT, or EVAL_CROSS_VALIDATION.
      Parameters:
      newMethod - the new evaluation mode.
    • getEvaluationMode

      public SelectedTag getEvaluationMode()
      Gets the evaluation mode used. Will be one of EVAL_TRAINING_SET, EVAL_TUNED_SPLIT, or EVAL_CROSS_VALIDATION.
      Returns:
      the evaluation mode.
    • rangeCorrectionTipText

      public String rangeCorrectionTipText()
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setRangeCorrection

      public void setRangeCorrection(SelectedTag newMethod)
      Sets the confidence range correction mode used. Will be one of RANGE_NONE or RANGE_BOUNDS.
      Parameters:
      newMethod - the new correction mode.
    • getRangeCorrection

      public SelectedTag getRangeCorrection()
      Gets the confidence range correction mode used. Will be one of RANGE_NONE or RANGE_BOUNDS.
      Returns:
      the confidence correction mode.
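
One plausible reading of min/max range correction is a linear rescaling of each probability onto [0, 1] using the smallest and largest values observed during threshold selection. The sketch below shows that reading only; the class `RangeCorrectionSketch`, the clamping of out-of-range values, and the handling of a degenerate range are assumptions, and the formula Weka actually applies may differ.

```java
/** Illustrative sketch only; an assumed linear min/max rescaling. */
final class RangeCorrectionSketch {

    /**
     * Maps p from the observed range [low, high] onto [0, 1].
     * Values outside the observed range are clamped.
     */
    static double rescale(double p, double low, double high) {
        if (high <= low) {
            return p;                       // degenerate range: leave unchanged
        }
        double scaled = (p - low) / (high - low);
        return Math.max(0.0, Math.min(1.0, scaled));
    }

    public static void main(String[] args) {
        // A base learner whose outputs only ever fell between 0.25 and 0.75.
        System.out.println(rescale(0.5, 0.25, 0.75));
        System.out.println(rescale(0.125, 0.25, 0.75));
    }
}
```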
    • numXValFoldsTipText

      public String numXValFoldsTipText()
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getNumXValFolds

      public int getNumXValFolds()
      Get the number of folds used for cross-validation.
      Returns:
      the number of folds used for cross-validation.
    • setNumXValFolds

      public void setNumXValFolds(int newNumFolds)
      Set the number of folds used for cross-validation.
      Parameters:
      newNumFolds - the number of folds used for cross-validation.
    • graphType

      public int graphType()
      Returns the type of graph this classifier represents.
      Specified by:
      graphType in interface Drawable
      Returns:
      the type of graph this classifier represents
    • graph

      public String graph() throws Exception
      Returns graph describing the classifier (if possible).
      Specified by:
      graph in interface Drawable
      Returns:
      the graph of the classifier in dotty format
      Throws:
      Exception - if the classifier cannot be graphed
    • manualThresholdValueTipText

      public String manualThresholdValueTipText()
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setManualThresholdValue

      public void setManualThresholdValue(double threshold) throws Exception
      Sets the value for a manual threshold. If this option is set (non-negative value between 0 and 1), then options pertaining to automatic threshold selection are ignored.
      Parameters:
      threshold - the manual threshold to use
      Throws:
      Exception
    • getManualThresholdValue

      public double getManualThresholdValue()
      Returns the value of the manual threshold (a negative value indicates that no manual threshold is being used).
      Returns:
      the value of the manual threshold.
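
The relationship between the manual threshold and automatic selection can be summarized in a few lines. In this hypothetical sketch (the class `ManualThresholdSketch` and its method are not part of Weka), a negative manual value means "not set", so the automatically selected threshold is used instead.

```java
/** Illustrative sketch only; not the Weka implementation. */
final class ManualThresholdSketch {

    /**
     * Returns the manual threshold when one is set (non-negative),
     * otherwise the automatically selected threshold.
     */
    static double effectiveThreshold(double manual, double automatic) {
        return manual >= 0.0 ? manual : automatic;
    }

    public static void main(String[] args) {
        System.out.println(effectiveThreshold(-1.0, 0.375));  // default: no manual threshold
        System.out.println(effectiveThreshold(0.5, 0.375));   // manual threshold overrides
    }
}
```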
    • toString

      public String toString()
      Returns description of the cross-validated classifier.
      Overrides:
      toString in class Object
      Returns:
      description of the cross-validated classifier as a string
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class Classifier
      Returns:
      the revision
    • main

      public static void main(String[] argv)
      Main method for testing this class.
      Parameters:
      argv - the options