Class Bagging

All Implemented Interfaces:
Serializable, Cloneable, AdditionalMeasureProducer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler

Class for bagging a classifier to reduce variance. Can do classification and regression depending on the base learner.

For more information, see

Leo Breiman (1996). Bagging predictors. Machine Learning. 24(2):123-140.

BibTeX:

 @article{Breiman1996,
    author = {Leo Breiman},
    journal = {Machine Learning},
    number = {2},
    pages = {123-140},
    title = {Bagging predictors},
    volume = {24},
    year = {1996}
 }
 

Valid options are:

 -P
  Size of each bag, as a percentage of the
  training set size. (default 100)
 
 -O
  Calculate the out of bag error.
 
 -S <num>
  Random number seed.
  (default 1)
 
 -I <num>
  Number of iterations.
  (default 10)
 
 -D
  If set, the classifier is run in debug mode and
  may output additional info to the console.
 
 -W
  Full name of base classifier.
  (default: weka.classifiers.trees.REPTree)
 
 Options specific to classifier weka.classifiers.trees.REPTree:
 
 -M <minimum number of instances>
  Set minimum number of instances per leaf (default 2).
 
 -V <minimum variance for split>
  Set minimum numeric class variance proportion
  of train variance for split (default 1e-3).
 
 -N <number of folds>
  Number of folds for reduced error pruning (default 3).
 
 -S <seed>
  Seed for random data shuffling (default 1).
 
 -P
  No pruning.
 
 -L
  Maximum tree depth (default -1, no maximum).
 
Options after -- are passed to the designated classifier.
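As an illustration of the `--` convention above, the following minimal sketch splits an option array into the part meant for Bagging and the part handed to the base classifier. The class `BaggingOptionsSketch` and its two helpers are hypothetical and are not part of Weka; the sketch also assumes that `-P` takes a numeric argument in the same way as `-S` and `-I`. An array of this shape is what would be passed to `setOptions(String[])`.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Weka: shows how an option array is
// divided at the "--" separator.
class BaggingOptionsSketch {

    // Options that precede "--" (those meant for Bagging itself).
    static String[] ownOptions(String[] options) {
        int sep = Arrays.asList(options).indexOf("--");
        return sep < 0 ? options.clone() : Arrays.copyOfRange(options, 0, sep);
    }

    // Options that follow "--" (those passed on to the base classifier).
    static String[] baseOptions(String[] options) {
        int sep = Arrays.asList(options).indexOf("--");
        return sep < 0 ? new String[0]
                       : Arrays.copyOfRange(options, sep + 1, options.length);
    }

    public static void main(String[] args) {
        // Assumption: -P is given a numeric percentage, like -S and -I.
        String[] options = {
            "-P", "100", "-S", "1", "-I", "10",
            "-W", "weka.classifiers.trees.REPTree",
            "--", "-M", "2", "-N", "3", "-L", "-1"
        };
        System.out.println("Bagging options: " + String.join(" ", ownOptions(options)));
        System.out.println("REPTree options: " + String.join(" ", baseOptions(options)));
    }
}
```

Note that `-P` and `-S` mean different things on either side of the separator, which is why the position relative to `--` matters.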

Version:
$Revision: 11572 $
Author:
Eibe Frank (eibe@cs.waikato.ac.nz), Len Trigg (len@reeltwo.com), Richard Kirkby (rkirkby@cs.waikato.ac.nz)
  • Constructor Details

    • Bagging

      public Bagging()
      Constructor.
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing the classifier.
      Returns:
      a description suitable for displaying in the explorer/experimenter GUI
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class RandomizableIteratedSingleClassifierEnhancer
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      See the class description above for the list of valid options. Options after -- are passed to the designated classifier.

      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class RandomizableIteratedSingleClassifierEnhancer
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current settings of the Classifier.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class RandomizableIteratedSingleClassifierEnhancer
      Returns:
      an array of strings suitable for passing to setOptions
    • bagSizePercentTipText

      public String bagSizePercentTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter GUI
    • getBagSizePercent

      public int getBagSizePercent()
      Gets the size of each bag, as a percentage of the training set size.
      Returns:
      the bag size, as a percentage.
    • setBagSizePercent

      public void setBagSizePercent(int newBagSizePercent)
      Sets the size of each bag, as a percentage of the training set size.
      Parameters:
      newBagSizePercent - the bag size, as a percentage.
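To make the percentage concrete, here is a minimal sketch of how a bag size in instances could be derived from the training set size. The class `BagSizeSketch` is hypothetical, and the use of integer truncation is an assumption; the exact rounding used by the library is not stated in this documentation.

```java
// Hypothetical helper, not part of Weka.
class BagSizeSketch {

    // Number of instances per bag for a given training set size and percentage.
    // Assumption: the result is truncated toward zero.
    static int bagSize(int numInstances, int bagSizePercent) {
        return (int) ((long) numInstances * bagSizePercent / 100);
    }

    public static void main(String[] args) {
        System.out.println(bagSize(150, 100)); // full-size bags (the default percentage)
        System.out.println(bagSize(150, 50));  // half-size bags
    }
}
```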
    • calcOutOfBagTipText

      public String calcOutOfBagTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter GUI
    • setCalcOutOfBag

      public void setCalcOutOfBag(boolean calcOutOfBag)
      Set whether the out of bag error is calculated.
      Parameters:
      calcOutOfBag - whether to calculate the out of bag error
    • getCalcOutOfBag

      public boolean getCalcOutOfBag()
      Get whether the out of bag error is calculated.
      Returns:
      whether the out of bag error is calculated
    • measureOutOfBagError

      public double measureOutOfBagError()
      Gets the out of bag error that was calculated as the classifier was built.
      Returns:
      the out of bag error
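The idea behind an out-of-bag error can be sketched without any library: each training instance is scored only by the ensemble members whose bootstrap sample did not contain it. The sketch below is an illustration under stated assumptions, not the library's implementation. The class `OutOfBagSketch`, the toy nearest-class-mean base learner, the single numeric attribute, the unweighted majority vote, and the full-size bags are all assumptions made for the example.

```java
import java.util.Random;

// Hypothetical illustration of out-of-bag error estimation, not Weka code.
class OutOfBagSketch {

    // Toy base learner: per-class mean of one numeric attribute over the bag.
    static double[] trainMeans(double[] x, int[] y, int[] bag, int numClasses) {
        double[] sum = new double[numClasses];
        int[] count = new int[numClasses];
        for (int i : bag) { sum[y[i]] += x[i]; count[y[i]]++; }
        double[] means = new double[numClasses];
        for (int c = 0; c < numClasses; c++) {
            means[c] = count[c] == 0 ? Double.NaN : sum[c] / count[c];
        }
        return means;
    }

    // Predicts the class whose mean is nearest; classes absent from the bag are skipped.
    static int predict(double[] means, double value) {
        int best = -1;
        for (int c = 0; c < means.length; c++) {
            if (Double.isNaN(means[c])) continue;
            if (best < 0 || Math.abs(value - means[c]) < Math.abs(value - means[best])) best = c;
        }
        return best;
    }

    // Fraction of instances misclassified by the vote of the members that did not train on them.
    static double outOfBagError(double[] x, int[] y, int numClasses, int iterations, long seed) {
        int n = x.length;
        Random rng = new Random(seed);
        int[][] votes = new int[n][numClasses];
        for (int it = 0; it < iterations; it++) {
            int[] bag = new int[n];            // assumption: bag size equals training set size
            boolean[] inBag = new boolean[n];
            for (int i = 0; i < n; i++) { bag[i] = rng.nextInt(n); inBag[bag[i]] = true; }
            double[] model = trainMeans(x, y, bag, numClasses);
            for (int i = 0; i < n; i++) {
                if (!inBag[i]) votes[i][predict(model, x[i])]++;
            }
        }
        int errors = 0, counted = 0;
        for (int i = 0; i < n; i++) {
            int total = 0, best = 0;
            for (int c = 0; c < numClasses; c++) {
                total += votes[i][c];
                if (votes[i][c] > votes[i][best]) best = c;
            }
            if (total == 0) continue;          // instance was in every bag, so it is not scored
            counted++;
            if (best != y[i]) errors++;
        }
        return counted == 0 ? 0.0 : (double) errors / counted;
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3, 10, 11, 12, 13};
        int[] y = {0, 0, 0, 0, 1, 1, 1, 1};
        System.out.println("OOB error: " + outOfBagError(x, y, 2, 10, 1L));
    }
}
```

Because the estimate only uses instances that a member has not seen, it needs no separate hold-out set.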
    • enumerateMeasures

      public Enumeration enumerateMeasures()
      Returns an enumeration of the additional measure names.
      Specified by:
      enumerateMeasures in interface AdditionalMeasureProducer
      Returns:
      an enumeration of the measure names
    • getMeasure

      public double getMeasure(String additionalMeasureName)
      Returns the value of the named measure.
      Specified by:
      getMeasure in interface AdditionalMeasureProducer
      Parameters:
      additionalMeasureName - the name of the measure to query for its value
      Returns:
      the value of the named measure
      Throws:
      IllegalArgumentException - if the named measure is not supported
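The pairing of `enumerateMeasures` and `getMeasure` amounts to a lookup by name. The sketch below shows that pattern in plain Java. The class `MeasureLookupSketch` is hypothetical, and the measure name `measureOutOfBagError` (mirroring the method of the same name) as well as the case-insensitive comparison are assumptions rather than documented behavior.

```java
import java.util.Collections;
import java.util.Enumeration;

// Hypothetical illustration of the named-measure pattern, not Weka code.
class MeasureLookupSketch {

    private final double outOfBagError;

    MeasureLookupSketch(double outOfBagError) {
        this.outOfBagError = outOfBagError;
    }

    double measureOutOfBagError() {
        return outOfBagError;
    }

    // Lists the names that getMeasure accepts (assumed to be a single name here).
    Enumeration<String> enumerateMeasures() {
        return Collections.enumeration(Collections.singletonList("measureOutOfBagError"));
    }

    // Dispatches on the measure name; unknown names are rejected.
    double getMeasure(String additionalMeasureName) {
        if (additionalMeasureName.equalsIgnoreCase("measureOutOfBagError")) {
            return measureOutOfBagError();
        }
        throw new IllegalArgumentException(additionalMeasureName + " not supported");
    }

    public static void main(String[] args) {
        MeasureLookupSketch m = new MeasureLookupSketch(0.125);
        Enumeration<String> names = m.enumerateMeasures();
        while (names.hasMoreElements()) {
            String name = names.nextElement();
            System.out.println(name + " = " + m.getMeasure(name));
        }
    }
}
```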
    • buildClassifier

      public void buildClassifier(Instances data) throws Exception
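      Bagging method. Builds the ensemble by training each copy of the base classifier on a bootstrap sample of the training data.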
      Overrides:
      buildClassifier in class IteratedSingleClassifierEnhancer
      Parameters:
      data - the training data to be used for generating the bagged classifier.
      Throws:
      Exception - if the classifier could not be built successfully
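The sampling step at the heart of bagging can be sketched as drawing instance indices with replacement, once per iteration, from a seeded random number generator. The class `BootstrapSketch` below is hypothetical. It assumes uniform sampling and ignores instance weights, which the real class may take into account since it implements WeightedInstancesHandler.

```java
import java.util.Random;

// Hypothetical illustration of bootstrap sampling, not Weka code.
class BootstrapSketch {

    // Draws bagSize indices uniformly with replacement from 0..n-1.
    static int[] sampleIndices(int n, int bagSize, Random rng) {
        int[] indices = new int[bagSize];
        for (int i = 0; i < bagSize; i++) {
            indices[i] = rng.nextInt(n);
        }
        return indices;
    }

    public static void main(String[] args) {
        int n = 20;                     // training set size
        int iterations = 10;            // mirrors the documented default for -I
        Random rng = new Random(1);     // mirrors the documented default seed for -S
        for (int it = 0; it < iterations; it++) {
            int[] bag = sampleIndices(n, n, rng);
            boolean[] seen = new boolean[n];
            int distinct = 0;
            for (int index : bag) {
                if (!seen[index]) { seen[index] = true; distinct++; }
            }
            // Each bag would be used to train one copy of the base classifier.
            System.out.println("bag " + it + ": " + distinct + " distinct instances");
        }
    }
}
```

Fixing the seed makes the sequence of bags reproducible, which is the purpose of a seed option.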
    • distributionForInstance

      public double[] distributionForInstance(Instance instance) throws Exception
      Calculates the class membership probabilities for the given test instance.
      Overrides:
      distributionForInstance in class Classifier
      Parameters:
      instance - the instance to be classified
      Returns:
      predicted class probability distribution
      Throws:
      Exception - if distribution can't be computed successfully
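One common way for an ensemble to produce class membership probabilities is to sum the distributions returned by its members and normalize the result. The sketch below illustrates that scheme in plain Java. The class `AverageDistributionSketch` is hypothetical, and whether this class combines its members in exactly this way is an assumption.

```java
import java.util.Arrays;

// Hypothetical illustration of combining member distributions, not Weka code.
class AverageDistributionSketch {

    // Sums the per-member class distributions and normalizes them to sum to 1.
    static double[] combine(double[][] memberDistributions) {
        int numClasses = memberDistributions[0].length;
        double[] sums = new double[numClasses];
        for (double[] distribution : memberDistributions) {
            for (int c = 0; c < numClasses; c++) {
                sums[c] += distribution[c];
            }
        }
        double total = 0.0;
        for (double s : sums) total += s;
        if (total == 0.0) return sums;   // nothing to normalize
        for (int c = 0; c < numClasses; c++) {
            sums[c] /= total;
        }
        return sums;
    }

    public static void main(String[] args) {
        double[][] members = {
            {0.5, 0.5},   // distribution from the first ensemble member
            {1.0, 0.0}    // distribution from the second ensemble member
        };
        System.out.println(Arrays.toString(combine(members)));
    }
}
```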
    • toString

      public String toString()
      Returns description of the bagged classifier.
      Overrides:
      toString in class Object
      Returns:
      description of the bagged classifier as a string
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class Classifier
      Returns:
      the revision
    • main

      public static void main(String[] argv)
      Main method for testing this class.
      Parameters:
      argv - the options