Class CSVLoader

All Implemented Interfaces:
Serializable, BatchConverter, FileSourcedConverter, Loader, EnvironmentHandler, OptionHandler, RevisionHandler

public class CSVLoader extends AbstractFileLoader implements BatchConverter, OptionHandler
Reads a source in comma-separated or tab-separated format. The first row of the file is assumed to determine the number and names of the attributes.
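The header convention described above can be illustrated with a minimal sketch. It is not CSVLoader's implementation: the class HeaderRowSketch and the method attributeNames are hypothetical names, and the sketch assumes that either a comma or a tab separates fields and that no field is enclosed in quotes.

```java
import java.util.Arrays;
import java.util.List;

final class HeaderRowSketch {

    // Splits the first row on commas or tabs and trims each field.
    // Hypothetical helper for illustration; enclosure characters are ignored here.
    static List<String> attributeNames(String firstRow) {
        String[] fields = firstRow.split("[,\t]", -1);
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        return Arrays.asList(fields);
    }

    public static void main(String[] args) {
        List<String> names = attributeNames("outlook,temperature,humidity,play");
        // The number of fields in the first row gives the number of attributes.
        System.out.println(names.size() + " attributes: " + names);
        // prints "4 attributes: [outlook, temperature, humidity, play]"
    }
}
```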

Valid options are:

 -N <range>
  The range of attributes to force type to be NOMINAL.
  'first' and 'last' are accepted as well.
  Examples: "first-last", "1,4,5-27,50-last"
  (default: -none-)
 
 -S <range>
  The range of attributes to force type to be STRING.
  'first' and 'last' are accepted as well.
  Examples: "first-last", "1,4,5-27,50-last"
  (default: -none-)
 
 -D <range>
  The range of attributes to force type to be DATE.
  'first' and 'last' are accepted as well.
  Examples: "first-last", "1,4,5-27,50-last"
  (default: -none-)
 
 -format <date format>
  The date formatting string to use to parse date values.
  (default: "yyyy-MM-dd'T'HH:mm:ss")
 
 -M <str>
  The string representing a missing value.
  (default: ?)
 
 -E <enclosures>
  The enclosure character(s) to use for strings.
  Specify as a comma separated list (e.g. ",')
  (default: '"')
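To make the range syntax accepted by -N, -S and -D concrete, the following sketch expands a range string such as "1,4,5-27,50-last" into individual attribute indices. It is an illustration only, not the parser CSVLoader uses: AttributeRangeSketch, resolve and expand are hypothetical names, and the sketch assumes that indices in the range string are 1-based and that 'last' refers to the final attribute.

```java
import java.util.ArrayList;
import java.util.List;

final class AttributeRangeSketch {

    // Resolves "first", "last", or a number to a 1-based attribute index (assumed convention).
    static int resolve(String token, int numAttributes) {
        String t = token.trim();
        if (t.equals("first")) {
            return 1;
        }
        if (t.equals("last")) {
            return numAttributes;
        }
        return Integer.parseInt(t);
    }

    // Expands a comma separated range string into 1-based indices, in order of appearance.
    // Indices outside 1..numAttributes and duplicates are skipped.
    static List<Integer> expand(String range, int numAttributes) {
        List<Integer> indices = new ArrayList<>();
        for (String part : range.split(",")) {
            String[] bounds = part.split("-", 2);
            int from = resolve(bounds[0], numAttributes);
            int to = bounds.length == 2 ? resolve(bounds[1], numAttributes) : from;
            for (int i = from; i <= to; i++) {
                if (i >= 1 && i <= numAttributes && !indices.contains(i)) {
                    indices.add(i);
                }
            }
        }
        return indices;
    }

    public static void main(String[] args) {
        System.out.println(expand("first-last", 3));      // prints "[1, 2, 3]"
        System.out.println(expand("1,4,5-7,9-last", 10)); // prints "[1, 4, 5, 6, 7, 9, 10]"
    }
}
```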
 
Version:
$Revision: 10372 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz)
  • Field Details

    • FILE_EXTENSION

      public static String FILE_EXTENSION
      the file extension.
  • Constructor Details

    • CSVLoader

      public CSVLoader()
      default constructor.
  • Method Details

    • getFileExtension

      public String getFileExtension()
      Get the file extension used for CSV files.
      Specified by:
      getFileExtension in interface FileSourcedConverter
      Returns:
      the file extension
    • getFileDescription

      public String getFileDescription()
      Returns a description of the file type.
      Specified by:
      getFileDescription in interface FileSourcedConverter
      Returns:
      a short file description
    • getFileExtensions

      public String[] getFileExtensions()
      Gets all the file extensions used for this type of file.
      Specified by:
      getFileExtensions in interface FileSourcedConverter
      Returns:
      the file extensions
    • globalInfo

      public String globalInfo()
      Returns a string describing this loader.
      Returns:
      a description of the loader suitable for displaying in the explorer/experimenter gui
    • listOptions

      public Enumeration listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -N <range>
        The range of attributes to force type to be NOMINAL.
        'first' and 'last' are accepted as well.
        Examples: "first-last", "1,4,5-27,50-last"
        (default: -none-)
       
       -S <range>
        The range of attributes to force type to be STRING.
        'first' and 'last' are accepted as well.
        Examples: "first-last", "1,4,5-27,50-last"
        (default: -none-)
       
       -D <range>
        The range of attributes to force type to be DATE.
        'first' and 'last' are accepted as well.
        Examples: "first-last", "1,4,5-27,50-last"
        (default: -none-)
       
       -format <date format>
        The date formatting string to use to parse date values.
        (default: "yyyy-MM-dd'T'HH:mm:ss")
       
       -M <str>
        The string representing a missing value.
        (default: ?)
       
       -E <enclosures>
        The enclosure character(s) to use for strings.
        Specify as a comma separated list (e.g. ",')
        (default: '"')
       
      Specified by:
      setOptions in interface OptionHandler
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
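The options array pairs each flag with the value that follows it. The sketch below shows one way such an array might be laid out and read. It does not reproduce how setOptions parses its input: OptionArraySketch and optionValue are hypothetical names, and the fallback values are taken from the defaults listed above.

```java
final class OptionArraySketch {

    // Returns the value that follows the given flag, or defaultValue if the flag is absent.
    // Hypothetical helper for illustration only.
    static String optionValue(String flag, String[] options, String defaultValue) {
        for (int i = 0; i < options.length - 1; i++) {
            if (options[i].equals(flag)) {
                return options[i + 1];
            }
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        // Each flag is one array element and its value is the next element.
        String[] options = {"-N", "1,4", "-M", "NA", "-format", "yyyy-MM-dd"};
        System.out.println(optionValue("-N", options, ""));   // prints "1,4"
        System.out.println(optionValue("-M", options, "?"));  // prints "NA"
        System.out.println(optionValue("-E", options, "\"")); // flag absent, prints the default "
    }
}
```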
    • getOptions

      public String[] getOptions()
      Gets the current settings of the loader.
      Specified by:
      getOptions in interface OptionHandler
      Returns:
      an array of strings suitable for passing to setOptions
    • setNominalAttributes

      public void setNominalAttributes(String value)
      Sets the attribute range to be forced to type nominal.
      Parameters:
      value - the range
    • getNominalAttributes

      public String getNominalAttributes()
      Returns the current attribute range to be forced to type nominal.
      Returns:
      the range
    • nominalAttributesTipText

      public String nominalAttributesTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setStringAttributes

      public void setStringAttributes(String value)
      Sets the attribute range to be forced to type string.
      Parameters:
      value - the range
    • getStringAttributes

      public String getStringAttributes()
      Returns the current attribute range to be forced to type string.
      Returns:
      the range
    • stringAttributesTipText

      public String stringAttributesTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setDateAttributes

      public void setDateAttributes(String value)
      Sets the attribute range to be forced to type date.
      Parameters:
      value - the range
    • getDateAttributes

      public String getDateAttributes()
      Returns the current attribute range to be forced to type date.
      Returns:
      the range.
    • dateAttributesTipText

      public String dateAttributesTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setDateFormat

      public void setDateFormat(String value)
      Set the format to use for parsing date values.
      Parameters:
      value - the format to use.
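As an illustration of what a date format string does, the sketch below parses values with the default pattern "yyyy-MM-dd'T'HH:mm:ss". It assumes the pattern follows java.text.SimpleDateFormat syntax, which the quoted 'T' in the default suggests. DateFormatSketch, parse and matches are hypothetical names, and the sketch makes no claim about how CSVLoader itself applies the format.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

final class DateFormatSketch {

    // The default pattern listed for the -format option.
    static final String DEFAULT_FORMAT = "yyyy-MM-dd'T'HH:mm:ss";

    // Parses a value with the given pattern, rejecting out-of-range fields such as month 13.
    static Date parse(String value, String pattern) throws ParseException {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false);
        return format.parse(value);
    }

    // Returns true if the value can be parsed with the given pattern.
    static boolean matches(String value, String pattern) {
        try {
            parse(value, pattern);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("2024-03-15T10:30:00", DEFAULT_FORMAT)); // prints "true"
        System.out.println(matches("15/03/2024", DEFAULT_FORMAT));          // prints "false"
        System.out.println(matches("15/03/2024", "dd/MM/yyyy"));            // prints "true"
    }
}
```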
    • getDateFormat

      public String getDateFormat()
      Get the format to use for parsing date values.
      Returns:
      the format to use for parsing date values.
    • dateFormatTipText

      public String dateFormatTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • enclosureCharactersTipText

      public String enclosureCharactersTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setEnclosureCharacters

      public void setEnclosureCharacters(String enclosure)
      Set the character(s) to use/recognize as string enclosures.
      Parameters:
      enclosure - the characters to use as string enclosures
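The effect of enclosure characters can be sketched as follows: a field wrapped in one of the configured characters has that pair removed. This is only an illustration under stated assumptions. EnclosureSketch and strip are hypothetical names, the enclosure list is taken to be a comma separated list of single characters as in the -E option, and escaping inside enclosed fields is not handled.

```java
final class EnclosureSketch {

    // Removes one matching pair of enclosure characters from around a field, if present.
    // 'enclosures' is a comma separated list of single characters, for example: ",'
    static String strip(String field, String enclosures) {
        for (String e : enclosures.split(",")) {
            if (e.length() == 1 && field.length() >= 2
                    && field.startsWith(e) && field.endsWith(e)) {
                return field.substring(1, field.length() - 1);
            }
        }
        return field;
    }

    public static void main(String[] args) {
        String enclosures = "\",'"; // double quote and single quote
        System.out.println(strip("\"New York\"", enclosures)); // prints "New York"
        System.out.println(strip("'a,b'", enclosures));        // prints "a,b"
        System.out.println(strip("plain", enclosures));        // prints "plain"
    }
}
```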
    • getEnclosureCharacters

      public String getEnclosureCharacters()
      Get the character(s) to use/recognize as string enclosures.
      Returns:
      the characters to use as string enclosures
    • setMissingValue

      public void setMissingValue(String value)
      Sets the placeholder for missing values.
      Parameters:
      value - the placeholder
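The role of the placeholder can be shown with a small sketch that converts a numeric field and treats the placeholder string as missing. MissingValueSketch and toNumeric are hypothetical names, and representing a missing value as Double.NaN is an assumption made for this example rather than a statement about CSVLoader's internals.

```java
final class MissingValueSketch {

    // Converts a field to a double; a field equal to the placeholder becomes NaN
    // (NaN as the missing marker is an assumption of this sketch).
    static double toNumeric(String field, String missingPlaceholder) {
        String trimmed = field.trim();
        if (trimmed.equals(missingPlaceholder)) {
            return Double.NaN;
        }
        return Double.parseDouble(trimmed);
    }

    public static void main(String[] args) {
        System.out.println(toNumeric("3.5", "?")); // prints "3.5"
        System.out.println(toNumeric("?", "?"));   // prints "NaN"
        System.out.println(toNumeric("NA", "NA")); // prints "NaN"
    }
}
```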
    • getMissingValue

      public String getMissingValue()
      Returns the current placeholder for missing values.
      Returns:
      the placeholder
    • missingValueTipText

      public String missingValueTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setSource

      public void setSource(InputStream input) throws IOException
      Resets the Loader object and sets the source of the data set to be the supplied Stream object.
      Specified by:
      setSource in interface Loader
      Overrides:
      setSource in class AbstractLoader
      Parameters:
      input - the input stream
      Throws:
      IOException - if an error occurs
    • setSource

      public void setSource(File file) throws IOException
      Resets the Loader object and sets the source of the data set to be the supplied File object.
      Specified by:
      setSource in interface Loader
      Overrides:
      setSource in class AbstractFileLoader
      Parameters:
      file - the source file.
      Throws:
      IOException - if an error occurs
    • getStructure

      public Instances getStructure() throws IOException
      Determines and returns (if possible) the structure (internally the header) of the data set as an empty set of instances.
      Specified by:
      getStructure in interface Loader
      Specified by:
      getStructure in class AbstractLoader
      Returns:
      the structure of the data set as an empty set of Instances
      Throws:
      IOException - if an error occurs
    • getDataSet

      public Instances getDataSet() throws IOException
      Returns the full data set. If the structure has not yet been determined by a call to getStructure, this method does so before processing the rest of the data set.
      Specified by:
      getDataSet in interface Loader
      Specified by:
      getDataSet in class AbstractLoader
      Returns:
      the full data set as a set of Instances
      Throws:
      IOException - if there is no source or parsing fails
    • getNextInstance

      public Instance getNextInstance(Instances structure) throws IOException
      CSVLoader is unable to process a data set incrementally.
      Specified by:
      getNextInstance in interface Loader
      Specified by:
      getNextInstance in class AbstractLoader
      Parameters:
      structure - ignored
      Returns:
      never returns without throwing an exception
      Throws:
      IOException - always. CSVLoader is unable to process a data set incrementally.
    • reset

      public void reset() throws IOException
      Resets the Loader ready to read a new data set or the same data set again.
      Specified by:
      reset in interface Loader
      Overrides:
      reset in class AbstractFileLoader
      Throws:
      IOException - if something goes wrong
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Main method.
      Parameters:
      args - should contain the name of an input file.