java.lang.Object
org.openoffice.da.comp.w2lcommon.tex.tokenizer.Mouth

public class Mouth extends Object

The Mouth is the main class of this package. It is a tokenizer for TeX files: according to "The TeXbook", the "eyes" and "mouth" of TeX are responsible for turning the input to TeX into a sequence of tokens. We are not going to reimplement TeX, but rather to provide a service for parsing high-level languages based on TeX (e.g. LaTeX, ConTeXt). For this reason the tokenizer deviates slightly from TeX: we read a stream of characters rather than a stream of bytes (which makes no difference for ASCII files).
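The practical effect of reading characters rather than bytes can be seen with the standard `InputStreamReader`, which plays the role of the parser's "glasses". In this minimal sketch the class name `GlassesDemo` and the choice of UTF-8 are illustrative assumptions; the Mouth itself only receives whatever `Reader` its caller constructs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class GlassesDemo {
    // Count the characters a Reader delivers after decoding the given bytes as UTF-8.
    static int countChars(byte[] bytes) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int n = 0;
            while (reader.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] ascii = "\\relax".getBytes(StandardCharsets.UTF_8);
        byte[] accented = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // ASCII: one byte per character, so bytes and characters coincide.
        System.out.println(ascii.length + " bytes -> " + countChars(ascii) + " chars");
        // Non-ASCII: U+00E9 takes two bytes in UTF-8 but is a single character.
        System.out.println(accented.length + " bytes -> " + countChars(accented) + " chars");
    }
}
```

Because the decoding happens before tokenization, a tokenizer working on characters never sees a multi-byte sequence split into separate "characters".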

In tribute to Donald E. Knuth's digestive metaphors, we divide the process into four levels:

  • The parser should provide a pair of glasses to translate the stream of bytes into a stream of characters
  • The eyes see the stream of characters as a sequence of lines
  • The mouth chews a bit on the characters to turn them into tokens
  • The tongue reports the "taste" of the token to the parser
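One piece of what the mouth "chews" is the recognition of control sequences. TeX's rule is that an escape character followed by a letter starts a control word made of the longest run of letters, while an escape followed by any other character forms a one-character control symbol. The sketch below applies that rule under simplifying assumptions: `\` is the only escape character, only ASCII letters count as letters (as in plain TeX), and the class and method names are hypothetical rather than part of this package.

```java
import java.util.ArrayList;
import java.util.List;

public class ControlSequenceScanner {
    // Plain-TeX-like assumption: only ASCII letters have catcode 11.
    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Return the names of all control sequences in the input, in order.
    static List<String> controlSequences(String input) {
        List<String> names = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) != '\\') { // not an escape character: skip it
                i++;
                continue;
            }
            int start = i + 1;
            if (start >= input.length()) { // escape at the very end: recorded as an empty name here
                names.add("");
                break;
            }
            int end = start;
            if (isLetter(input.charAt(end))) {
                // Control word: consume the maximal run of letters.
                while (end < input.length() && isLetter(input.charAt(end))) {
                    end++;
                }
            } else {
                // Control symbol: exactly one non-letter character.
                end = start + 1;
            }
            names.add(input.substring(start, end));
            i = end;
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(controlSequences("\\section{Intro}\\\\ \\$5 \\alpha2"));
    }
}
```

Note that `\alpha2` yields the control word `alpha` followed by an ordinary `2`, since digits are not letters; this is why a real tokenizer must consult the catcode table rather than a fixed character class.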
  • Constructor Details

    • Mouth

      public Mouth(Reader reader) throws IOException
      Construct a new Mouth based on a character stream
      Parameters:
      reader - the character stream to tokenize
      Throws:
      IOException - if we fail to read the character stream
  • Method Details

    • getCatcodes

      public CatcodeTable getCatcodes()
      Get the currently used catcode table
      Returns:
      the table
    • setCatcodes

      public void setCatcodes(CatcodeTable catcodes)
      Set the catcode table. The catcode table can be changed at any time during tokenization.
      Parameters:
      catcodes - the table
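A catcode table is essentially a map from characters to category codes 0 to 15. The sketch below uses a hypothetical stand-in, `SimpleCatcodeTable`, which is not the package's `CatcodeTable` (whose methods are not documented here). It starts from plain-TeX-like defaults and shows the effect of a mid-stream change, analogous to LaTeX's `\makeatletter` making `@` a letter.

```java
import java.util.HashMap;
import java.util.Map;

public final class SimpleCatcodeTable {
    // Hypothetical catcode table; unlisted characters default to 12 ("other").
    private final Map<Character, Integer> codes = new HashMap<>();

    SimpleCatcodeTable() {
        // Plain-TeX-like defaults (an assumption about what a parser might start from).
        set('\\', 0);  set('{', 1);  set('}', 2);  set('$', 3);
        set('&', 4);   set('\r', 5); set('#', 6);  set('^', 7);
        set('_', 8);   set('\0', 9); set(' ', 10); set('\t', 10);
        set('~', 13);  set('%', 14); set('\u007f', 15);
        for (char c = 'a'; c <= 'z'; c++) set(c, 11);
        for (char c = 'A'; c <= 'Z'; c++) set(c, 11);
    }

    int get(char c) {
        return codes.getOrDefault(c, 12);
    }

    void set(char c, int code) {
        if (code < 0 || code > 15) {
            throw new IllegalArgumentException("catcode out of range: " + code);
        }
        codes.put(c, code);
    }

    public static void main(String[] args) {
        SimpleCatcodeTable table = new SimpleCatcodeTable();
        System.out.println("@ before: " + table.get('@')); // 12, other
        table.set('@', 11); // like \makeatletter: from now on @ is a letter
        System.out.println("@ after: " + table.get('@'));  // 11, letter
    }
}
```

Since the tokenizer looks up each character as it reads it, a change made between two calls to the tokenizer affects only the characters read after the change.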
    • getEndlinechar

      public char getEndlinechar()
      Return the current value of the \endlinechar (the character added to the end of each input line)
      Returns:
      the character
    • setEndlinechar

      public void setEndlinechar(char c)
      Set a new \endlinechar (the character added to the end of each input line). The character can be changed at any time during tokenization.
      Parameters:
      c - the character
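In TeX, the eyes prepare each input line by discarding trailing spaces and then appending \endlinechar, which is carriage return (character 13) by default. A sketch of that step follows; `EyesSketch` and `readLines` are hypothetical names, and whether the Mouth also strips trailing spaces is not stated in this documentation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class EyesSketch {
    // Split a character stream into lines the way TeX's eyes do:
    // drop trailing spaces, then append the current end-of-line character.
    static List<String> readLines(Reader reader, char endlinechar) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) { // readLine strips \n, \r or \r\n
                int end = line.length();
                while (end > 0 && line.charAt(end - 1) == ' ') {
                    end--;
                }
                lines.add(line.substring(0, end) + endlinechar);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // TeX's default \endlinechar is carriage return (character 13).
        List<String> lines = readLines(new StringReader("Hello   \nworld\r\n"), '\r');
        for (String line : lines) {
            System.out.println(line.replace("\r", "^^M"));
        }
    }
}
```

Passing a different character, for example `%`, shows why changing the end-of-line character matters: with `%` as a comment character, every line end would then behave like a comment instead of a space.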
    • getTokenObject

      public Token getTokenObject()
      Return the object used to store the current token (the "tongue" of TeX). The same object is reused for all tokens, so for convenience the parser can keep a reference to the object. If, on the other hand, the parser needs to store a token list, it must explicitly clone each token it stores.
      Returns:
      the token
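Because one object is reused, collecting tokens by reference silently turns every list entry into the most recent token. The self-contained sketch below uses a hypothetical `MutableToken` class (not the package's `Token`) to show the pitfall and the copy-on-store fix described above.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenReuseDemo {
    // Hypothetical mutable token, overwritten in place for every new token.
    static final class MutableToken implements Cloneable {
        char chr;

        @Override
        public MutableToken clone() {
            try {
                return (MutableToken) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: the class implements Cloneable
            }
        }
    }

    // Simulate a tokenizer that reuses one token object while reading the text.
    static List<MutableToken> collect(String text, boolean cloneEach) {
        MutableToken tongue = new MutableToken(); // the single shared object
        List<MutableToken> list = new ArrayList<>();
        for (char c : text.toCharArray()) {
            tongue.chr = c; // the "next token" overwrites the previous one
            list.add(cloneEach ? tongue.clone() : tongue);
        }
        return list;
    }

    static String chars(List<MutableToken> tokens) {
        StringBuilder sb = new StringBuilder();
        for (MutableToken t : tokens) {
            sb.append(t.chr);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(chars(collect("abc", false))); // ccc: every entry aliases one object
        System.out.println(chars(collect("abc", true)));  // abc: each entry is an independent copy
    }
}
```

Reusing one object avoids an allocation per token for parsers that only inspect tokens as they pass, and pushes the cost of copying onto the less common case of storing token lists.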
    • getToken

      public Token getToken() throws IOException
      Get the next token
      Returns:
      the token (for convenience; the same object is returned by getTokenObject()).
      Throws:
      IOException - if we fail to read the underlying stream
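A parser typically drives the tokenizer with a loop that requests the next token until the input is exhausted, inspecting the shared token on each pass. How the real `Token` signals end of input is not stated here, so the sketch below defines its own hypothetical `TokenSource` interface and `END` type, plus a toy source over a string, and uses the loop to check that braces are balanced. Unlike the real `getToken()`, the sketch's `next()` does not throw `IOException`.

```java
public class BraceChecker {
    // Hypothetical token categories; END marks the end of input.
    enum Type { BEGIN_GROUP, END_GROUP, OTHER, END }

    // Hypothetical stand-in for the shared token object.
    static final class Tok {
        Type type;
    }

    // Hypothetical stand-in for a tokenizer: next() returns the same Tok every time.
    interface TokenSource {
        Tok next();
    }

    // Toy source that classifies single characters; '{' and '}' play catcodes 1 and 2.
    static TokenSource fromString(String s) {
        Tok tok = new Tok();
        int[] pos = {0};
        return () -> {
            if (pos[0] >= s.length()) {
                tok.type = Type.END;
            } else {
                char c = s.charAt(pos[0]++);
                tok.type = c == '{' ? Type.BEGIN_GROUP : c == '}' ? Type.END_GROUP : Type.OTHER;
            }
            return tok;
        };
    }

    // Typical parser loop: fetch tokens until END, tracking group depth.
    static boolean balanced(TokenSource source) {
        int depth = 0;
        for (Tok t = source.next(); t.type != Type.END; t = source.next()) {
            if (t.type == Type.BEGIN_GROUP) {
                depth++;
            } else if (t.type == Type.END_GROUP && --depth < 0) {
                return false; // a closing brace with no matching opening brace
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        System.out.println(balanced(fromString("\\emph{a{b}}"))); // true
        System.out.println(balanced(fromString("}{")));           // false
    }
}
```

The loop only reads the token's type before asking for the next one, so it is safe with a reused token object and needs no cloning.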