Package org.lobobrowser.html.parser
Class HtmlParser
java.lang.Object
    org.lobobrowser.html.parser.HtmlParser
The HtmlParser class is an HTML DOM parser. It provides the functionality for the standard DOM parser implementation, DocumentBuilderImpl. This parser class may be used directly when a different DOM implementation is preferred.
Field Summary
Fields
Modifier and Type | Field | Description
static final String | MODIFYING_KEY | A node UserData key used to tell nodes that their content may be about to be modified.
Constructor Summary
Constructors
Constructor | Description
HtmlParser(UserAgentContext ucontext, Document document) | Constructs an HtmlParser.
HtmlParser(UserAgentContext ucontext, Document document, ErrorHandler errorHandler, String publicId, String systemId) | Constructs an HtmlParser.
HtmlParser(Document document, ErrorHandler errorHandler, String publicId, String systemId) | Deprecated. A UserAgentContext should be passed to the constructor.
Method Summary
Modifier and Type | Method | Description
static boolean | isDecodeEntities(String elementName) |
void | parse(InputStream in) | Parses HTML from an input stream, assuming the character set is ISO-8859-1.
void | parse(InputStream in, String charset) | Parses HTML from an input stream, using the given character set.
void | parse(LineNumberReader reader) |
void | parse(LineNumberReader reader, Node parent) | This method may be used when the DOM should be built under a given node, such as when innerHTML is used in JavaScript.
void | parse(Reader reader) | Parses HTML given by a Reader.
void | parse(Reader reader, Node parent) | This method may be used when the DOM should be built under a given node, such as when innerHTML is used in JavaScript.
Field Details

MODIFYING_KEY
public static final String MODIFYING_KEY
A node UserData key used to tell nodes that their content may be about to be modified. Elements could use this to temporarily suspend notifications. The value set will be either Boolean.TRUE or Boolean.FALSE.
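The sketch below shows how an element might honor this flag. It uses only the JDK's own DOM (org.w3c.dom) rather than the Cobra classes. The key string, the class name ModifyingFlagSketch and the notification counter are hypothetical stand-ins. The actual value of MODIFYING_KEY is not given on this page, so real code should reference HtmlParser.MODIFYING_KEY directly.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class ModifyingFlagSketch {
    // Hypothetical stand-in for HtmlParser.MODIFYING_KEY; the real constant's value is not shown here.
    static final String MODIFYING_KEY = "sketch.modifying";

    // True while a parser has marked the node as about to be modified.
    static boolean isBeingModified(Node node) {
        return Boolean.TRUE.equals(node.getUserData(MODIFYING_KEY));
    }

    // Hypothetical notification hook: returns the updated count of delivered
    // notifications, suppressing delivery while the flag is Boolean.TRUE.
    static int notifyChanged(Node node, int delivered) {
        return isBeingModified(node) ? delivered : delivered + 1;
    }

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Simulates a parser bracketing a modification with TRUE/FALSE user data.
    static int simulateParse() {
        Document doc = newDocument();
        Element div = doc.createElement("div");
        int delivered = 0;
        div.setUserData(MODIFYING_KEY, Boolean.TRUE, null);   // modification begins
        div.appendChild(doc.createTextNode("parsed text"));
        delivered = notifyChanged(div, delivered);            // suppressed
        div.setUserData(MODIFYING_KEY, Boolean.FALSE, null);  // modification ends
        delivered = notifyChanged(div, delivered);            // delivered once
        return delivered;
    }
}
```

Comparing with Boolean.TRUE.equals(...) treats a node that never received the key (getUserData returns null) the same as one set to Boolean.FALSE.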
Constructor Details

HtmlParser
public HtmlParser(Document document, ErrorHandler errorHandler, String publicId, String systemId)
Deprecated. A UserAgentContext should be passed to the constructor.
Constructs an HtmlParser.
Parameters:
  document - A W3C Document instance.
  errorHandler - The error handler.
  publicId - The public ID of the document.
  systemId - The system ID of the document.

HtmlParser
public HtmlParser(UserAgentContext ucontext, Document document, ErrorHandler errorHandler, String publicId, String systemId)
Constructs an HtmlParser.
Parameters:
  ucontext - The user agent context.
  document - A W3C Document instance.
  errorHandler - The error handler.
  publicId - The public ID of the document.
  systemId - The system ID of the document.

HtmlParser
public HtmlParser(UserAgentContext ucontext, Document document)
Constructs an HtmlParser.
Parameters:
  ucontext - The user agent context.
  document - A W3C Document instance.
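The constructors that take an errorHandler accept any org.xml.sax.ErrorHandler. The handler sketched below collects diagnostics instead of aborting. It tags each one with the system ID, line and column carried by the SAXParseException. This page does not say which of the three callbacks HtmlParser invokes, or when, and the class name CollectingErrorHandler is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical handler: records warnings and recoverable errors, rethrows fatal errors.
final class CollectingErrorHandler implements ErrorHandler {
    private final List<String> messages = new ArrayList<>();

    // Formats "level systemId:line:column message" from the exception's locator data.
    private void record(String level, SAXParseException e) {
        messages.add(level + " " + e.getSystemId() + ":" + e.getLineNumber() + ":"
                + e.getColumnNumber() + " " + e.getMessage());
    }

    @Override
    public void warning(SAXParseException e) {
        record("warning", e);
    }

    @Override
    public void error(SAXParseException e) {
        record("error", e);
    }

    // Fatal errors are recorded and then rethrown so the parse stops.
    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        record("fatal", e);
        throw e;
    }

    List<String> messages() {
        return Collections.unmodifiableList(messages);
    }
}
```

A caller would pass an instance as the errorHandler argument, typically with the document's URL as systemId so that recorded messages point back to the source, and read messages() after parsing.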
Method Details

isDecodeEntities
public static boolean isDecodeEntities(String elementName)
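isDecodeEntities has no description here. Its name suggests that it reports whether entity references such as &amp; should be decoded in the text of the named element. The sketch below is hypothetical. It assumes the raw-text elements SCRIPT and STYLE are the only exceptions, and it decodes just a handful of named references plus decimal numeric references. It is not HtmlParser's actual rule set.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class EntityDecodingSketch {
    // Assumption for illustration: text inside these elements is kept literally.
    private static final Set<String> RAW_TEXT_ELEMENTS = Set.of("SCRIPT", "STYLE");
    // Deliberately tiny table; HTML defines many more named references.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "nbsp", "\u00a0");

    static boolean isDecodeEntities(String elementName) {
        return !RAW_TEXT_ELEMENTS.contains(elementName.toUpperCase(Locale.ROOT));
    }

    // Replaces known named references and decimal references like &#65;; anything else is left as-is.
    static String decode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            int semi = (c == '&') ? text.indexOf(';', i) : -1;
            if (semi > i + 1) {
                String ref = text.substring(i + 1, semi);
                String replacement = NAMED.get(ref);
                if (replacement == null && ref.matches("#[0-9]{1,7}")) {
                    int codePoint = Integer.parseInt(ref.substring(1));
                    if (Character.isValidCodePoint(codePoint)) {
                        replacement = new String(Character.toChars(codePoint));
                    }
                }
                if (replacement != null) {
                    out.append(replacement);
                    i = semi + 1;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Applies decoding only where the element allows it.
    static String textContentFor(String elementName, String rawText) {
        return isDecodeEntities(elementName) ? decode(rawText) : rawText;
    }
}
```

Under these assumptions, the same raw text "a &lt; b" becomes "a < b" inside a P element but stays unchanged inside SCRIPT.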

parse
public void parse(InputStream in) throws IOException, SAXException, UnsupportedEncodingException
Parses HTML from an input stream, assuming the character set is ISO-8859-1.
Parameters:
  in - The input stream.
Throws:
  IOException - Thrown when there are errors reading the stream.
  SAXException - Thrown when there are parse errors.
  UnsupportedEncodingException

parse
public void parse(InputStream in, String charset) throws IOException, SAXException, UnsupportedEncodingException
Parses HTML from an input stream, using the given character set.
Parameters:
  in - The input stream.
  charset - The character set.
Throws:
  IOException - Thrown when there's an error reading from the stream.
  SAXException - Thrown when there is a parser error.
  UnsupportedEncodingException - Thrown if the character set is not supported.
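Both InputStream overloads have to turn bytes into characters before any HTML is parsed. The sketch below shows one way to do that with the JDK and why the charset choice matters: UTF-8 bytes decoded as ISO-8859-1, the documented default of parse(InputStream), turn "é" into "Ã©". It is an assumption that HtmlParser decodes internally in this way. The sketch does show that java.io.InputStreamReader rejects an unknown charset name with the same UnsupportedEncodingException listed above. CharsetSketch is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.UnsupportedEncodingException;

final class CharsetSketch {
    // Wraps a byte stream in a line-counting reader for the named charset.
    // InputStreamReader throws UnsupportedEncodingException for unknown names.
    static LineNumberReader open(InputStream in, String charset) throws UnsupportedEncodingException {
        return new LineNumberReader(new InputStreamReader(in, charset));
    }

    // Decodes the first line of the given bytes with the given charset.
    static String firstLine(byte[] bytes, String charset) throws IOException {
        try (LineNumberReader reader = open(new ByteArrayInputStream(bytes), charset)) {
            return reader.readLine();
        }
    }
}
```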

parse
public void parse(Reader reader) throws IOException, SAXException
Parses HTML given by a Reader. This method appends nodes to the document provided to the parser.
Parameters:
  reader - An instance of Reader.
Throws:
  IOException - Thrown if there are errors reading the input.
  SAXException - Thrown if there are parse errors.

parse
public void parse(LineNumberReader reader) throws IOException, SAXException
Throws:
  IOException
  SAXException
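parse(Reader) and parse(LineNumberReader) differ only in the reader type. This page does not say why both exist; one plausible reason is that a line-counting reader lets a parser report positions. The hypothetical helper below adapts an arbitrary Reader without wrapping a LineNumberReader twice. It also shows how LineNumberReader counts lines: it starts at 0 and advances at each line terminator.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;

final class LineNumberSketch {
    // Hypothetical helper: reuses an existing LineNumberReader, otherwise wraps the reader.
    static LineNumberReader asLineNumberReader(Reader reader) {
        return (reader instanceof LineNumberReader)
                ? (LineNumberReader) reader
                : new LineNumberReader(reader);
    }

    // Returns the line number reached after consuming every line of the markup.
    static int countLines(String markup) throws IOException {
        try (LineNumberReader lnr = asLineNumberReader(new StringReader(markup))) {
            while (lnr.readLine() != null) {
                // Consume input; getLineNumber() advances at each line terminator.
            }
            return lnr.getLineNumber();
        }
    }
}
```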

parse
public void parse(Reader reader, Node parent) throws IOException, SAXException
This method may be used when the DOM should be built under a given node, such as when innerHTML is used in JavaScript.
Parameters:
  reader - A document reader.
  parent - The root node for the parsed DOM.
Throws:
  IOException
  SAXException

parse
public void parse(LineNumberReader reader, Node parent) throws IOException, SAXException
This method may be used when the DOM should be built under a given node, such as when innerHTML is used in JavaScript.
Parameters:
  reader - A LineNumberReader for the document.
  parent - The root node for the parsed DOM.
Throws:
  IOException
  SAXException
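The parent-taking overloads target cases like assigning innerHTML, where new markup replaces a node's children. The sketch below illustrates those semantics using only the JDK. javax.xml's DocumentBuilder stands in for HtmlParser, so it accepts well-formed markup only, unlike an HTML parser. This page only says the DOM is built under the given node. Clearing the existing children first is therefore an assumption about what an innerHTML-style caller needs, and setInnerMarkup is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class InnerMarkupSketch {
    // Hypothetical innerHTML-style setter: replaces parent's children with parsed nodes.
    // The JDK XML parser stands in for HtmlParser, so markup must be well-formed.
    static void setInnerMarkup(Element parent, String markup)
            throws ParserConfigurationException, SAXException, IOException {
        // A dummy root lets the fragment contain text and several top-level elements.
        InputSource source = new InputSource(new StringReader("<root>" + markup + "</root>"));
        Document fragment = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);

        // innerHTML assignment discards the existing children first.
        while (parent.getFirstChild() != null) {
            parent.removeChild(parent.getFirstChild());
        }
        // Parsed nodes belong to another document, so import them before appending.
        Document owner = parent.getOwnerDocument();
        for (Node n = fragment.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            parent.appendChild(owner.importNode(n, true));
        }
    }
}
```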