Class HTMLParser

java.lang.Object
org.apache.jmeter.protocol.http.parser.HTMLParser
Direct Known Subclasses:
JsoupBasedHtmlParser

public abstract class HTMLParser extends Object
HtmlParsers can parse HTML content to obtain URLs.
  • Field Details

  • Constructor Details

    • HTMLParser

      protected HTMLParser()
      Protected constructor to prevent instantiation except from within subclasses.
  • Method Details

    • getParser

      public static final HTMLParser getParser()
    • getParser

      public static final HTMLParser getParser(String htmlParserClassName)
    • getEmbeddedResourceURLs

      public Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, String encoding) throws HTMLParseException
      Get the URLs for all the resources that a browser would automatically download after downloading the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.

      URLs should not appear twice in the returned iterator.

      Malformed URLs can be reported to the caller by having the Iterator return the corresponding URL String. Overall problems parsing the HTML should be reported by throwing an HTMLParseException.

      Parameters:
      userAgent - User Agent
      html - HTML code
      baseUrl - Base URL from which the HTML code was obtained
      encoding - Charset
      Returns:
      an Iterator for the resource URLs
      Throws:
      HTMLParseException - when parsing the html fails
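
      The "no duplicates" requirement above can be illustrated with a minimal, self-contained sketch. It does not use the JMeter classes: EmbeddedUrlDedupSketch, resolveUnique and parseBase are hypothetical names, and skipping unresolvable references is an assumption of this sketch rather than documented parser behaviour.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the JMeter API.
final class EmbeddedUrlDedupSketch {

    // Resolves each raw reference against baseUrl and keeps the first
    // occurrence of every distinct URL, preserving document order.
    static List<URL> resolveUnique(URL baseUrl, List<String> rawRefs) {
        // Compare by string form: URL.equals() may perform host name resolution.
        Set<String> seen = new HashSet<>();
        List<URL> result = new ArrayList<>();
        for (String ref : rawRefs) {
            try {
                URL resolved = new URL(baseUrl, ref);
                if (seen.add(resolved.toExternalForm())) {
                    result.add(resolved);
                }
            } catch (MalformedURLException e) {
                // Assumption: this sketch simply skips references it cannot resolve.
            }
        }
        return result;
    }

    // Convenience wrapper that turns the checked exception into an unchecked one.
    static URL parseBase(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Invalid base URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        URL base = parseBase("http://example.com/dir/page.html");
        List<String> refs = List.of("img/a.png", "/style.css", "img/a.png");
        for (URL url : resolveUnique(base, refs)) {
            System.out.println(url);
        }
    }
}
```

      Calling iterator() on the returned list gives the same Iterator<URL> shape as the method documented above.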
    • getEmbeddedResourceURLs

      public abstract Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, URLCollection coll, String encoding) throws HTMLParseException
      Get the URLs for all the resources that a browser would automatically download after downloading the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.

      All URLs should be added to the Collection.

      Malformed URLs can be reported to the caller by having the Iterator return the corresponding URL String. Overall problems parsing the HTML should be reported by throwing an HTMLParseException.

      N.B. The Iterator returns URLs, but the Collection will contain objects of class URLString.

      Parameters:
      userAgent - User Agent
      html - HTML code
      baseUrl - Base URL from which the HTML code was obtained
      coll - URLCollection
      encoding - Charset
      Returns:
      an Iterator for the resource URLs
      Throws:
      HTMLParseException - when parsing the html fails
    • getEmbeddedResourceURLs

      public Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, Collection<URLString> coll, String encoding) throws HTMLParseException
      Get the URLs for all the resources that a browser would automatically download after downloading the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.

      N.B. The Iterator returns URLs, but the Collection will contain objects of class URLString.

      Parameters:
      userAgent - User Agent
      html - HTML code
      baseUrl - Base URL from which the HTML code was obtained
      coll - Collection - will contain URLString objects, not URLs
      encoding - Charset
      Returns:
      an Iterator for the resource URLs
      Throws:
      HTMLParseException - when parsing the html fails
    • isReusable

      protected boolean isReusable()
      Subclasses should override this method to return true if the parser is reusable, in which case the parser will be cached for the next getParser() call.
      Returns:
      true if the Parser is reusable
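
      The caching contract described above might look roughly like the following sketch. It is a self-contained illustration, not the JMeter implementation: SketchParser, StatelessParser, StatefulParser and the Supplier-based getParser are hypothetical stand-ins, and the default of "not reusable" is an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical illustration only; not the JMeter implementation.
final class ReusableParserCacheSketch {

    // Stand-in for a parser base class.
    abstract static class SketchParser {
        // Assumed default: not reusable, so nothing is cached.
        protected boolean isReusable() {
            return false;
        }
    }

    // A parser without per-parse state opts in to caching.
    static final class StatelessParser extends SketchParser {
        @Override
        protected boolean isReusable() {
            return true;
        }
    }

    // A parser that keeps the default and is rebuilt on every lookup.
    static final class StatefulParser extends SketchParser {
    }

    private static final Map<String, SketchParser> CACHE = new ConcurrentHashMap<>();

    // Returns a cached parser for the given name if one exists; otherwise
    // creates one and caches it only when it reports itself as reusable.
    static SketchParser getParser(String name, Supplier<SketchParser> factory) {
        SketchParser cached = CACHE.get(name);
        if (cached != null) {
            return cached;
        }
        SketchParser created = factory.get();
        if (created.isReusable()) {
            SketchParser previous = CACHE.putIfAbsent(name, created);
            return previous != null ? previous : created;
        }
        return created;
    }

    public static void main(String[] args) {
        SketchParser first = getParser("stateless", StatelessParser::new);
        SketchParser second = getParser("stateless", StatelessParser::new);
        System.out.println("same instance reused: " + (first == second));
    }
}
```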
    • isEnableConditionalComments

      protected final boolean isEnableConditionalComments(Float ieVersion)
      Parameters:
      ieVersion - Float IE version
      Returns:
      true if the IE version is lower than 10
    • extractIEVersion

      protected Float extractIEVersion(String userAgent)
      Parameters:
      userAgent - User Agent
      Returns:
      the version number following "MSIE" in the User-Agent string, or null if the User-Agent is not IE
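
      The two IE-related helpers above can be sketched as follows. This is a standalone approximation, not the JMeter source: the class name IeVersionSketch, the exact regular expression, and treating a null version as "conditional comments disabled" are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the JMeter implementation.
final class IeVersionSketch {

    // Assumed pattern: the token "MSIE " followed by a major.minor version.
    private static final Pattern MSIE = Pattern.compile("MSIE ([0-9]+\\.[0-9]+)");

    // Returns the version after "MSIE", or null when the User-Agent is not IE.
    static Float extractIEVersion(String userAgent) {
        if (userAgent == null) {
            return null;
        }
        Matcher matcher = MSIE.matcher(userAgent);
        return matcher.find() ? Float.valueOf(matcher.group(1)) : null;
    }

    // Conditional comments are only honoured by IE versions below 10.
    // Assumption: a null version (not IE) disables them.
    static boolean isEnableConditionalComments(Float ieVersion) {
        return ieVersion != null && ieVersion < 10.0f;
    }

    public static void main(String[] args) {
        String userAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)";
        Float version = extractIEVersion(userAgent);
        System.out.println("version=" + version
                + ", conditional comments=" + isEnableConditionalComments(version));
    }
}
```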