Package org.htmlparser.tags
Class CompositeTag
java.lang.Object
org.htmlparser.nodes.AbstractNode
org.htmlparser.nodes.TagNode
org.htmlparser.tags.CompositeTag
- All Implemented Interfaces:
Serializable, Cloneable, Node, Tag
- Direct Known Subclasses:
AppletTag, BodyTag, Bullet, BulletList, DefinitionList, DefinitionListBullet, Div, FormTag, FrameSetTag, HeadingTag, HeadTag, Html, LabelTag, LinkTag, ObjectTag, OptionTag, ParagraphTag, ScriptTag, SelectTag, Span, StyleTag, TableColumn, TableHeader, TableRow, TableTag, TextareaTag, TitleTag
The base class for tags that have an end tag.
Provides extra accessors for the children above and beyond what the basic Tag provides. Also handles the conversion of its children for the toHtml method.
-
Field Summary
Fields
Modifier and Type, Field, Description
protected static final CompositeTagScanner mDefaultCompositeScanner
- The default scanner for non-composite tags.
protected Tag mEndTag
- The tag that causes this tag to finish.
Fields inherited from class org.htmlparser.nodes.TagNode
breakTags, mAttributes, mDefaultScanner
-
Constructor Summary
Constructors
CompositeTag()
- Create a composite tag.
-
Method Summary
Modifier and Type, Method, Description
void accept(NodeVisitor visitor)
- Tag visiting code.
childAt(int index)
- Get child at given index.
children()
- Get an iterator over the children of this node.
void collectInto(NodeList list, NodeFilter filter)
- Collect this node and its child nodes (if applicable) into the list parameter, provided the node satisfies the filtering criteria.
Text[] digupStringNode(String searchText)
- Finds a text node, however embedded it might be, and returns it.
elements()
- Return the child tags as an iterator.
int findPositionOf(String text)
- Returns the node number of the first node containing the given text.
int findPositionOf(String text, Locale locale)
- Returns the node number of the first node containing the given text.
int findPositionOf(Node searchNode)
- Returns the node number of a child node given the node object.
getChild(int index)
- Get the child of this node at the given position.
int getChildCount()
- Return the number of child nodes in this tag.
Node[] getChildrenAsNodeArray()
- Get the children as an array of Node objects.
getChildrenHTML()
- Return the HTML code for the children of this tag.
getEndTag()
- Get the end tag for this tag.
getStringText()
- Return the text between the start tag and the end tag.
getText()
- Return the text contained in this tag.
protected void putChildrenInto(StringBuffer sb, boolean verbatim)
- Add the textual contents of the children of this node to the buffer.
protected void putEndTagInto(StringBuffer sb, boolean verbatim)
- Add the textual contents of the end tag of this node to the buffer.
void removeChild(int i)
- Remove the child at the position given.
searchByName(String name)
- Searches all children for a name attribute.
searchFor(Class classType, boolean recursive)
- Collect all objects that are of a certain type. Note that this will not check for parent types, and will not recurse through child tags unless recursive is true.
searchFor(String searchString)
- Searches for all nodes whose text representation contains the search string.
searchFor(String searchString, boolean caseSensitive)
- Searches for all nodes whose text representation contains the search string.
searchFor(String searchString, boolean caseSensitive, Locale locale)
- Searches for all nodes whose text representation contains the search string.
void setEndTag(Tag tag)
- Set the end tag for this tag.
toHtml(boolean verbatim)
- Return this tag as HTML code.
toPlainTextString()
- Return the textual contents of this tag and its children.
toString()
- Return a string representation of the contents of this tag, its children and its end tag, suitable for debugging.
void toString(int level, StringBuffer buffer)
- Return a string representation of the contents of this tag, its children and its end tag, suitable for debugging.
Methods inherited from class org.htmlparser.nodes.TagNode
breaksFlow, getAttribute, getAttributeEx, getAttributesEx, getEnders, getEndingLineNumber, getEndTagEnders, getIds, getRawTagName, getStartingLineNumber, getTagBegin, getTagEnd, getTagName, getThisScanner, isEmptyXmlTag, isEndTag, removeAttribute, setAttribute, setAttribute, setAttribute, setAttributeEx, setAttributesEx, setEmptyXmlTag, setTagBegin, setTagEnd, setTagName, setText, setThisScanner
Methods inherited from class org.htmlparser.nodes.AbstractNode
clone, doSemanticAction, getChildren, getEndPosition, getFirstChild, getLastChild, getNextSibling, getPage, getParent, getPreviousSibling, getStartPosition, setChildren, setEndPosition, setPage, setParent, setStartPosition, toHtml
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.htmlparser.Node
clone, doSemanticAction, getChildren, getEndPosition, getFirstChild, getLastChild, getNextSibling, getPage, getParent, getPreviousSibling, getStartPosition, setChildren, setEndPosition, setPage, setParent, setStartPosition, toHtml
-
Field Details
-
mEndTag
The tag that causes this tag to finish. May be a virtual tag generated by the scanning logic.
-
mDefaultCompositeScanner
The default scanner for non-composite tags.
-
-
Constructor Details
-
CompositeTag
public CompositeTag()
Create a composite tag.
-
-
Method Details
-
children
Get an iterator over the children of this node.
- Returns:
- An iterator over the children of this node.
-
getChild
Get the child of this node at the given position.
- Parameters:
index
- The index in the node list of the child.
- Returns:
- The child at that index.
-
getChildrenAsNodeArray
Get the children as an array of Node objects.
- Returns:
- The children in an array.
-
removeChild
public void removeChild(int i)
Remove the child at the position given.
- Parameters:
i
- The index of the child to remove.
-
elements
Return the child tags as an iterator. Equivalent to calling getChildren().elements().
- Returns:
- An iterator over the children.
-
toPlainTextString
Return the textual contents of this tag and its children.
- Specified by:
toPlainTextString in interface Node
- Overrides:
toPlainTextString in class TagNode
- Returns:
- The 'browser' text contents of this tag.
-
putChildrenInto
Add the textual contents of the children of this node to the buffer.
- Parameters:
verbatim
- If true, return as close to the original page text as possible.
sb
- The buffer to append to.
-
putEndTagInto
Add the textual contents of the end tag of this node to the buffer.
- Parameters:
verbatim
- If true, return as close to the original page text as possible.
sb
- The buffer to append to.
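The buffer-appending pattern described by putChildrenInto and putEndTagInto can be pictured with a small stand-alone model. This is only a sketch under stated assumptions: HtmlSketch, leaf and tag are hypothetical names, the verbatim flag is left out, and none of this is the library's own code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a composite tag; not an htmlparser class.
final class HtmlSketch {
    final String name;   // tag name, or null for a text leaf
    final String text;   // text of a leaf, or null for a tag
    final List<HtmlSketch> children = new ArrayList<>();

    private HtmlSketch(String name, String text) {
        this.name = name;
        this.text = text;
    }

    static HtmlSketch leaf(String text) {
        return new HtmlSketch(null, text);
    }

    static HtmlSketch tag(String name, HtmlSketch... kids) {
        HtmlSketch t = new HtmlSketch(name, null);
        for (HtmlSketch kid : kids) {
            t.children.add(kid);
        }
        return t;
    }

    // Appends the HTML of every child, in order, to the buffer.
    void putChildrenInto(StringBuffer sb) {
        for (HtmlSketch child : children) {
            sb.append(child.toHtml());
        }
    }

    // Appends this tag's end tag to the buffer.
    void putEndTagInto(StringBuffer sb) {
        sb.append("</").append(name).append(">");
    }

    // Start tag, then children, then end tag; a leaf is just its text.
    String toHtml() {
        if (name == null) {
            return text;
        }
        StringBuffer sb = new StringBuffer();
        sb.append("<").append(name).append(">");
        putChildrenInto(sb);
        putEndTagInto(sb);
        return sb.toString();
    }

    // Only the children, without the enclosing start and end tags.
    String getChildrenHTML() {
        StringBuffer sb = new StringBuffer();
        putChildrenInto(sb);
        return sb.toString();
    }
}
```

In this model, a p tag holding the text "Hi " and a b tag with the text "there" renders as `<p>Hi <b>there</b></p>`, while its children alone render as `Hi <b>there</b>`.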
-
toHtml
Return this tag as HTML code.
-
searchByName
Searches all children for a name attribute. Returns the first match.
- Parameters:
name
- Attribute to match in tag.
- Returns:
- The tag matching the name attribute.
-
searchFor
Searches for all nodes whose text representation contains the search string. Collects all nodes containing the search string into a NodeList. This search is case-insensitive and the search string and the node text are converted to uppercase using an English locale. For example, if you wish to find any textareas in a form tag containing "hello world", the code would be:
NodeList nodeList = formTag.searchFor("Hello World");
- Parameters:
searchString
- Search criterion.
- Returns:
- A collection of nodes whose string contents or representation have the searchString in them.
-
searchFor
Searches for all nodes whose text representation contains the search string. Collects all nodes containing the search string into a NodeList. For example, if you wish to find any textareas in a form tag containing "hello world", the code would be:
NodeList nodeList = formTag.searchFor("Hello World");
- Parameters:
searchString
- Search criterion.
caseSensitive
- If true, this search should be case sensitive. Otherwise, the search string and the node text are converted to uppercase using an English locale.
- Returns:
- A collection of nodes whose string contents or representation have the searchString in them.
-
searchFor
Searches for all nodes whose text representation contains the search string. Collects all nodes containing the search string into a NodeList. For example, if you wish to find any textareas in a form tag containing "hello world", the code would be:
NodeList nodeList = formTag.searchFor("Hello World");
- Parameters:
searchString
- Search criterion.
caseSensitive
- If true, this search should be case sensitive. Otherwise, the search string and the node text are converted to uppercase using the locale provided.
locale
- The locale for uppercase conversion.
- Returns:
- A collection of nodes whose string contents or representation have the searchString in them.
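To illustrate why the locale matters for the case-insensitive variants, here is a minimal stand-alone sketch of the matching rule described above: both strings are uppercased with the same locale and then tested for containment. SearchSketch and its methods are hypothetical names working on plain strings rather than nodes, so this is a model of the documented behaviour, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in: models only the matching rule, not the htmlparser classes.
final class SearchSketch {
    // True when nodeText contains searchString under the chosen comparison mode.
    static boolean matches(String nodeText, String searchString,
                           boolean caseSensitive, Locale locale) {
        if (caseSensitive) {
            return nodeText.contains(searchString);
        }
        // Case-insensitive: uppercase both sides with the same locale first.
        return nodeText.toUpperCase(locale).contains(searchString.toUpperCase(locale));
    }

    // Collects every text that matches, preserving the input order.
    static List<String> searchFor(List<String> nodeTexts, String searchString,
                                  boolean caseSensitive, Locale locale) {
        List<String> hits = new ArrayList<>();
        for (String text : nodeTexts) {
            if (matches(text, searchString, caseSensitive, locale)) {
                hits.add(text);
            }
        }
        return hits;
    }
}
```

With Locale.ENGLISH, the text "hello world" matches the search string "Hello World" when caseSensitive is false and does not when it is true. With a Turkish locale, "title" uppercases to "TİTLE" (dotted capital I), so a case-insensitive search for "TITLE" no longer matches; this is the kind of difference the locale parameter lets the caller control.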
-
searchFor
Collect all objects that are of a certain type. Note that this will not check for parent types, and will not recurse through child tags unless recursive is true.
- Parameters:
classType
- The class to search for.
recursive
- If true, recursively search through the children.
- Returns:
- A list of children found.
-
findPositionOf
Returns the node number of the first node containing the given text. This can be useful to index into the composite tag and get other children. Text is compared without case sensitivity and conversion to uppercase uses an English locale.
- Parameters:
text
- The text to search for.
- Returns:
- The node index in the children list of the node containing the text, or -1 if not found.
-
findPositionOf
Returns the node number of the first node containing the given text. This can be useful to index into the composite tag and get other children. Text is compared without case sensitivity and conversion to uppercase uses the supplied locale.
- Parameters:
locale
- The locale to use in converting to uppercase.
text
- The text to search for.
- Returns:
- The node index in the children list of the node containing the text, or -1 if not found.
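The "find a position, then index to a neighbouring child" idea mentioned above can be sketched as follows. This is a minimal model that works on a list of child texts instead of nodes; PositionSketch is a hypothetical name, and the loop is an assumption about how such a lookup could behave, not the library's code.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in: children are modelled as plain strings.
final class PositionSketch {
    // Index of the first child whose text contains `text`, ignoring case; -1 if none.
    static int findPositionOf(List<String> childTexts, String text, Locale locale) {
        String needle = text.toUpperCase(locale);
        for (int i = 0; i < childTexts.size(); i++) {
            if (childTexts.get(i).toUpperCase(locale).contains(needle)) {
                return i;
            }
        }
        return -1;
    }
}
```

For children modelled as "Name:", "<input>", "Email:", a lookup for "name" with Locale.ENGLISH yields position 0, so the sibling at position 1 is the child that follows the label. A caller should check for -1 before using the result as an index.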
-
findPositionOf
Returns the node number of a child node given the node object. This would typically be used in conjunction with digupStringNode, after which the string node's parent can be used to find the string node's position. Faster than calling findPositionOf(text) again. Note that the position is at a linear level alone; there is no recursion in this method.
- Parameters:
searchNode
- The child node to find.
- Returns:
- The offset of the child tag, or -1 if it was not found.
-
childAt
Get child at given index.
- Parameters:
index
- The index into the child node list.
- Returns:
- The child node at the given index, or null if none.
-
collectInto
Collect this node and its child nodes (if applicable) into the list parameter, provided the node satisfies the filtering criteria.
This mechanism allows powerful filtering code to be written very easily, without having to collect embedded tags separately. For example, when we try to get all the links on a page, it is not possible to get them at the top level, as many tags (like form tags) can contain links embedded in them. We could get the links out by checking whether the current node is a CompositeTag and going through its children, so this method provides a convenient way to do this.
Using collectInto(), programs get a lot shorter. Now, the code to extract all links from a page would look like:
NodeList list = new NodeList();
NodeFilter filter = new TagNameFilter("A");
for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
    e.nextNode().collectInto(list, filter);
Thus, list will hold all the link nodes, irrespective of how deep the links are embedded.
Another way to accomplish the same objective is:
NodeList list = new NodeList();
NodeFilter filter = new TagClassFilter(LinkTag.class);
for (NodeIterator e = parser.elements(); e.hasMoreNodes();)
    e.nextNode().collectInto(list, filter);
This is slightly less specific because the LinkTag class may be registered for more than one node name, e.g. <LINK> tags too.
- Specified by:
collectInto in interface Node
- Overrides:
collectInto in class AbstractNode
- Parameters:
list
- The list to add nodes to.
filter
- The filter to apply.
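The recursion that makes this work can be modelled in a few lines. The following is a sketch under stated assumptions, not the library's implementation: SketchNode is a hypothetical class, and java.util.function.Predicate stands in for NodeFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a node that may hold children; not an htmlparser class.
final class SketchNode {
    final String tagName;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String tagName, SketchNode... kids) {
        this.tagName = tagName;
        for (SketchNode kid : kids) {
            children.add(kid);
        }
    }

    // Adds this node if it passes the filter, then lets each child do the same.
    void collectInto(List<SketchNode> list, Predicate<SketchNode> filter) {
        if (filter.test(this)) {
            list.add(this);
        }
        for (SketchNode child : children) {
            child.collectInto(list, filter);
        }
    }
}
```

Because every node forwards the call to its children, a caller that starts at the top of the tree gathers matches at any depth without writing its own traversal. For a BODY holding one A directly and another A nested inside FORM and DIV, a filter on the name "A" collects both.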
-
getChildrenHTML
Return the HTML code for the children of this tag.
- Returns:
- A string with the HTML code for the contents of this tag.
-
accept
Tag visiting code. Invokes accept() on the start tag and then walks the child list invoking accept() on each of the children, finishing up with an accept() call on the end tag. If shouldRecurseSelf() returns true it then asks the visitor to visit itself.
-
getChildCount
public int getChildCount()
Return the number of child nodes in this tag.
- Returns:
- The child node count.
-
getEndTag
Get the end tag for this tag. For example, if the node is {@.html
-
setEndTag
Set the end tag for this tag.
-
digupStringNode
Finds a text node, however embedded it might be, and returns it. The text node will retain links to its parents, so further navigation is possible.
- Parameters:
searchText
- The text to search for.
- Returns:
- The list of text nodes (recursively) found.
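How a recursive text lookup combines with a one-level position lookup through the parent link can be sketched with a small model. Everything here is hypothetical (DigSketchNode, leaf, tag), and the case-sensitive containment test and identity comparison are assumptions made for the sketch rather than statements about the library's behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a node tree with parent links; not an htmlparser class.
final class DigSketchNode {
    final String text;          // non-null for a text leaf, null for a tag
    DigSketchNode parent;       // set when the node is added to a tag
    final List<DigSketchNode> children = new ArrayList<>();

    private DigSketchNode(String text) {
        this.text = text;
    }

    static DigSketchNode leaf(String text) {
        return new DigSketchNode(text);
    }

    static DigSketchNode tag(DigSketchNode... kids) {
        DigSketchNode t = new DigSketchNode(null);
        for (DigSketchNode kid : kids) {
            kid.parent = t;
            t.children.add(kid);
        }
        return t;
    }

    // Recursively gathers every text leaf below this tag that contains searchText.
    List<DigSketchNode> digupStringNode(String searchText) {
        List<DigSketchNode> found = new ArrayList<>();
        for (DigSketchNode child : children) {
            if (child.text != null) {
                if (child.text.contains(searchText)) {
                    found.add(child);
                }
            } else {
                found.addAll(child.digupStringNode(searchText));
            }
        }
        return found;
    }

    // One-level lookup by identity; -1 when searchNode is not a direct child.
    int findPositionOf(DigSketchNode searchNode) {
        for (int i = 0; i < children.size(); i++) {
            if (children.get(i) == searchNode) {
                return i;
            }
        }
        return -1;
    }
}
```

After a deep search returns a text node, calling findPositionOf on that node's parent gives its position among its siblings. Calling it on a more distant ancestor returns -1, since the position lookup does not recurse.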
-
toString
Return a string representation of the contents of this tag, its children and its end tag, suitable for debugging.
-
getText
Return the text contained in this tag.
-
getStringText
Return the text between the start tag and the end tag.
- Returns:
- The contents of the CompositeTag.
-
toString
Return a string representation of the contents of this tag, its children and its end tag, suitable for debugging.
- Parameters:
level
- The indentation level to use.
buffer
- The buffer to append to.
-