Package org.htmlcleaner
Class CleanerProperties
java.lang.Object
org.htmlcleaner.CleanerProperties
- All Implemented Interfaces:
HtmlModificationListener
Properties defining the cleaner's behaviour.
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and Type    Method    Description
void    addHtmlModificationListener(HtmlModificationListener listener)    Adds a listener to the list of objects that will be notified about changes that the cleaner makes during the cleanup process.
void    addPruneTagNodeCondition(ITagNodeCondition condition)    Adds the condition to the existing prune tag set.
void    fireConditionModification(ITagNodeCondition condition, TagNode tagNode)    Fired when the cleaner modifies HTML due to an ITagNodeCondition match.
void    fireHtmlError(boolean certainty, TagNode startTagToken, ErrorType type)    Fired when the cleaner fixes some error in HTML syntax.
void    fireUglyHtml(boolean certainty, TagNode startTagToken, ErrorType errorType)    Fired when the cleaner fixes ugly HTML: the syntax was correct but the task was implemented by weird code.
void    fireUserDefinedModification(boolean certainty, TagNode tagNode, ErrorType errorType)    Fired when the cleaner modifies HTML due to user-specified rules.
int    getHtmlVersion()    Returns the HTML version.
String    getInvalidXmlAttributeNamePrefix()    Gets the prefix to use to try to make valid attribute names.
int    getMaxDepth()
boolean    isAddNewlineToHeadAndBody()
boolean    isAdvancedXmlEscape()
boolean    isAllowHtmlInsideAttributes()
boolean    isAllowInvalidAttributeNames()    If false, when outputting XML, if an attribute name is not valid, attempt to fix it by using a prefix and removing invalid characters.
boolean    isAllowMultiWordAttributes()
boolean    isDeserializeEntities()
boolean    isIgnoreQuestAndExclam()
boolean    isKeepWhitespaceAndCommentsInHead()
boolean    isNamespacesAware()
boolean    isOmitCdataOutsideScriptAndStyle()
boolean    isOmitComments()
boolean    isOmitDeprecatedTags()
boolean    isOmitDoctypeDeclaration()
boolean    isOmitHtmlEnvelope()
boolean    isOmitUnknownTags()
boolean    isOmitXmlDeclaration()
boolean    isRecognizeUnicodeChars()
boolean    isTranslateSpecialEntities()
boolean    isTransResCharsToNCR()
boolean    isTransSpecialEntitiesToNCR()
boolean    isTreatDeprecatedTagsAsContent()
boolean    isTreatUnknownTagsAsContent()
boolean    isTrimAttributeValues()
boolean    isUseCdataFor(String useCdataFor)
boolean    isUseCdataForScriptAndStyle()
boolean    isUseEmptyElementTags()
void    reset()    Restores the default property values (the full list of assignments appears under reset in Method Details).
void    setAddNewlineToHeadAndBody(boolean addNewlineToHeadAndBody)
void    setAdvancedXmlEscape(boolean advancedXmlEscape)
void    setAllowHtmlInsideAttributes(boolean allowHtmlInsideAttributes)
void    setAllowInvalidAttributeNames(boolean allowInvalidAttributeNames)    Sets whether to allow invalid attribute names, or to try to fix or omit them.
void    setAllowMultiWordAttributes(boolean allowMultiWordAttributes)
void    setAllowTags(String allowTags)
void    setBooleanAttributeValues(String booleanAttributeValues)
void    setCharset(String charset)
void    setCleanerTransformations(CleanerTransformations cleanerTransformations)
void    setDeserializeEntities(boolean deserializeEntities)
void    setHtmlVersion(int version)    Sets the HTML version according to the parameter. Also sets the tag provider to the appropriate version.
void    setHyphenReplacementInComment(String hyphenReplacementInComment)
void    setIgnoreQuestAndExclam(boolean ignoreQuestAndExclam)
void    setInvalidXmlAttributeNamePrefix(String invalidXmlAttributePrefix)    Sets the prefix to use for XML attributes that are invalid.
void    setKeepWhitespaceAndCommentsInHead(boolean keepHeadWhitespace)
void    setMaxDepth(int maxDepth)
void    setNamespacesAware(boolean namespacesAware)
void    setOmitCdataOutsideScriptAndStyle(boolean value)
void    setOmitComments(boolean omitComments)
void    setOmitDeprecatedTags(boolean omitDeprecatedTags)
void    setOmitDoctypeDeclaration(boolean omitDoctypeDeclaration)
void    setOmitHtmlEnvelope(boolean omitHtmlEnvelope)
void    setOmitUnknownTags(boolean omitUnknownTags)
void    setOmitXmlDeclaration(boolean omitXmlDeclaration)
void    setPruneTags(String pruneTags)    Resets the prune tag set and adds tag name conditions to it.
void    setRecognizeUnicodeChars(boolean recognizeUnicodeChars)
void    setTranslateSpecialEntities(boolean translateSpecialEntities)    TODO: useOptionalOutput
void    setTransResCharsToNCR(boolean transResCharsToNCR)
void    setTransSpecialEntitiesToNCR(boolean transSpecialEntitiesToNCR)
void    setTreatDeprecatedTagsAsContent(boolean treatDeprecatedTagsAsContent)
void    setTreatUnknownTagsAsContent(boolean treatUnknownTagsAsContent)
void    setTrimAttributeValues(boolean trimAttributeValues)
void    setUseCdataFor(String useCdataFor)
void    setUseCdataForScriptAndStyle(boolean useCdataForScriptAndStyle)
void    setUseEmptyElementTags(boolean useEmptyElementTags)
-
Field Details
-
DEFAULT_CHARSET
-
BOOL_ATT_SELF
-
BOOL_ATT_EMPTY
-
BOOL_ATT_TRUE
-
Constructor Details
-
CleanerProperties
public CleanerProperties() -
CleanerProperties
- Parameters:
tagInfoProvider
-
-
-
Method Details
-
getMaxDepth
public int getMaxDepth() -
setMaxDepth
public void setMaxDepth(int maxDepth) -
getTagInfoProvider
-
isAdvancedXmlEscape
public boolean isAdvancedXmlEscape() -
setAdvancedXmlEscape
public void setAdvancedXmlEscape(boolean advancedXmlEscape) -
isTransResCharsToNCR
public boolean isTransResCharsToNCR() -
setTransResCharsToNCR
public void setTransResCharsToNCR(boolean transResCharsToNCR) -
isUseCdataForScriptAndStyle
public boolean isUseCdataForScriptAndStyle() -
setUseCdataForScriptAndStyle
public void setUseCdataForScriptAndStyle(boolean useCdataForScriptAndStyle) -
setUseCdataFor
-
getUseCdataFor
-
isUseCdataFor
-
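The default listed under reset calls setUseCdataFor("script,style"), which suggests a comma-separated list of tag names that isUseCdataFor(String) is then checked against. The following is a minimal standalone sketch of that idea, not the HtmlCleaner implementation: the class name CdataTagsSketch is hypothetical, and the trimming and case-insensitive matching are assumptions this page does not confirm.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for the useCdataFor setting; not library code. */
final class CdataTagsSketch {
    private final Set<String> tags = new LinkedHashSet<>();

    /** Replaces the current list with the comma-separated tag names given. */
    void setUseCdataFor(String useCdataFor) {
        tags.clear();
        if (useCdataFor == null) {
            return;
        }
        for (String raw : useCdataFor.split(",")) {
            // Assumption: names are trimmed and compared case-insensitively.
            String name = raw.trim().toLowerCase(Locale.ROOT);
            if (!name.isEmpty()) {
                tags.add(name);
            }
        }
    }

    /** Returns true if the given tag name is in the configured list. */
    boolean isUseCdataFor(String tagName) {
        return tagName != null
                && tags.contains(tagName.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        CdataTagsSketch sketch = new CdataTagsSketch();
        sketch.setUseCdataFor("script,style");
        System.out.println(sketch.isUseCdataFor("script"));
        System.out.println(sketch.isUseCdataFor("div"));
    }
}
```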
isTranslateSpecialEntities
public boolean isTranslateSpecialEntities() -
setTranslateSpecialEntities
public void setTranslateSpecialEntities(boolean translateSpecialEntities) TODO : useOptionalOutput
- Parameters:
translateSpecialEntities
-
-
isRecognizeUnicodeChars
public boolean isRecognizeUnicodeChars() -
setRecognizeUnicodeChars
public void setRecognizeUnicodeChars(boolean recognizeUnicodeChars) -
isOmitUnknownTags
public boolean isOmitUnknownTags() -
setOmitUnknownTags
public void setOmitUnknownTags(boolean omitUnknownTags) -
isTreatUnknownTagsAsContent
public boolean isTreatUnknownTagsAsContent() -
setTreatUnknownTagsAsContent
public void setTreatUnknownTagsAsContent(boolean treatUnknownTagsAsContent) -
isOmitDeprecatedTags
public boolean isOmitDeprecatedTags() -
setOmitDeprecatedTags
public void setOmitDeprecatedTags(boolean omitDeprecatedTags) -
isTreatDeprecatedTagsAsContent
public boolean isTreatDeprecatedTagsAsContent() -
setTreatDeprecatedTagsAsContent
public void setTreatDeprecatedTagsAsContent(boolean treatDeprecatedTagsAsContent) -
isOmitComments
public boolean isOmitComments() -
setOmitComments
public void setOmitComments(boolean omitComments) -
isOmitXmlDeclaration
public boolean isOmitXmlDeclaration() -
setOmitXmlDeclaration
public void setOmitXmlDeclaration(boolean omitXmlDeclaration) -
isOmitDoctypeDeclaration
public boolean isOmitDoctypeDeclaration()
- Returns:
- also returns true if the HTML envelope is omitted
-
setOmitDoctypeDeclaration
public void setOmitDoctypeDeclaration(boolean omitDoctypeDeclaration) -
isOmitHtmlEnvelope
public boolean isOmitHtmlEnvelope() -
setOmitHtmlEnvelope
public void setOmitHtmlEnvelope(boolean omitHtmlEnvelope) -
isUseEmptyElementTags
public boolean isUseEmptyElementTags() -
setUseEmptyElementTags
public void setUseEmptyElementTags(boolean useEmptyElementTags) -
isAllowMultiWordAttributes
public boolean isAllowMultiWordAttributes() -
setAllowMultiWordAttributes
public void setAllowMultiWordAttributes(boolean allowMultiWordAttributes) -
isAllowHtmlInsideAttributes
public boolean isAllowHtmlInsideAttributes() -
setAllowHtmlInsideAttributes
public void setAllowHtmlInsideAttributes(boolean allowHtmlInsideAttributes) -
isIgnoreQuestAndExclam
public boolean isIgnoreQuestAndExclam() -
setIgnoreQuestAndExclam
public void setIgnoreQuestAndExclam(boolean ignoreQuestAndExclam) -
isNamespacesAware
public boolean isNamespacesAware() -
setNamespacesAware
public void setNamespacesAware(boolean namespacesAware) -
isAddNewlineToHeadAndBody
public boolean isAddNewlineToHeadAndBody() -
setAddNewlineToHeadAndBody
public void setAddNewlineToHeadAndBody(boolean addNewlineToHeadAndBody) -
isKeepWhitespaceAndCommentsInHead
public boolean isKeepWhitespaceAndCommentsInHead() -
setKeepWhitespaceAndCommentsInHead
public void setKeepWhitespaceAndCommentsInHead(boolean keepHeadWhitespace) -
getHyphenReplacementInComment
-
setHyphenReplacementInComment
-
getPruneTags
-
isOmitCdataOutsideScriptAndStyle
public boolean isOmitCdataOutsideScriptAndStyle() -
setOmitCdataOutsideScriptAndStyle
public void setOmitCdataOutsideScriptAndStyle(boolean value) -
isDeserializeEntities
public boolean isDeserializeEntities() -
setDeserializeEntities
public void setDeserializeEntities(boolean deserializeEntities) -
setHtmlVersion
public void setHtmlVersion(int version)
Sets the HTML version according to the parameter. Also sets the tag provider to the appropriate version.
- Parameters:
version - 4 for HTML 4 or 5 for HTML5
-
getHtmlVersion
public int getHtmlVersion()
Returns the HTML version.
- Returns:
- the HTML version, as an int
-
isTrimAttributeValues
public boolean isTrimAttributeValues() -
setTrimAttributeValues
public void setTrimAttributeValues(boolean trimAttributeValues) -
setPruneTags
Resets the prune tag set and adds tag name conditions to it. All tags listed in the pruneTags parameter are added.
- Parameters:
pruneTags
-
-
addPruneTagNodeCondition
Adds the condition to the existing prune tag set.
- Parameters:
condition
-
-
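The two descriptions above differ in one respect: setPruneTags resets the set before adding its tag name conditions, while addPruneTagNodeCondition appends to whatever is already there. A minimal standalone sketch of that difference follows. It is not the library's code: PruneSetSketch and shouldPrune are hypothetical names, a Predicate over the tag name stands in for ITagNodeCondition, and the comma-separated format of pruneTags is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for the prune tag set; not library code. */
final class PruneSetSketch {
    // Predicate<String> over a tag name stands in for ITagNodeCondition.
    private final List<Predicate<String>> conditions = new ArrayList<>();

    /** Clears the set, then adds one name-matching condition per listed tag. */
    void setPruneTags(String pruneTags) {
        conditions.clear();
        if (pruneTags == null) {
            return;
        }
        // Assumption: tag names are separated by commas.
        for (String raw : pruneTags.split(",")) {
            String name = raw.trim();
            if (!name.isEmpty()) {
                conditions.add(tag -> tag.equalsIgnoreCase(name));
            }
        }
    }

    /** Appends a condition without touching the existing ones. */
    void addPruneTagNodeCondition(Predicate<String> condition) {
        conditions.add(condition);
    }

    /** Returns true if any condition in the set matches the tag name. */
    boolean shouldPrune(String tagName) {
        for (Predicate<String> condition : conditions) {
            if (condition.test(tagName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PruneSetSketch sketch = new PruneSetSketch();
        sketch.setPruneTags("script,style");
        sketch.addPruneTagNodeCondition(tag -> tag.startsWith("x-"));
        System.out.println(sketch.shouldPrune("x-widget"));
        sketch.setPruneTags("iframe"); // discards all earlier conditions
        System.out.println(sketch.shouldPrune("x-widget"));
    }
}
```

In this sketch, calling setPruneTags a second time also discards conditions added through addPruneTagNodeCondition, which is one reading of "resets" in the description above.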
getPruneTagSet
-
getAllowTags
-
setAllowTags
-
isTransSpecialEntitiesToNCR
public boolean isTransSpecialEntitiesToNCR() -
setTransSpecialEntitiesToNCR
public void setTransSpecialEntitiesToNCR(boolean transSpecialEntitiesToNCR) -
getAllowTagSet
-
setCharset
- Parameters:
charset
- the charset to set
-
getCharset
- Returns:
- the charset
-
getBooleanAttributeValues
-
setBooleanAttributeValues
-
reset
public void reset()
advancedXmlEscape = true;
setUseCdataFor("script,style");
translateSpecialEntities = true;
recognizeUnicodeChars = true;
omitUnknownTags = false;
treatUnknownTagsAsContent = false;
omitDeprecatedTags = false;
treatDeprecatedTagsAsContent = false;
omitComments = false;
omitXmlDeclaration = OptionalOutput.alwaysOutput;
omitDoctypeDeclaration = OptionalOutput.alwaysOutput;
omitHtmlEnvelope = OptionalOutput.alwaysOutput;
useEmptyElementTags = true;
allowMultiWordAttributes = true;
allowHtmlInsideAttributes = false;
ignoreQuestAndExclam = true;
namespacesAware = true;
keepHeadWhitespace = true;
addNewlineToHeadAndBody = true;
hyphenReplacementInComment = "=";
pruneTags = null;
allowTags = null;
booleanAttributeValues = BOOL_ATT_SELF;
collapseNullHtml = CollapseHtml.none;
charset = "UTF-8";
trimAttributeValues = true;
tagInfoProvider = HTML5TagProvider.INSTANCE;
maxDepth = 1000;
-
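To show what these assignments mean in practice, here is a minimal standalone sketch that models four of the listed defaults and restores them after they have been changed. It is an illustration only: ResetDefaultsSketch is a hypothetical class, and having the constructor call reset() is an assumption that this page does not state.

```java
/** Hypothetical model of four documented defaults; not library code. */
final class ResetDefaultsSketch {
    boolean omitComments;
    boolean useEmptyElementTags;
    String charset;
    int maxDepth;

    ResetDefaultsSketch() {
        // Assumption: a new instance starts from the same defaults.
        reset();
    }

    /** Restores the defaults listed in the reset() description. */
    void reset() {
        omitComments = false;
        useEmptyElementTags = true;
        charset = "UTF-8";
        maxDepth = 1000;
    }

    public static void main(String[] args) {
        ResetDefaultsSketch props = new ResetDefaultsSketch();
        props.omitComments = true;
        props.maxDepth = 50;
        props.reset(); // back to the documented defaults
        System.out.println(props.omitComments + " " + props.maxDepth);
    }
}
```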
getCleanerTransformations
- Returns:
- the cleanerTransformations
-
setCleanerTransformations
-
addHtmlModificationListener
Adds a listener to the list of objects that will be notified about changes that the cleaner makes during the cleanup process.
- Parameters:
listener - listener object to be notified of the changes
-
fireConditionModification
Description copied from interface: HtmlModificationListener
Fired when the cleaner modifies HTML due to an ITagNodeCondition match.
- Specified by:
fireConditionModification in interface HtmlModificationListener
- Parameters:
condition - the condition that was applied to make the modification
tagNode - the problematic node
-
fireHtmlError
Description copied from interface: HtmlModificationListener
Fired when the cleaner fixes some error in HTML syntax.
- Specified by:
fireHtmlError in interface HtmlModificationListener
- Parameters:
certainty - true if the change made doesn't hurt the end document
startTagToken - the problematic node
-
fireUglyHtml
Description copied from interface: HtmlModificationListener
Fired when the cleaner fixes ugly HTML: the syntax was correct but the task was implemented by weird code. For example, when deprecated tags are removed.
- Specified by:
fireUglyHtml in interface HtmlModificationListener
- Parameters:
certainty - true if the change made doesn't hurt the end document
startTagToken - the problematic node
-
fireUserDefinedModification
Description copied from interface: HtmlModificationListener
Fired when the cleaner modifies HTML due to user-specified rules.
- Specified by:
fireUserDefinedModification in interface HtmlModificationListener
- Parameters:
certainty - true if the change made doesn't hurt the end document
tagNode - the problematic node
-
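CleanerProperties both accepts listeners through addHtmlModificationListener and implements HtmlModificationListener itself, which suggests that each fire method passes the event on to every registered listener. The following minimal standalone sketch shows that fan-out pattern. It is not the library's code: both type names are hypothetical, only one fire method is modelled, and plain strings stand in for TagNode and ErrorType.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical one-method stand-in for HtmlModificationListener. */
interface ModificationListenerSketch {
    void fireHtmlError(boolean certainty, String tagName, String errorType);
}

/** Hypothetical holder that forwards each event to every registered listener. */
final class ListenerFanOutSketch implements ModificationListenerSketch {
    private final List<ModificationListenerSketch> listeners = new ArrayList<>();

    /** Registers a listener to be notified of later events. */
    void addHtmlModificationListener(ModificationListenerSketch listener) {
        listeners.add(listener);
    }

    /** Forwards the event to all listeners, in registration order. */
    @Override
    public void fireHtmlError(boolean certainty, String tagName, String errorType) {
        for (ModificationListenerSketch listener : listeners) {
            listener.fireHtmlError(certainty, tagName, errorType);
        }
    }

    public static void main(String[] args) {
        ListenerFanOutSketch props = new ListenerFanOutSketch();
        props.addHtmlModificationListener((certainty, tag, type) ->
                System.out.println("fixed " + tag + " (" + type + "), certain=" + certainty));
        // "example-error" is a placeholder label, not a real ErrorType value.
        props.fireHtmlError(true, "p", "example-error");
    }
}
```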
getInvalidXmlAttributeNamePrefix
Gets the prefix to use to try to make valid attribute names.
- Returns:
- invalidAttributeNamePrefix
-
setInvalidXmlAttributeNamePrefix
Sets the prefix to use for XML attributes that are invalid.
- Parameters:
invalidXmlAttributePrefix - the prefix to use
-
setAllowInvalidAttributeNames
public void setAllowInvalidAttributeNames(boolean allowInvalidAttributeNames)
Sets whether to allow invalid attribute names, or to try to fix or omit them.
- Parameters:
allowInvalidAttributeNames - true if invalid attribute names are allowed
-
isAllowInvalidAttributeNames
public boolean isAllowInvalidAttributeNames()
If false, when outputting XML, an attribute name that is not valid is fixed by adding a prefix and removing invalid characters. Otherwise, invalid attributes are omitted.
- Returns:
- true if invalid attribute names are allowed
-
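The description above speaks of fixing an invalid attribute name "by using a prefix and removing invalid characters". A minimal standalone sketch of one way such a fix could work follows. It is not the HtmlCleaner implementation: AttributeNameSketch and fixName are hypothetical names, the character rules are an ASCII-only simplification of XML names, the prefix "x-" is made up for the example, and returning null for a name with no usable characters is an assumption.

```java
/** Hypothetical attribute-name fixer; not library code. */
final class AttributeNameSketch {
    // Simplified ASCII-only rule for the first character of a name.
    private static boolean isNameStart(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    }

    // Simplified ASCII-only rule for the remaining characters.
    private static boolean isNameChar(char c) {
        return isNameStart(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }

    /**
     * Drops characters that are not allowed in a name, then adds the prefix
     * if the result does not start with a valid first character.
     * Returns null when nothing usable remains. The prefix itself is
     * assumed to begin with a valid first character.
     */
    static String fixName(String name, String prefix) {
        StringBuilder kept = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (isNameChar(c)) {
                kept.append(c);
            }
        }
        if (kept.length() == 0) {
            return null;
        }
        if (!isNameStart(kept.charAt(0))) {
            kept.insert(0, prefix);
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(fixName("1data", "x-"));
        System.out.println(fixName("da ta!", "x-"));
    }
}
```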