OWASP Java HTML Sanitizer
Main
What is this?
The OWASP HTML Sanitizer Project provides Java-based HTML sanitization of untrusted HTML.
Code Repo
OWASP HTML Sanitizer at GitHub
Email List
Questions? Please sign up for our Project Support List.
Project Leaders
Author/Project Leader
Related Projects
Ohloh
Questions
How was this project tested? This code was written with security best practices in mind, has an extensive test suite, and has undergone adversarial security review.
How is this project deployed? This project is best deployed through Maven. See https://github.com/OWASP/java-html-sanitizer/blob/master/docs/getting_started.md
Roadmap
- Maintaining a fully featured HTML sanitizer is a lot of work. We intend to continue to handle community questions and bug reports in a very timely manner.
- There are no plans for major new features other than supporting incoming requests for advanced sanitization such as additional HTML5 support.
How to Use
The project is available from Maven Central as OWASP HTML Sanitizer.
Creating an HTML Policy
1. Use prepackaged policies
You can view basic prepackaged policies for links, tables, integers, and images at: https://github.com/OWASP/java-html-sanitizer/blob/master/src/main/java/org/owasp/html/Sanitizers.java.
```java
import org.owasp.html.PolicyFactory;
import org.owasp.html.Sanitizers;
PolicyFactory policy = Sanitizers.FORMATTING.and(Sanitizers.LINKS);
String safeHTML = policy.sanitize(untrustedHTML);
```
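2. Configure your own policy
The tests illustrate how to configure your own policy: https://github.com/OWASP/java-html-sanitizer/blob/master/src/test/java/org/owasp/html/HtmlPolicyBuilderTest.java
```java
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

// Allow <a> elements with an href attribute, restrict URL protocols to https,
// and add rel="nofollow" to every link.
PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("a")
    .allowUrlProtocols("https")
    .allowAttributes("href").onElements("a")
    .requireRelNofollowOnLinks()
    .toFactory();  // toFactory() returns the reusable PolicyFactory
String safeHTML = policy.sanitize(untrustedHTML);
```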
3. Define custom policies
You can also write custom policies:
```java
import java.util.List;
import org.owasp.html.ElementPolicy;
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("p")
    .allowElements(
        new ElementPolicy() {
          // Rename each heading to a <div> and tag it with class="header-hN".
          public String apply(String elementName, List<String> attrs) {
            attrs.add("class");
            attrs.add("header-" + elementName);
            return "div";
          }
        }, "h1", "h2", "h3", "h4", "h5", "h6")
    .toFactory();
String safeHTML = policy.sanitize(untrustedHTML);
```
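To see what the element policy above does to a single element without running the sanitizer, the sketch below copies the body of `apply` into a standalone method. `HeaderRewrite` is a hypothetical name used only for this illustration, and the sketch assumes that `attrs` holds alternating attribute names and values, which is how the example above appends to it.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone illustration; HeaderRewrite is a hypothetical name, not a sanitizer class. */
final class HeaderRewrite {
    private HeaderRewrite() {}

    // Same body as the apply method in the custom policy example.
    static String apply(String elementName, List<String> attrs) {
        attrs.add("class");                  // attribute name
        attrs.add("header-" + elementName);  // attribute value
        return "div";                        // replacement element name
    }

    public static void main(String[] args) {
        List<String> attrs = new ArrayList<>();
        String renamed = apply("h2", attrs);
        System.out.println(renamed + " " + attrs);  // prints "div [class, header-h2]"
    }
}
```

Under that assumption, an `<h2>` in the input would come out as a `<div class="header-h2">`.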
Please note that the elements “a”, “font”, “img”, “input” and “span” need to be explicitly whitelisted using the `allowWithoutAttributes()` method if you want them to be allowed through the filter when these elements do not include any attributes.
4. Use the eBay / Slashdot policies
You can also use the default “ebay” and “slashdot” policies.
The Slashdot policy (defined here: https://github.com/OWASP/java-html-sanitizer/blob/master/src/main/java/org/owasp/html/examples/SlashdotPolicyExample.java) allows the following tags (“a”, “p”, “div”, “i”, “b”, “em”, “blockquote”, “tt”, “strong”, “br”, “ul”, “ol”, “li”) and only certain attributes. This policy also allows the custom Slashdot tags “quote” and “ecode”.
CSS Sanitization
CSS sanitization is challenging.
We disallow position:sticky and position:fixed so that client code can use a position:relative;overflow:hidden container to contain self-styling sanitized snippets. Embedders of sanitized content have to apply that container consistently and make sure that contributed content is clearly demarcated.
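A minimal sketch of the containment described above, assuming the embedder builds the wrapper as a string on the server. The class name `SnippetContainer` and the `user-content` CSS class are hypothetical, and the input is assumed to have already passed through `policy.sanitize(...)`.

```java
/** Hypothetical helper; not part of the sanitizer. */
final class SnippetContainer {
    private SnippetContainer() {}

    // Wraps already-sanitized HTML in a clipping container so that
    // self-styled content cannot paint outside its own box.
    static String contain(String sanitizedHtml) {
        return "<div class=\"user-content\" style=\"position:relative;overflow:hidden\">"
            + sanitizedHtml
            + "</div>";
    }

    public static void main(String[] args) {
        System.out.println(contain("<b>hi</b>"));
    }
}
```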
Most CSS attacks require the payload to specify selectors, which the sanitizer should not allow. Unproxied images do allow tracking and, when positioned below the fold, can reveal whether a user scrolls down. Embedders that allow background styling need to rewrite URLs and should set a sensible Referrer-Policy and related headers.
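One possible shape for the URL rewriting mentioned above is to route every image URL through a proxy that the embedder controls, so the original host never sees the reader's request. The sketch below only shows the rewriting step. The proxy endpoint `https://images.example.com/proxy?url=` and the name `ImageProxyRewriter` are assumptions for illustration, and wiring this into a policy is left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper; the proxy endpoint is a placeholder. */
final class ImageProxyRewriter {
    private ImageProxyRewriter() {}

    static final String PROXY_PREFIX = "https://images.example.com/proxy?url=";

    // Percent-encodes the original URL and passes it to the proxy as a query parameter.
    static String rewrite(String originalUrl) {
        try {
            return PROXY_PREFIX + URLEncoder.encode(originalUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rewrite("http://tracker.example/pixel.png?id=1"));
    }
}
```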
That said, even if care is taken, CSS has a large attack surface, so not using it puts you in a safer place.
Inline/Embedded Images
Inline images use the data URI scheme to embed images directly within web pages. The following describes how to allow inline images in an HTML Sanitizer policy.
1) Add the “data” protocol to your whitelist. See: https://static.javadoc.io/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20160628.1/org/owasp/html/HtmlPolicyBuilder.html#allowUrlProtocols
```java
.allowUrlProtocols("data")
```
2) You can then allow an attribute with an extra check:
```java
.allowAttributes("src")
.matching(...)
.onElements("img")
```
3) There are a number of things you can do in the matching part, such as allowing only specific values instead of allowing every data URL.
4) Since allowUrlProtocols(“data”) allows data URLs anywhere URLs are allowed, you might also want to add a matcher to any other URL attribute that rejects anything with a colon that does not start with http:, https:, or mailto:
```java
.allowAttributes("href")
.matching(...)
.onElements("a")
```
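The matchers in steps 3 and 4 are left open above. As one possible way to fill them, the sketch below defines two regular expressions using only `java.util.regex`. The class and constant names are hypothetical, the choice of PNG, GIF, and JPEG is an assumption rather than a project recommendation, and the sketch assumes the pattern is tested against the whole attribute value.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; names and the chosen image types are illustrative only. */
final class UrlMatchers {
    private UrlMatchers() {}

    // Step 3: accept only base64-encoded PNG, GIF, or JPEG data URIs.
    static final Pattern IMAGE_DATA_URI = Pattern.compile(
        "^data:image/(?:png|gif|jpeg);base64,[A-Za-z0-9+/]+={0,2}$");

    // Step 4: accept values that start with http:, https:, or mailto:,
    // or that contain no colon at all (relative URLs).
    static final Pattern SAFE_LINK = Pattern.compile(
        "^(?:(?:https?|mailto):.*|[^:]*)$",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    public static void main(String[] args) {
        System.out.println(IMAGE_DATA_URI.matcher("data:image/png;base64,iVBORw0KGgo=").matches()); // true
        System.out.println(IMAGE_DATA_URI.matcher("data:text/html;base64,PHNjcmlwdD4=").matches()); // false
        System.out.println(SAFE_LINK.matcher("https://example.com/a:b").matches());                 // true
        System.out.println(SAFE_LINK.matcher("javascript:alert(1)").matches());                     // false
        System.out.println(SAFE_LINK.matcher("/relative/path").matches());                          // true
    }
}
```

Either constant could then stand in for the `...` of the corresponding `.matching(...)` call. SVG is deliberately left out of the image pattern in this sketch because SVG documents can carry script.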
News and Events
- [10 Sep 2020] Update OWASP wiki page
- [20 Feb 2018] Update 20180219.1 addresses iOS/MacOS “text bomb”
- [28 June 2016] v20160628.1 Released
- [14 Apr 2016] v20160413.1 Released
- [1 May 2015] Move to GitHub
- [2 July 2014] v239 Released
- [3 Mar 2014] v226 Released
- [5 Feb 2014] New Wiki
- [4 Sept 2013] v209 Released