How to sanitize HTML code in Java to prevent XSS attacks?

Question

Asked by WildWezyr

I'm looking for class/util etc. to sanitize HTML code i.e. remove dangerous tags, attributes and values to avoid XSS and similar attacks.

I get HTML code from a rich text editor (e.g. TinyMCE), but it can also be sent in a malicious way that bypasses TinyMCE validation ("Data submitted from off-site").

Is there anything as simple to use as InputFilter in PHP? The perfect solution I can imagine works like this (assume the sanitizer is encapsulated in an HtmlSanitizer class):

    String unsanitized = "...<...>...";           // some potentially
                                                  // dangerous html here on input

    HtmlSanitizer sat = new HtmlSanitizer();      // sanitizer util class created

    String sanitized = sat.sanitize(unsanitized); // voila - sanitized is safe...

Update: the simpler the solution, the better! A small util class with as few external dependencies on other libraries/frameworks as possible would be best for me.

How about that?

Answer 1

Answered by Vineet Reynolds

~~You could use OWASP ESAPI for Java, which is a security library that is built to do such operations.~~

~~Not only does it have encoders for HTML, it also has encoders to perform JavaScript, CSS and URL encoding. Sample uses of ESAPI can be found in the XSS prevention cheatsheet published by OWASP.~~

You could use the OWASP AntiSamy project to define a site policy that states what is allowed in user-submitted content. The site policy can later be used to obtain "clean" HTML that is safe to display back. You can find a sample TinyMCE policy file on the AntiSamy downloads page.

Answer 2

Answered by RedYeti

Regarding AntiSamy, you may want to check this issue about its dependencies:

http://code.google.com/p/owaspantisamy/issues/detail?id=95&can=1&q=redyetidave

Answer 3

Answered by eduardohl

HTML-escaping inputs works very well. But in some cases business rules might require you NOT to escape the HTML. Regular expressions are not a good fit for this task; it is too hard to come up with a robust solution using them.
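
For the plain escaping case, a minimal sketch using only the JDK might look like the following. The class name `HtmlEscaper` and its `escape` method are hypothetical names chosen for illustration; they are not part of any library mentioned in this thread. The sketch assumes the output is placed in HTML element content or inside a quoted attribute value, not inside scripts, styles, or URLs.

```java
public final class HtmlEscaper {

    private HtmlEscaper() {
        // utility class, not meant to be instantiated
    }

    /**
     * Replaces the five characters that are significant in HTML text and
     * in quoted attribute values with their entity equivalents.
     * A null input is treated as an empty string (an assumption of this sketch).
     */
    public static String escape(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: &lt;script&gt;alert(&#39;x&#39;)&lt;/script&gt;
        System.out.println(escape("<script>alert('x')</script>"));
    }
}
```

This only covers the case where no markup needs to survive. When some HTML has to be kept, as with rich text editor output, escaping is not enough and a whitelist-based sanitizer is needed instead.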

The best solution I found was to use: http://jsoup.org/cookbook/cleaning-html/whitelist-sanitizer

It builds a DOM tree from the provided input and filters out any element not explicitly allowed by a Whitelist. The API also has other functions for cleaning up HTML.

It can also be used declaratively through Hibernate Validator's @SafeHtml(whitelistType=, additionalTags=) constraint, which plugs into javax.validation.

Answer 4

Answered by SalHyman

You can try OWASP Java HTML Sanitizer. It is very simple to use.

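    import org.owasp.html.HtmlPolicyBuilder;
    import org.owasp.html.PolicyFactory;

    // Allow only <a href="https://..."> links and force rel="nofollow" on them.
    PolicyFactory policy = new HtmlPolicyBuilder()
        .allowElements("a")
        .allowUrlProtocols("https")
        .allowAttributes("href").onElements("a")
        .requireRelNofollowOnLinks()
        .toFactory();

    String safeHTML = policy.sanitize(untrustedHTML);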

Answer 5

Answered by P. Lee

Thanks to @SalHyman for the answer. Just to elaborate on OWASP Java HTML Sanitizer: it worked out really well (and quickly) for me. I just added the following to the pom.xml in my Maven project:

    <dependency>
        <groupId>com.googlecode.owasp-java-html-sanitizer</groupId>
        <artifactId>owasp-java-html-sanitizer</artifactId>
        <version>20150501.1</version>
    </dependency>

Check here for the latest release.

Then I added this function for sanitization:

    private String sanitizeHTML(String untrustedHTML){
        PolicyFactory policy = new HtmlPolicyBuilder()
            .allowAttributes("src").onElements("img")
            .allowAttributes("href").onElements("a")
            .allowStandardUrlProtocols()
            .allowElements(
            "a", "img"
            ).toFactory();

        return policy.sanitize(untrustedHTML); 
    }

More tags can be allowed by adding them to the comma-separated arguments of the allowElements method.

Then add this line before passing the bean off to be saved:

    bean.setHtml(sanitizeHTML(bean.getHtml()));

That's it!

For more complex logic, this library is very flexible and can handle more sophisticated sanitization policies.
