Recommended method for escaping HTML in Java

Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/1265282/

Date: 2020-08-12 07:35:59  Source: igfitidea

Tags: java, html, escaping

Asked by Ben Lings

Is there a recommended way to escape the <, >, " and & characters when outputting HTML in plain Java code? (Other than manually doing the following, that is.)

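String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
// Replace & first; otherwise the & in the entities produced by earlier replacements would be escaped again.
String escaped = source.replace("&", "&amp;").replace("<", "&lt;"); // ...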
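
In a manual approach like the one above, the order of the replace calls matters: the ampersand has to be handled first, or the & inside an entity produced by an earlier replacement gets escaped a second time. A minimal sketch of a complete manual version, assuming only the four characters named in the question need handling (the class and method names are made up for illustration):

```java
final class ManualHtmlEscape {
    // Hypothetical helper, not part of the original question.
    static String escape(String s) {
        return s.replace("&", "&amp;")    // must run first, see the note above
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(escape("a < b && \"c\" > d"));
        // prints a &lt; b &amp;&amp; &quot;c&quot; &gt; d
    }
}
```

This is only suitable for element content and double-quoted attribute values; the answers below point to library methods that cover more cases.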

Answered by dfa

StringEscapeUtils from Apache Commons Lang:

import static org.apache.commons.lang.StringEscapeUtils.escapeHtml;
// ...
String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
String escaped = escapeHtml(source);

For version 3:

import static org.apache.commons.lang3.StringEscapeUtils.escapeHtml4;
// ...
String escaped = escapeHtml4(source);

Answered by Adamski

An alternative to Apache Commons: use Spring's HtmlUtils.htmlEscape(String input) method.

Answered by AUU

For some purposes, HtmlUtils:

import org.springframework.web.util.HtmlUtils;
// ...
HtmlUtils.htmlEscapeDecimal("&"); // gives &#38;
HtmlUtils.htmlEscape("&");        // gives &amp;

Answered by Martin Dimitrov

There is a newer version of the Apache Commons Lang library, and it uses a different package name (org.apache.commons.lang3). StringEscapeUtils now has different static methods for escaping different types of documents (http://commons.apache.org/proper/commons-lang/javadocs/api-3.0/index.html). So, to escape an HTML 4.0 string:

import static org.apache.commons.lang3.StringEscapeUtils.escapeHtml4;

String output = escapeHtml4("The less than sign (<) and ampersand (&) must be escaped before using them in HTML");

Answered by Jeff Williams

Be careful with this. There are a number of different 'contexts' within an HTML document: inside an element, quoted attribute value, unquoted attribute value, URL attribute, JavaScript, CSS, etc. You'll need to use a different encoding method for each of these to prevent Cross-Site Scripting (XSS). Check the OWASP XSS Prevention Cheat Sheet for details on each of these contexts. You can find escaping methods for each of these contexts in the OWASP ESAPI library: https://github.com/ESAPI/esapi-java-legacy.
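
As a small illustration of why the context matters, the same raw value needs URL encoding when it goes into a query string but HTML escaping when it is shown as link text. The sketch below uses only the JDK; the class, the method names and the "/search?q=" path are made up for the example, and it is not a substitute for the OWASP encoders mentioned above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class LinkContextExample {
    // Escapes text placed between tags (element content).
    static String escapeText(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds a link whose query value and label both come from the same raw input.
    // The "/search?q=" path is a hypothetical example.
    static String searchLink(String raw) {
        // URLEncoder.encode(String, Charset) requires Java 10 or later.
        String forUrl = URLEncoder.encode(raw, StandardCharsets.UTF_8);
        return "<a href=\"/search?q=" + forUrl + "\">" + escapeText(raw) + "</a>";
    }

    public static void main(String[] args) {
        System.out.println(searchLink("a&b <c>"));
        // prints <a href="/search?q=a%26b+%3Cc%3E">a&amp;b &lt;c&gt;</a>
    }
}
```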

Answered by OriolJ

On Android (API 16 or greater) you can use:

Html.escapeHtml(textToEscape);

or, for lower API levels:

TextUtils.htmlEncode(textToEscape);

Answered by Adam Gent

While @dfa's answer of org.apache.commons.lang.StringEscapeUtils.escapeHtml is nice, and I have used it in the past, it should not be used for escaping HTML (or XML) attributes; otherwise the whitespace will be normalized (meaning all adjacent whitespace characters become a single space).

I know this because I have had bugs filed against my library (JATL) for attributes where whitespace was not preserved. Thus I have a drop-in (copy 'n' paste) class (part of which I stole from JDOM) that differentiates the escaping of attributes and element content.

While proper attribute escaping may not have mattered as much in the past, it is becoming increasingly important given the use of HTML5's data- attributes.
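
The class mentioned above is not shown here, so the following is only a sketch of the general idea, under the assumption that attribute values should have tabs and line breaks written as numeric character references so that a parser cannot normalize them. The class and method names are hypothetical and are not taken from JATL or JDOM.

```java
final class ContextEscaper {
    // Escapes text placed between tags; whitespace is left untouched.
    static String escapeText(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Escapes a double-quoted attribute value. Tabs and line breaks are
    // written as numeric references (an assumption of this sketch).
    static String escapeAttribute(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;"); break;
                case '<':  out.append("&lt;"); break;
                case '>':  out.append("&gt;"); break;
                case '"':  out.append("&quot;"); break;
                case '\t': out.append("&#x9;"); break;
                case '\n': out.append("&#xA;"); break;
                case '\r': out.append("&#xD;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "line one\n\t\"line two\"";
        System.out.println("<p data-note=\"" + escapeAttribute(raw) + "\">" + escapeText(raw) + "</p>");
    }
}
```

Keeping two separate methods makes the call site state which context the value is going into.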

Answered by Bruno Eberhard

Nice short method:

public static String escapeHTML(String s) {
    StringBuilder out = new StringBuilder(Math.max(16, s.length()));
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c > 127 || c == '"' || c == '\'' || c == '<' || c == '>' || c == '&') {
            out.append("&#");
            out.append((int) c);
            out.append(';');
        } else {
            out.append(c);
        }
    }
    return out.toString();
}

Based on https://stackoverflow.com/a/8838023/1199155 (the ampersand is missing there). Of the characters checked in the if clause, the four with named entities (", &, < and >) are the only ones below 128 listed in http://www.w3.org/TR/html4/sgml/entities.html; the apostrophe is escaped as well.
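
One caveat with the method above: it walks the string one char at a time, so a character outside the Basic Multilingual Plane (stored in Java as a surrogate pair) is written as two numeric references, one per surrogate, rather than one reference to the actual character. A possible variant that iterates by code point instead is sketched below; the class name is made up, and the escaping rules are otherwise the same.

```java
final class CodePointHtmlEscape {
    // Same rules as the method above, but iterating over code points
    // so that surrogate pairs yield a single numeric reference.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(Math.max(16, s.length()));
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 127 || cp == '"' || cp == '\'' || cp == '<' || cp == '>' || cp == '&') {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F600 is encoded as the surrogate pair \uD83D\uDE00
        System.out.println(escapeHtml("<\uD83D\uDE00>"));
        // prints &#60;&#128512;&#62;
    }
}
```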

Answered by okrasz

For those who use Google Guava:

import com.google.common.html.HtmlEscapers;
// ...
String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
String escaped = HtmlEscapers.htmlEscaper().escape(source);

Answered by Luca Stancapiano

org.apache.commons.lang3.StringEscapeUtils is now deprecated. Use org.apache.commons.text.StringEscapeUtils instead, by adding the Apache Commons Text dependency:

    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-text</artifactId>
        <version>${commons.text.version}</version>
    </dependency>