Recommended method for escaping HTML in Java
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors on Stack Overflow.
Original: http://stackoverflow.com/questions/1265282/
Asked by Ben Lings
Is there a recommended way to escape the <, >, " and & characters when outputting HTML in plain Java code? (Other than manually doing the following, that is.)
String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
// Replace '&' first so the entities produced for other characters are not escaped twice.
String escaped = source.replace("&", "&amp;").replace("<", "&lt;"); // ...
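The order of the manual replacements matters: if the ampersand is handled last, the & of every entity written earlier is escaped a second time. The following is a minimal sketch of that effect; the class and method names are made up for illustration and are not part of any library.

```java
final class ManualEscapeOrder {
    // Escapes '&' first, so ampersands introduced by the later replacements are left alone.
    public static String escapeAmpFirst(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Escapes '&' last: the '&' of each entity produced earlier is escaped again.
    public static String escapeAmpLast(String s) {
        return s.replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("&", "&amp;");
    }

    public static void main(String[] args) {
        String source = "a < b && c";
        System.out.println(escapeAmpFirst(source)); // a &lt; b &amp;&amp; c
        System.out.println(escapeAmpLast(source));  // a &amp;lt; b &amp;&amp; c
    }
}
```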
Answered by dfa
StringEscapeUtils from Apache Commons Lang:
import static org.apache.commons.lang.StringEscapeUtils.escapeHtml;
// ...
String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
String escaped = escapeHtml(source);
For version 3:
import static org.apache.commons.lang3.StringEscapeUtils.escapeHtml4;
// ...
String escaped = escapeHtml4(source);
Answered by Martin Dimitrov
There is a newer version of the Apache Commons Lang library, and it uses a different package name (org.apache.commons.lang3). StringEscapeUtils now has different static methods for escaping different types of documents (http://commons.apache.org/proper/commons-lang/javadocs/api-3.0/index.html). So, to escape an HTML 4.0 string:
import static org.apache.commons.lang3.StringEscapeUtils.escapeHtml4;
String output = escapeHtml4("The less than sign (<) and ampersand (&) must be escaped before using them in HTML");
Answered by Jeff Williams
Be careful with this. There are a number of different 'contexts' within an HTML document: inside an element, quoted attribute value, unquoted attribute value, URL attribute, JavaScript, CSS, etc. You'll need to use a different encoding method for each of these to prevent Cross-Site Scripting (XSS). Check the OWASP XSS Prevention Cheat Sheet for details on each of these contexts. You can find escaping methods for each of these contexts in the OWASP ESAPI library: https://github.com/ESAPI/esapi-java-legacy.
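To make the idea of contexts concrete, here is a rough sketch using only the JDK. It covers just three of the contexts listed above (element content, a quoted attribute value, and a URL query parameter) and is not a substitute for ESAPI or the OWASP guidance. The class and method names are hypothetical, and the URLEncoder.encode(String, Charset) overload assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ContextEscapingSketch {
    // Element content: escape the characters that could start markup or an entity.
    public static String forElementContent(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Quoted attribute value: the quote characters must be escaped as well.
    public static String forQuotedAttribute(String s) {
        return forElementContent(s).replace("\"", "&quot;").replace("'", "&#39;");
    }

    // URL query parameter inside an attribute: percent-encode first,
    // then apply the attribute escaping to the encoded result.
    public static String forUrlParameter(String s) {
        return forQuotedAttribute(URLEncoder.encode(s, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String input = "Tom & \"Jerry\" <3";
        System.out.println("<p>" + forElementContent(input) + "</p>");
        System.out.println("<input value=\"" + forQuotedAttribute(input) + "\">");
        System.out.println("<a href=\"/search?q=" + forUrlParameter(input) + "\">search</a>");
    }
}
```

The same input ends up encoded three different ways, which is the point of the warning: a single escape method cannot be correct for every place a value is written.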
Answered by OriolJ
On Android (API 16 or greater) you can use:
Html.escapeHtml(textToEscape);
or, for lower API levels:
TextUtils.htmlEncode(textToEscape);
Answered by Adam Gent
While @dfa's answer of org.apache.commons.lang.StringEscapeUtils.escapeHtml is nice, and I have used it in the past, it should not be used for escaping HTML (or XML) attributes; otherwise the whitespace will be normalized (meaning all adjacent whitespace characters become a single space).
I know this because I have had bugs filed against my library (JATL) for attributes where whitespace was not preserved. Thus I have a drop-in (copy 'n' paste) class (of which I stole some from JDOM) that differentiates the escaping of attributes and element content.
While proper attribute escaping may not have mattered as much in the past, it is increasingly of interest given the use of HTML5's data- attributes.
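The distinction between attribute and element escaping can be sketched as follows. This is not the JATL or JDOM code, only an illustration under the assumption that the goal is to survive XML attribute-value normalization, where a parser turns literal tabs and line breaks into spaces but keeps characters written as numeric character references. All names here are hypothetical.

```java
final class AttributeWhitespaceSketch {
    // Element content: whitespace is left untouched.
    public static String escapeContent(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Attribute value: additionally escape the double quote and write tab,
    // line feed and carriage return as numeric character references.
    public static String escapeAttribute(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;"); break;
                case '<':  out.append("&lt;"); break;
                case '>':  out.append("&gt;"); break;
                case '"':  out.append("&quot;"); break;
                case '\t': out.append("&#9;"); break;
                case '\n': out.append("&#10;"); break;
                case '\r': out.append("&#13;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "line one\n\tline \"two\"";
        System.out.println("<pre>" + escapeContent(value) + "</pre>");
        System.out.println("<div data-note=\"" + escapeAttribute(value) + "\"></div>");
    }
}
```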
Answered by Bruno Eberhard
Nice short method:
public static String escapeHTML(String s) {
    StringBuilder out = new StringBuilder(Math.max(16, s.length()));
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        // Non-ASCII characters and the five special characters become numeric references.
        if (c > 127 || c == '"' || c == '\'' || c == '<' || c == '>' || c == '&') {
            out.append("&#");
            out.append((int) c);
            out.append(';');
        } else {
            out.append(c);
        }
    }
    return out.toString();
}
Based on https://stackoverflow.com/a/8838023/1199155 (the ampersand is missing there). Of the characters checked in the if clause, ", &, < and > are the only ones below 128 listed in http://www.w3.org/TR/html4/sgml/entities.html; the apostrophe is escaped as well.
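One caveat with the method above: it walks the string one UTF-16 char at a time, so a character outside the Basic Multilingual Plane (an emoji, for example) would be written as two separate surrogate references rather than as one reference to the intended character. A possible variant that iterates over code points instead is sketched below; the class and method names are made up for illustration.

```java
final class CodePointEscapeSketch {
    // Same rule as the method above, applied per Unicode code point
    // rather than per UTF-16 char.
    public static String escapeHtmlByCodePoint(String s) {
        StringBuilder out = new StringBuilder(Math.max(16, s.length()));
        s.codePoints().forEach(cp -> {
            if (cp > 127 || cp == '"' || cp == '\'' || cp == '<' || cp == '>' || cp == '&') {
                out.append("&#").append(cp).append(';'); // decimal numeric reference
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) and U+1F600 (an emoji stored as a surrogate pair)
        System.out.println(escapeHtmlByCodePoint("caf\u00e9 <b> \uD83D\uDE00"));
        // prints: caf&#233; &#60;b&#62; &#128512;
    }
}
```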
Answered by okrasz
For those who use Google Guava:
import com.google.common.html.HtmlEscapers;
// ...
String source = "The less than sign (<) and ampersand (&) must be escaped before using them in HTML";
String escaped = HtmlEscapers.htmlEscaper().escape(source);
Answered by Luca Stancapiano
org.apache.commons.lang3.StringEscapeUtils is now deprecated (as of Commons Lang 3.6). You should now use org.apache.commons.text.StringEscapeUtils by adding the Commons Text dependency:
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-text</artifactId>
    <version>${commons.text.version}</version>
</dependency>