Best way to encode text data for XML in Java?

Question

Asked by Epaga

Very similar to this question, except for Java.

What is the recommended way of encoding strings for XML output in Java? The strings might contain characters like "&", "<", etc.

Answer 1

Accepted answer by Jon Skeet

Very simply: use an XML library. That way it will actually be right instead of requiring detailed knowledge of bits of the XML spec.

Answer 2

Answer by Fernando Miguélez

Use JAXP and forget about text handling; it will be done for you automatically.

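A minimal sketch of what this can look like with the JAXP classes bundled in the JDK: build a DOM document, set the text, and let a Transformer serialize it. The class name, method name, and element name are made up for illustration, and checked exceptions are simply rethrown to keep the example short.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of any library.
class JaxpEscapingSketch {

    // Serializes <elementName>text</elementName>; the serializer does the escaping.
    static String toXml(String elementName, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element element = doc.createElement(elementName);
            element.setTextContent(text); // raw, unescaped text goes in here
            doc.appendChild(element);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }

    public static void main(String[] args) {
        // The '<' and '&' in the text come out as &lt; and &amp;
        System.out.println(toXml("item", "a < b & c"));
    }
}
```

The point of the sketch is that the calling code never touches "&lt;" or "&amp;" itself; it only hands plain strings to the API.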

Answer 3

Answer by ng.

Just use a CDATA section:

<![CDATA[ your text here ]]>

This allows any characters except the terminating sequence:

]]>

So you can include characters that would otherwise be illegal, such as & and >. For example:

<element><![CDATA[ characters such as & and > are allowed ]]></element>

However, attribute values still need to be escaped, because CDATA sections cannot be used in them.

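As a rough illustration of that split, here is a sketch using the JDK's StAX XMLStreamWriter: the element text is written as a CDATA section, while the attribute value is passed through writeAttribute so the writer escapes it. The class name, method name, and the "title" attribute are hypothetical, and the CDATA text is assumed not to contain "]]>".

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, not part of any library.
class StaxCdataSketch {

    // Writes <element title="...">CDATA section</element> to a string.
    static String write(String attributeValue, String cdataText) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("element");
            writer.writeAttribute("title", attributeValue); // escaped by the writer
            writer.writeCData(cdataText);                   // written verbatim inside <![CDATA[ ... ]]>
            writer.writeEndElement();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(write("Tom & \"Jerry\"", " characters such as & and > are allowed "));
    }
}
```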

Answer 4

Answer by Fabian Steeg

As others have mentioned, using an XML library is the easiest way. If you do want to do the escaping yourself, you could look into StringEscapeUtils from the Apache Commons Lang library.

Answer 5

Answer by Aaron Digulla

Note: Your question is about escaping, not encoding. Escaping is using &lt;, etc. to allow the parser to distinguish between "this is an XML command" and "this is some text". Encoding is the stuff you specify in the XML header (UTF-8, ISO-8859-1, etc).

First of all, like everyone else said, use an XML library. XML looks simple but the encoding+escaping stuff is dark voodoo (which you'll notice as soon as you encounter umlauts and Japanese and other weird stuff like "full width digits" (&#xFF11; is the full-width digit 1)). Keeping XML human-readable is a Sisyphean task.

I suggest never to try to be clever about text encoding and escaping in XML. But don't let that stop you from trying; just remember when it bites you (and it will).

That said, if you use only UTF-8, to make things more readable you can consider this strategy:

If the text does contain '<', '>' or '&', wrap it in <![CDATA[ ... ]]>
If the text doesn't contain these three characters, don't wrap it.

I'm using this in an SQL editor and it allows the developers to cut&paste SQL from a third party SQL tool into the XML without worrying about escaping. This works because the SQL can't contain umlauts in our case, so I'm safe.

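The strategy above could be sketched roughly as follows. The class and method names are hypothetical. One addition that is not part of the answer: a literal "]]>" in the text is split across two CDATA sections, since that sequence would otherwise end the section early.

```java
// Hypothetical helper illustrating "wrap in CDATA only when needed".
class CdataIfNeeded {

    // Returns the text as-is when it has no '<', '>' or '&'; otherwise wraps it in CDATA.
    static String toElementText(String text) {
        boolean needsWrapping =
                text.indexOf('<') >= 0 || text.indexOf('>') >= 0 || text.indexOf('&') >= 0;
        if (!needsWrapping) {
            return text;
        }
        // Assumption beyond the original answer: split "]]>" so it cannot close the section.
        String safe = text.replace("]]>", "]]]]><![CDATA[>");
        return "<![CDATA[" + safe + "]]>";
    }

    public static void main(String[] args) {
        System.out.println(toElementText("SELECT 1 FROM dual"));
        System.out.println(toElementText("SELECT * FROM t WHERE a < b AND c > d"));
    }
}
```

Like the answer itself, this only makes sense when every character in the text can be represented in the document's encoding.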

Answer 6

Answer by Thorbjørn Ravn Andersen

This has worked well for me to provide an escaped version of a text string:

public class XMLHelper {

    /**
     * Returns the string with all non-ASCII characters and <, &, > replaced by numeric character
     * references, e.g. "<A & B>" becomes "&#60;A &#38; B&#62;". The result is safe to include in
     * element text in an XML string. If there were no characters to protect, the original string
     * is returned.
     *
     * @param originalUnprotectedString
     *            original string which may contain characters either reserved in XML or with a
     *            different representation in different encodings (like ISO-8859-1 and UTF-8)
     * @return the protected string, or null if the input was null
     */
    public static String protectSpecialCharacters(String originalUnprotectedString) {
        if (originalUnprotectedString == null) {
            return null;
        }
        boolean anyCharactersProtected = false;

        StringBuilder stringBuilder = new StringBuilder();
        // Iterate by code point so that a surrogate pair becomes a single reference.
        for (int i = 0; i < originalUnprotectedString.length(); ) {
            int codePoint = originalUnprotectedString.codePointAt(i);
            i += Character.charCount(codePoint);

            // Note: XML 1.0 only allows tab, LF and CR below 32, even as references;
            // other control characters in the input still produce XML that parsers reject.
            boolean controlCharacter = codePoint < 32;
            boolean unicodeButNotAscii = codePoint > 126;
            boolean characterWithSpecialMeaningInXML =
                    codePoint == '<' || codePoint == '&' || codePoint == '>';

            if (characterWithSpecialMeaningInXML || unicodeButNotAscii || controlCharacter) {
                stringBuilder.append("&#").append(codePoint).append(';');
                anyCharactersProtected = true;
            } else {
                stringBuilder.appendCodePoint(codePoint);
            }
        }
        if (!anyCharactersProtected) {
            return originalUnprotectedString;
        }

        return stringBuilder.toString();
    }

}

Answer 7

Answer by Amr Mostafa

While idealism says use an XML library, IMHO if you have a basic idea of XML then common sense and performance say template it all the way. It's arguably more readable too. That said, using the escaping routines of a library is probably a good idea.

Consider this: XML was meant to be written by humans.

Use libraries for generating XML when having your XML as an "object" better models your problem. For example, if pluggable modules participate in the process of building this XML.

Edit: as for how to actually escape XML in templates, CDATA and escapeXml(string) from JSTL are two good solutions. escapeXml(string) can be used like this:

<%@taglib prefix="fn" uri="http://java.sun.com/jsp/jstl/functions"%>

<item>${fn:escapeXml(value)}</item>

Answer 8

Answer by Greg Burdett

To escape XML characters, the easiest way is to use the Apache Commons Lang project, JAR downloadable from: http://commons.apache.org/lang/

The class is org.apache.commons.lang3.StringEscapeUtils.

It has a method named "escapeXml" that returns an appropriately escaped String.

Answer 9

Answer by Jasper Krijgsman

The behavior of StringEscapeUtils.escapeXml() has changed from Commons Lang 2.5 to 3.0. It now no longer escapes Unicode characters greater than 0x7f.

This is a good thing: the old method was a bit too eager to escape characters that could simply be inserted into a UTF-8 document.

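To illustrate why escaping non-ASCII characters depends on the output encoding, here is a sketch that uses the JDK's own serializer rather than Commons Lang. The class and method names are hypothetical. With UTF-8 the text should appear as-is; with US-ASCII the serializer is expected to fall back to numeric character references for anything it cannot encode.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of any library.
class EncodingSketch {

    // Serializes <item>text</item> in the given encoding and returns it decoded as a String.
    static String serialize(String text, String encoding) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element item = doc.createElement("item");
            item.setTextContent(text);
            doc.appendChild(item);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(bytes));
            return new String(bytes.toByteArray(), encoding);
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }

    public static void main(String[] args) {
        String text = "Gr\u00fc\u00dfe"; // "Grüße", written with escapes to avoid source-encoding issues
        System.out.println(serialize(text, "UTF-8"));
        System.out.println(serialize(text, "US-ASCII"));
    }
}
```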

The new escapers to be included in Google Guava 11.0 also seem promising: http://code.google.com/p/guava-libraries/issues/detail?id=799

Answer 10

Answer by Pointer Null

Try this:

String xmlEscapeText(String t) {
   StringBuilder sb = new StringBuilder();
   // Walk the string by code point so a surrogate pair yields a single reference
   for (int i = 0; i < t.length(); ) {
      int c = t.codePointAt(i);
      i += Character.charCount(c);
      switch (c) {
      case '<': sb.append("&lt;"); break;
      case '>': sb.append("&gt;"); break;
      case '\"': sb.append("&quot;"); break;
      case '&': sb.append("&amp;"); break;
      case '\'': sb.append("&apos;"); break;
      default:
         if (c > 0x7e) {
            // Anything beyond printable ASCII becomes a decimal character reference
            sb.append("&#").append(c).append(';');
         } else {
            sb.appendCodePoint(c);
         }
      }
   }
   return sb.toString();
}
