Does Java have a JDK class for HTML encoding (but not URL encoding)?

Question

Asked by Eddie

I am of course familiar with the java.net.URLEncoder and java.net.URLDecoder classes. However, I only need HTML-style encoding (I don't want ' ' replaced with '+', etc.). I am not aware of any class built into the JDK that does just HTML encoding. Is there one? I am aware of other choices (for example, Jakarta Commons Lang's StringEscapeUtils), but I don't want to add another external dependency to the project where I need this.

I'm hoping that something I don't know about has been added to a recent JDK (i.e., 5 or 6) that will do this. Otherwise I have to roll my own.
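
To make the distinction in the question concrete, here is a minimal sketch contrasting the two encodings using only JDK classes. The class name `UrlVsHtmlEncoding` and the helper `htmlEscape` are hypothetical names introduced for illustration; the chained `String.replace` calls are one simple way to do HTML-style escaping, not a JDK facility, and they cover only the four characters shown.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlVsHtmlEncoding {

    /** URL (form) encoding via the JDK: spaces become '+', reserved characters become %XX. */
    static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Hypothetical HTML-style escaping helper (not part of the JDK).
     * The ampersand must be replaced first, otherwise the '&' introduced
     * by the later replacements would be escaped a second time.
     */
    static String htmlEscape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String input = "a b <c> & d";
        System.out.println(urlEncode(input));  // prints a+b+%3Cc%3E+%26+d
        System.out.println(htmlEscape(input)); // prints a b &lt;c&gt; &amp; d
    }
}
```

The first line of output is what URLEncoder produces; the second is the kind of output the question is asking for.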

Answer 1

Accepted answer by Eddie

Apparently, the answer is "No." This was unfortunately a case where I had to do something and couldn't add a new external dependency for it, at least in the short term. I agree with everyone that using Commons Lang is the best long-term solution. This is what I will go with once I can add a new library to the project.

It's a shame that something of such common use is not in the Java API.

Answer 2

Answered by simon

No. I would recommend using the StringEscapeUtils you mentioned, or for example JTidy (http://jtidy.sourceforge.net/multiproject/jtidyservlet/apidocs/org/w3c/tidy/servlet/util/HTMLEncode.html).

Answer 3

Answered by bitboxer

Please don't roll your own. Use Jakarta Commons Lang. It is tested and proven to work. Don't write code until you have to. "Not invented here" or "Not another dependency" is not a very good basis for deciding what to choose or write.

Answer 4

Answered by johnmcase

There isn't a built-in JDK class to do this, but it is part of the Jakarta commons-lang library.

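// StringEscapeUtils here is org.apache.commons.lang3.StringEscapeUtils (Commons Lang 3).
String escapedHtml3 = StringEscapeUtils.escapeHtml3(stringToEscape); // HTML 3 entities
String escapedHtml4 = StringEscapeUtils.escapeHtml4(stringToEscape); // HTML 4.0 entities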

Check out the JavaDoc

Adding the dependency is usually as simple as dropping the jar somewhere, and commons-lang has so many useful utilities that it is often worthwhile having it on board.

Answer 5

Answered by Rawton Evolekam

A simple way seems to be this one:

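public static String encodeHTML(String s)
{
    StringBuilder out = new StringBuilder();
    // Walk the string by code point so that a character outside the BMP
    // becomes one numeric reference instead of two surrogate references.
    for(int i=0; i<s.length(); )
    {
        int c = s.codePointAt(i);
        // '&' must be escaped too, otherwise existing text such as "&lt;" is misread.
        if(c > 127 || c=='"' || c=='<' || c=='>' || c=='&')
        {
           out.append("&#").append(c).append(';');
        }
        else
        {
            out.append((char)c);
        }
        i += Character.charCount(c);
    }
    return out.toString();
}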

Source: http://forums.thedailywtf.com/forums/p/2806/72054.aspx#72054
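
When emitting numeric character references for non-ASCII text, it matters whether the string is walked by `char` or by code point: a character outside the Basic Multilingual Plane occupies two `char` values in a Java string, and each surrogate half on its own is not a valid character reference. The following is a minimal sketch of the code-point-based approach; the class name `NumericRefDemo` and the method `toNumericRefs` are hypothetical, and the sketch deliberately leaves the HTML special characters alone so that it shows only the non-ASCII handling.

```java
final class NumericRefDemo {

    /** Replaces every code point above 127 with a decimal numeric character reference. */
    static String toNumericRefs(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);           // reads a full surrogate pair if present
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);           // plain ASCII is copied unchanged
            }
            i += Character.charCount(cp);        // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) and U+1F600 (an emoji stored as a surrogate pair)
        System.out.println(toNumericRefs("caf\u00e9 \uD83D\uDE00")); // prints caf&#233; &#128512;
    }
}
```

`String.codePointAt` and `Character.charCount` have been available since Java 5, so this works on the JDK versions mentioned in the question.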

Answer 6

Answered by peterh

I've found that all existing solutions (libraries) I've reviewed suffered from one or several of the below issues:

They don't tell you in the Javadoc exactly what they replace.
They escape too much ... which makes the HTML much harder to read.
They do not document when the returned value is safe to use (safe to use for an HTML entity? for an HTML attribute? etc.)
They are not optimized for speed.
They do not have a feature for avoiding double escaping (do not escape what is already escaped)
They replace single quote with &apos; (wrong!)

On top of this I also had the problem of not being able to bring in an external library, at least not without a certain amount of red tape.

So, I rolled my own. Guilty.

Below is what it looks like but the latest version can always be found in this gist.

/**
 * HTML string utilities
 */
public class SafeHtml {

    /**
     * Escapes a string for use in an HTML entity or HTML attribute.
     * 
     * <p>
     * The returned value is always suitable for an HTML <i>entity</i> but only
     * suitable for an HTML <i>attribute</i> if the attribute value is inside
     * double quotes. In other words the method is not safe for use with HTML
     * attributes unless you put the value in double quotes like this:
     * <pre>
     *    &lt;div title="value-from-this-method" &gt; ....
     * </pre>
     * Putting attribute values in double quotes is always a good idea anyway.
     * 
     * <p>The following characters will be escaped:
     * <ul>
     *   <li>{@code &} (ampersand) -- replaced with {@code &amp;}</li>
     *   <li>{@code <} (less than) -- replaced with {@code &lt;}</li>
     *   <li>{@code >} (greater than) -- replaced with {@code &gt;}</li>
     *   <li>{@code "} (double quote) -- replaced with {@code &quot;}</li>
     *   <li>{@code '} (single quote) -- replaced with {@code &#39;}</li>
     *   <li>{@code /} (forward slash) -- replaced with {@code &#47;}</li>
     * </ul>
     * It is not necessary to escape more than this as long as the HTML page
     * <a href="https://en.wikipedia.org/wiki/Character_encodings_in_HTML">uses
     * a Unicode encoding</a>. (Most web pages use UTF-8, which is also the HTML5
     * recommendation.) Escaping more than this makes the HTML much less readable.
     * 
     * @param s the string to make HTML safe
     * @param avoidDoubleEscape avoid double escaping, which means for example not 
     *     escaping {@code &lt;} one more time. Any sequence {@code &....;}, as explained in
     *     {@link #isHtmlCharEntityRef(java.lang.String, int) isHtmlCharEntityRef()}, will not be escaped.
     * 
     * @return an HTML-safe string
     */
    public static String htmlEscape(String s, boolean avoidDoubleEscape) {
        if (s == null || s.length() == 0) {
            return s;
        }
        StringBuilder sb = new StringBuilder(s.length()+16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':
                    // Avoid double escaping if already escaped
                    if (avoidDoubleEscape && (isHtmlCharEntityRef(s, i))) {
                        sb.append('&');
                    } else {
                        sb.append("&amp;");
                    }
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>':
                    sb.append("&gt;");
                    break;
                case '"':
                    sb.append("&quot;"); 
                    break;
                case '\'':
                    sb.append("&#39;"); 
                    break;
                case '/':
                    sb.append("&#47;"); 
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
  }

  /**
   * Checks if the value at {@code index} is an HTML entity reference. This
   * means any of:
   * <ul>
   *   <li>{@code &amp;} or {@code &lt;} or {@code &gt;} or {@code &quot;} </li>
   *   <li>A value of the form {@code &#dddd;} where {@code dddd} is a decimal value</li>
   *   <li>A value of the form {@code &#xhhhh;} where {@code hhhh} is a hexadecimal value</li>
   * </ul>
   * @param str the string to test for HTML entity reference.
   * @param index position of the {@code '&'} in {@code str}
   * @return {@code true} if one of the references listed above starts at {@code index}
   */
  public static boolean isHtmlCharEntityRef(String str, int index)  {
      if (str.charAt(index) != '&') {
          return false;
      }
      int indexOfSemicolon = str.indexOf(';', index + 1);
      if (indexOfSemicolon == -1) { // is there a semicolon sometime later ?
          return false;
      }
      if (!(indexOfSemicolon > (index + 2))) {   // is the string actually long enough
          return false;
      }
      if (followingCharsAre(str, index, "amp;")
              || followingCharsAre(str, index, "lt;")
              || followingCharsAre(str, index, "gt;")
              || followingCharsAre(str, index, "quot;")) {
          return true;
      }
      if (str.charAt(index+1) == '#') {
          if (str.charAt(index+2) == 'x' || str.charAt(index+2) == 'X') {
              // It's presumably a hex value
              if (str.charAt(index+3) == ';') {
                  return false;
              }
              for (int i = index+3; i < indexOfSemicolon; i++) {
                  char c = str.charAt(i);
                  if (c >= 48 && c <=57) {  // 0 -- 9
                      continue;
                  }
                  if (c >= 65 && c <=70) {   // A -- F
                      continue;
                  }
                  if (c >= 97 && c <=102) {   // a -- f
                      continue;
                  }
                  return false;  
              }
              return true;   // yes, the value is a hex string
          } else {
              // It's presumably a decimal value
              for (int i = index+2; i < indexOfSemicolon; i++) {
                  char c = str.charAt(i);
                  if (c >= 48 && c <=57) {  // 0 -- 9
                      continue;
                  }
                  return false;
              }
              return true; // yes, the value is decimal
          }
      }
      return false;
  } 


  /**
   * Tests if the chars following position <code>startIndex</code> in string
   * <code>str</code> are that of <code>nextChars</code>.
   * 
   * <p>Optimized for speed. Otherwise this method would be exactly equal to
   * {@code (str.indexOf(nextChars, startIndex+1) == (startIndex+1))}.
   *
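   * @param str the string to look in
   * @param startIndex index of the character just before the run to compare
   * @param nextChars the characters expected directly after {@code startIndex}
   * @return {@code true} if {@code nextChars} occurs in {@code str} starting at {@code startIndex + 1}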
   */  
  private static boolean followingCharsAre(String str, int startIndex, String nextChars)  {
      if ((startIndex + nextChars.length()) < str.length()) {
          for(int i = 0; i < nextChars.length(); i++) {
              if ( nextChars.charAt(i) != str.charAt(startIndex+i+1)) {
                  return false;
              }
          }
          return true;
      } else {
          return false;
      }
  }
}
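
The double-escape check can also be expressed as a regular expression. The sketch below is not the class above and not the author's method; it is a compact regex-based variant of the same idea, under the assumption that only the ampersand needs treatment. The names `DoubleEscapeDemo` and `escapeBareAmpersands` are hypothetical. It recognizes the same references that `isHtmlCharEntityRef` documents: `&amp;`, `&lt;`, `&gt;`, `&quot;`, and decimal or hexadecimal numeric references.

```java
import java.util.regex.Pattern;

final class DoubleEscapeDemo {

    // An '&' that is NOT the start of one of the recognized references.
    // The negative lookahead leaves already-escaped sequences untouched.
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
            "&(?!(?:amp|lt|gt|quot|#[0-9]+|#[xX][0-9a-fA-F]+);)");

    /** Escapes only those ampersands that do not already begin a reference. */
    static String escapeBareAmpersands(String s) {
        return BARE_AMPERSAND.matcher(s).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String input = "a & b &lt; c &#39; &#x2F; &bogus;";
        // prints a &amp; b &lt; c &#39; &#x2F; &amp;bogus;
        System.out.println(escapeBareAmpersands(input));
    }
}
```

Note that `&bogus;` is escaped because it is not in the recognized set. The trade-off is the same as in the hand-written version: text that is meant literally but happens to look like a reference, such as `&lt;` typed by a user, passes through unescaped, so this mode is only appropriate when the input may legitimately contain pre-escaped content.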

TODO: Preserve consecutive whitespace.

Answer 7

Answered by Sachin Kokate

I suggest using org.springframework.web.util.HtmlUtils.htmlEscape(String input).

Maybe this will help.
