C# HttpUtility.HtmlEncode doesn't encode everything

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/547634/

Time: 2020-08-04 07:43:14  Source: igfitidea

Tags: c#, html, encoding, utf-8

Asked by Anthony

I am interacting with a web server using a desktop client program in C# and .Net 3.5. I am using Fiddler to see what traffic the web browser sends, and emulate that. Sadly this server is old, and is a bit confused about the notions of charsets and utf-8. Mostly it uses Latin-1.

When I enter data into the web browser containing "special" characters, like "Ω π ? ∞ ? ? ♈ ♉ ♊ ♋ ♌ ♍ ♎ ♏ ♐ ♑ ♒ ♓", Fiddler shows me that they are transmitted from browser to server as follows: "&#9800; &#9801; &#9802; &#9803; &#9804; &#9805; &#9806; &#9807; &#9808; &#9809; &#9810; &#9811; "

But in my client, HttpUtility.HtmlEncode does not convert these characters; it leaves them as they are. What do I need to call to convert "♈" to &#9800; and so on?

Accepted answer by Rick

It seems horribly inefficient, but the only way I can think to do that is to look through each character:

public static string MyHtmlEncode(string value)
{
   // call the normal HtmlEncode first
   char[] chars = HttpUtility.HtmlEncode(value).ToCharArray();
   StringBuilder encodedValue = new StringBuilder();
   foreach(char c in chars)
   {
      if ((int)c > 127) // above normal ASCII
         encodedValue.Append("&#" + (int)c + ";");
      else
         encodedValue.Append(c);
   }
   return encodedValue.ToString();
}
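
One caveat with a per-char loop like this: characters outside the Basic Multilingual Plane (emoji, for example) are stored as two surrogate char values, and emitting each half as its own entity produces invalid references. Below is a minimal sketch of a surrogate-aware variant. The names EntityEncoder and EncodeNonAscii are made up for illustration, and the sketch assumes the input has already been through the normal HtmlEncode, since it does not escape <, >, & or quotes itself.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class EntityEncoder
{
    // Hypothetical helper: replaces every non-ASCII code point with a
    // decimal numeric character reference. It does NOT escape <, >, & or
    // quotes; run the regular HtmlEncode first for those.
    public static string EncodeNonAscii(string value)
    {
        if (value == null)
            return null;

        StringBuilder sb = new StringBuilder(value.Length);
        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];
            if (c <= 127)
            {
                sb.Append(c); // plain ASCII passes through unchanged
                continue;
            }

            int codePoint = c;
            // Combine a valid surrogate pair into one code point so that it
            // becomes a single entity. A lone surrogate falls through and
            // is written with its own numeric value.
            if (char.IsHighSurrogate(c)
                && i + 1 < value.Length
                && char.IsLowSurrogate(value[i + 1]))
            {
                codePoint = char.ConvertToUtf32(c, value[i + 1]);
                i++; // skip the low surrogate that was just consumed
            }

            sb.Append("&#");
            sb.Append(codePoint.ToString(CultureInfo.InvariantCulture));
            sb.Append(';');
        }
        return sb.ToString();
    }
}
```

With this sketch, EntityEncoder.EncodeNonAscii("Ω ♈") gives "&#937; &#9800;", and a character stored as a surrogate pair, such as U+1F600, becomes the single entity "&#128512;".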

Answer by bdukes

Rick Strahl just posted a blog post, "Html and Uri String Encoding without System.Web", where he has some custom code that encodes the upper range of characters, too.

/// <summary>
/// HTML-encodes a string and returns the encoded string.
/// </summary>
/// <param name="text">The text string to encode. </param>
/// <returns>The HTML-encoded text.</returns>
public static string HtmlEncode(string text)
{
    if (text == null)
        return null;

    StringBuilder sb = new StringBuilder(text.Length);

    int len = text.Length;
    for (int i = 0; i < len; i++)
    {
        switch (text[i])
        {

            case '<':
                sb.Append("&lt;");
                break;
            case '>':
                sb.Append("&gt;");
                break;
            case '"':
                sb.Append("&quot;");
                break;
            case '&':
                sb.Append("&amp;");
                break;
            default:
                if (text[i] > 159)
                {
                    // decimal numeric entity
                    sb.Append("&#");
                    sb.Append(((int)text[i]).ToString(CultureInfo.InvariantCulture));
                    sb.Append(";");
                }
                else
                    sb.Append(text[i]);
                break;
        }
    }
    return sb.ToString();
}

Answer by AnthonyWJones

HtmlEncode returns a string, which is Unicode, and hence it has no need to encode these characters.

If the encoding of your output stream is not compatible with these characters, then use HtmlEncode like this:

 HttpUtility.HtmlEncode(outgoingString, Response.Output);

HtmlEncode will then escape the characters appropriately.
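
The underlying idea, escaping only what the output encoding cannot represent, can also be sketched by hand. The example below is an illustration under stated assumptions, not what HtmlEncode does internally: the names EncodingAwareEscaper and EscapeUnencodable are hypothetical, and the approach (an exception fallback on a clone of the target encoding) trades speed for simplicity. It does not escape <, >, & or quotes.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class EncodingAwareEscaper
{
    // Hypothetical helper: keeps characters that `target` can represent and
    // replaces everything else with a decimal numeric character reference.
    public static string EscapeUnencodable(string value, Encoding target)
    {
        if (value == null)
            return null;

        // Get the same code page with a fallback that throws, so that
        // unrepresentable characters are detected instead of becoming '?'.
        Encoding strict = Encoding.GetEncoding(
            target.CodePage,
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());

        StringBuilder sb = new StringBuilder(value.Length);
        for (int i = 0; i < value.Length; i++)
        {
            // Treat a valid surrogate pair as one unit.
            bool isPair = char.IsHighSurrogate(value[i])
                && i + 1 < value.Length
                && char.IsLowSurrogate(value[i + 1]);
            string unit = value.Substring(i, isPair ? 2 : 1);

            try
            {
                strict.GetBytes(unit); // throws if not representable
                sb.Append(unit);
            }
            catch (EncoderFallbackException)
            {
                int codePoint = isPair
                    ? char.ConvertToUtf32(value[i], value[i + 1])
                    : value[i];
                sb.Append("&#");
                sb.Append(codePoint.ToString(CultureInfo.InvariantCulture));
                sb.Append(';');
            }

            if (isPair)
                i++; // skip the low surrogate that was just consumed
        }
        return sb.ToString();
    }
}
```

For a Latin-1 server, EscapeUnencodable("é Ω", Encoding.GetEncoding("ISO-8859-1")) leaves "é" alone, because Latin-1 can represent it, and turns "Ω" into "&#937;".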

Answer by Matt

It seems like HtmlEncode is just for encoding strings that are put into HTML documents, where only characters such as / < > & cause problems. For URLs, just replace HtmlEncode with UrlEncode.
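
To make the difference concrete, here is a small sketch that runs the same input through both methods. The class and method names (EncodeComparison, EncodeBothWays) are hypothetical; HttpUtility.HtmlEncode and HttpUtility.UrlEncode are the real System.Web methods, so the project is assumed to reference System.Web.

```csharp
using System;
using System.Web;

public static class EncodeComparison
{
    // Hypothetical helper: returns the HTML-encoded form at index 0 and
    // the URL-encoded form at index 1 for the same input.
    public static string[] EncodeBothWays(string value)
    {
        return new[]
        {
            HttpUtility.HtmlEncode(value), // for text placed inside HTML markup
            HttpUtility.UrlEncode(value)   // for text placed in a URL or form body
        };
    }
}
```

For the input "a<b & c", the first result is "a&lt;b &amp; c", while the second percent-encodes < and & and turns the spaces into +.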

Answer by Joel Fillmore

The AntiXSS library from Microsoft correctly encodes these characters.

AntiXSS on Codeplex

NuGet package (best way to add as a reference)

Answer by Oliver Bock

@bdukes's response above will do the job, but we can make it much faster if we assume that most characters will not be in this range. Note the quoted 'ā' (Unicode 0x0100).

/// <summary>.Net 2.0's HttpUtility.HtmlEncode will not properly encode
/// Unicode characters above 0xFF.  This may be fixed in newer 
/// versions.</summary>
public static string HtmlEncode(string s)
{
    // Let .Net 2.0 get right what it gets right.
    s = HttpUtility.HtmlEncode(s);

    // Search for the first char at or above U+0100.  Hopefully there is
    // none and we can just return s.
    int num = IndexOfHighChar(s, 0);
    if (num == -1)
        return s;
    int old_num = 0;
    StringBuilder sb = new StringBuilder();
    do {
        sb.Append(s, old_num, num - old_num);
        sb.Append("&#");
        sb.Append(((int)s[num]).ToString(NumberFormatInfo.InvariantInfo));
        sb.Append(';');
        old_num = num + 1;
        num = IndexOfHighChar(s, old_num);
    } while (num != -1);
    sb.Append(s, old_num, s.Length - old_num);
    return sb.ToString();
}

static unsafe int IndexOfHighChar(string s, int start)
{
    int num = s.Length - start;
    fixed (char* str = s) {
        char* chPtr = str + start;
        while (num > 0) {
            char ch = chPtr[0];
            if (ch >= 'ā')
                return s.Length - num;
            chPtr++;
            num--;
        }
    }
    return -1;
}

Answer by Devdude

You can always undo the unwanted replacement as follows. When this is encoded without the if statement, the result string is "This means I am crying :&#39;(". For whatever reason, 'special characters' such as the apostrophe are handled and replaced with HTML character entities.

string text = "This means I am crying :'(";

string encoded = HttpUtility.HtmlEncode(text);
if(encoded.Contains("&#39;"))
{
    encoded = encoded.Replace("&#39;", "'");
}