Convert HTML entities to Unicode characters in C#

Question

Asked by Remy

I found similar questions and answers for Python and JavaScript, but not for C# or any other WinRT-compatible language.

I think I need this because I'm displaying text retrieved from websites in a Windows 8 Store app. For example, &eacute; should become é.

Or is there a better way? I'm not displaying websites or RSS feeds, just a list of websites and their titles.

Answer 1

Accepted answer by Blachshma

I recommend using System.Net.WebUtility.HtmlDecode and NOT HttpUtility.HtmlDecode.

This is because HttpUtility lives in the System.Web assembly, which is not referenced by default in WinForms/WPF/console applications (and is not available to Windows Store apps at all), whereas WebUtility gives the same result and is already referenced in all those projects.

Usage:

string s =  System.Net.WebUtility.HtmlDecode("&eacute;"); // Returns é
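
A slightly fuller sketch of the asker's scenario: decoding titles scraped from websites before showing them in a list. The TitleCleaner class and the sample inputs are hypothetical; only WebUtility.HtmlDecode comes from the answer. Besides named entities, it also decodes decimal (&#231;) and hexadecimal (&#xEF;) references.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Hypothetical helper: decodes HTML entities in titles scraped from websites.
public static class TitleCleaner
{
    public static List<string> DecodeTitles(IEnumerable<string> rawTitles)
    {
        // WebUtility lives in System.Net, so no reference to System.Web is needed.
        return rawTitles.Select(t => WebUtility.HtmlDecode(t)).ToList();
    }
}
```

For example, DecodeTitles(new[] { "Caf&eacute; &amp; Bar", "Fa&#231;ade" }) yields "Café & Bar" and "Façade".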

Answer 2

Answer by Mudassir Hasan

Use HttpUtility.HtmlDecode() (see its documentation on MSDN).

// Requires a reference to System.Web.dll and "using System.Web;"
string decodedString = HttpUtility.HtmlDecode(myEncodedString);
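
Where both classes are available (for example on .NET Core / .NET 5+, where System.Web.HttpUtility ships in the shared framework), a quick check like the following can confirm that the two decoders agree on the inputs you care about. This is a sketch with a hypothetical class name; on the .NET Framework, HttpUtility still needs a reference to System.Web.dll, and it is not available to Windows Store apps.

```csharp
using System;
using System.Net;
using System.Web;

// Hypothetical helper comparing HttpUtility and WebUtility on the same input.
public static class DecoderComparison
{
    // Returns true when both decoders produce identical output for the input.
    public static bool SameResult(string encoded)
    {
        string viaHttpUtility = HttpUtility.HtmlDecode(encoded);
        string viaWebUtility = WebUtility.HtmlDecode(encoded);
        return string.Equals(viaHttpUtility, viaWebUtility, StringComparison.Ordinal);
    }
}
```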

Answer 3

Answer by user1954682

Metro apps and Windows Phone 8 apps encode and decode HTML entities and numeric character references (HTML numbers) differently.

With Windows Runtime Metro App

{
    string inStr = "ó";
    string auxStr = System.Net.WebUtility.HtmlEncode(inStr);
    // auxStr == &#243;
    string outStr = System.Net.WebUtility.HtmlDecode(auxStr);
    // outStr == ó
    string outStr2 = System.Net.WebUtility.HtmlDecode("&oacute;");
    // outStr2 == ó
}

With Windows Phone 8.0

{
    string inStr = "ó";
    string auxStr = System.Net.WebUtility.HtmlEncode(inStr);
    // auxStr == &#243;
    string outStr = System.Net.WebUtility.HtmlDecode(auxStr);
    // outStr == &#243;
    string outStr2 = System.Net.WebUtility.HtmlDecode("&oacute;");
    // outStr2 == ó
}

To solve this on WP8, I implemented the table from the HTML ISO-8859-1 Reference and applied it before calling System.Net.WebUtility.HtmlDecode().
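
A minimal sketch of that workaround, assuming the goal is to turn the &#NNN; references for U+00A0 to U+00FF (the range WebUtility.HtmlEncode writes as numbers, as in the &#243; example above) back into characters before the built-in decoder runs. Instead of typing the ISO-8859-1 table by hand, it is generated from the fact that ISO-8859-1 maps directly onto the first 256 Unicode code points. The class name is hypothetical and the WP8 behavior is as reported above; on desktop .NET, where HtmlDecode already handles numeric references, the table step is redundant but harmless.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical sketch of the "ISO-8859-1 table" workaround.
public static class Latin1ReferenceTable
{
    private static readonly Dictionary<string, string> Table = BuildTable();

    private static Dictionary<string, string> BuildTable()
    {
        var table = new Dictionary<string, string>();
        for (int code = 160; code <= 255; code++)
        {
            // ISO-8859-1 byte values map directly onto U+0000..U+00FF.
            table["&#" + code + ";"] = ((char)code).ToString();
        }
        return table;
    }

    public static string Decode(string html)
    {
        if (string.IsNullOrEmpty(html)) return html;
        var sb = new StringBuilder(html);
        // Keys end with ';', so "&#160;" never matches inside "&#1600;".
        foreach (var pair in Table)
        {
            sb.Replace(pair.Key, pair.Value);
        }
        // Named entities such as &oacute; are left to the built-in decoder.
        return WebUtility.HtmlDecode(sb.ToString());
    }
}
```

Because '&' (code 38) is outside the table's range, the pre-pass never creates new entities, so text such as &amp;#243; still decodes to the literal &#243;.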

Answer 4

Answer by zumey

This might be useful: it replaces all named entities (as far as my requirements go) with their numeric Unicode equivalents, e.g. &eacute; becomes &#233;.

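    // Requires: using System; using System.Collections.Generic;
    //           using System.Text.RegularExpressions; using System.Web;
    // Converts named entities such as &eacute; into numeric references such as &#233;.
    public string EntityToUnicode(string html) {
        var replacements = new Dictionary<string, string>();
        // Named entities made of 2-5 lowercase letters, e.g. &amp; or &nbsp;.
        var regex = new Regex("(&[a-z]{2,5};)");
        foreach (Match match in regex.Matches(html)) {
            if (!replacements.ContainsKey(match.Value)) {
                var unicode = HttpUtility.HtmlDecode(match.Value);
                // Only entities that decode to a single UTF-16 char are converted.
                if (unicode.Length == 1) {
                    replacements.Add(match.Value, string.Concat("&#", Convert.ToInt32(unicode[0]), ";"));
                }
            }
        }
        foreach (var replacement in replacements) {
            html = html.Replace(replacement.Key, replacement.Value);
        }
        return html;
    }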
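
The same named-to-numeric conversion can be sketched as a single Regex.Replace pass with a MatchEvaluator, using WebUtility so no System.Web reference is needed. This is a swapped-in technique rather than the answer's code, and the class and method names are hypothetical. The pattern also admits uppercase letters and digits, so names such as &Agrave; or &frac12; are covered, and any name the decoder does not recognize is left unchanged.

```csharp
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical single-pass variant of EntityToUnicode.
public static class NamedEntityConverter
{
    // A letter followed by letters or digits, e.g. &eacute;, &Agrave;, &frac12;.
    private static readonly Regex NamedEntity = new Regex("&[a-zA-Z][a-zA-Z0-9]*;");

    public static string ToNumericReferences(string html)
    {
        if (string.IsNullOrEmpty(html)) return html;
        return NamedEntity.Replace(html, m =>
        {
            string decoded = WebUtility.HtmlDecode(m.Value);
            // Unknown names come back unchanged (longer than one char) and are kept;
            // as in the original, only single-char results become &#NNN;.
            return decoded.Length == 1
                ? "&#" + (int)decoded[0] + ";"
                : m.Value;
        });
    }
}
```

Replacing each match in place avoids the second loop over a dictionary and the repeated string.Replace calls, each of which copies the whole string.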

Answer 5

Answer by hcoverlambda

This worked for me; it replaces both named entities and numeric (Unicode) character references.

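using System.Text.RegularExpressions;
using System.Web;
using NUnit.Framework;

public static class HtmlExtensions
{
    // Group 1 is "#" for numeric references; group 2 is the entity name or number.
    private static readonly Regex HtmlEntityRegex = new Regex("&(#)?([a-zA-Z0-9]*);");

    public static string HtmlDecode(this string html)
    {
        if (string.IsNullOrEmpty(html)) return html;
        // Decimal references only: a hex reference such as &#x27; makes int.Parse throw,
        // and values above U+FFFF are truncated by the (char) cast.
        return HtmlEntityRegex.Replace(html, x => x.Groups[1].Value == "#"
            ? ((char)int.Parse(x.Groups[2].Value)).ToString()
            : HttpUtility.HtmlDecode(x.Groups[0].Value));
    }
}

[TestFixture]
public class HtmlExtensionsTests
{
    [TestCase(null, null)]
    [TestCase("", "")]
    [TestCase("&#39;fark&#39;", "'fark'")]
    [TestCase("&quot;fark&quot;", "\"fark\"")]
    public void should_remove_html_entities(string html, string expected)
    {
        Assert.That(html.HtmlDecode(), Is.EqualTo(expected));
    }
}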
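
A sketch of one way to extend the same single-pass idea, assuming you also need hexadecimal references (&#x27;) and code points above U+FFFF such as emoji, which the (char) cast cannot represent. The class name is hypothetical, and it uses WebUtility.HtmlDecode for named entities instead of HttpUtility. Each entity is decoded exactly once, so &amp;lt; correctly becomes the text &lt; rather than <.

```csharp
using System.Globalization;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical decoder: hex, decimal and named references in one pass.
public static class HtmlEntityDecoder
{
    // "hex" captures &#xHHHH;, "dec" captures &#DDDD;, otherwise a named entity.
    private static readonly Regex EntityRegex = new Regex(
        "&(?:#[xX](?<hex>[0-9a-fA-F]{1,6})|#(?<dec>[0-9]{1,7})|[a-zA-Z][a-zA-Z0-9]*);");

    public static string Decode(string html)
    {
        if (string.IsNullOrEmpty(html)) return html;
        return EntityRegex.Replace(html, m =>
        {
            if (!m.Groups["hex"].Success && !m.Groups["dec"].Success)
            {
                // Named entity: decode only this token so nothing is decoded twice.
                return WebUtility.HtmlDecode(m.Value);
            }
            int codePoint = m.Groups["hex"].Success
                ? int.Parse(m.Groups["hex"].Value, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture)
                : int.Parse(m.Groups["dec"].Value, CultureInfo.InvariantCulture);
            // ConvertFromUtf32 emits a surrogate pair above U+FFFF;
            // surrogates and values past U+10FFFF are left as written.
            bool valid = codePoint <= 0x10FFFF && (codePoint < 0xD800 || codePoint > 0xDFFF);
            return valid ? char.ConvertFromUtf32(codePoint) : m.Value;
        });
    }
}
```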

Answer 6

Answer by EminST

An improved version of zumey's method (I can't comment there). The longest entity name is &exclamation; (11 characters), and entities can also contain uppercase letters, e.g. &Agrave; (source: Wikipedia).

    // Requires: using System; using System.Collections.Generic;
    //           using System.Text.RegularExpressions; using System.Web;
    public string EntityToUnicode(string html) {
        var replacements = new Dictionary<string, string>();
        // 2-11 letters of either case, e.g. &amp; or &Agrave;.
        var regex = new Regex("(&[a-zA-Z]{2,11};)");
        foreach (Match match in regex.Matches(html)) {
            if (!replacements.ContainsKey(match.Value)) {
                var unicode = HttpUtility.HtmlDecode(match.Value);
                // Only entities that decode to a single UTF-16 char are converted.
                if (unicode.Length == 1) {
                    replacements.Add(match.Value, string.Concat("&#", Convert.ToInt32(unicode[0]), ";"));
                }
            }
        }
        foreach (var replacement in replacements) {
            html = html.Replace(replacement.Key, replacement.Value);
        }
        return html;
    }
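
To see what the wider pattern changes, the two regular expressions can be compared directly on a few entity names. This is a small sketch with a hypothetical class name. Note that &eacute; (six letters) is already missed by the original {2,5} limit, and names containing digits, such as &frac12;, are still missed by both patterns.

```csharp
using System.Text.RegularExpressions;

// Hypothetical holder for the two entity patterns discussed in this thread.
public static class EntityPatterns
{
    // Pattern from zumey's answer: 2-5 lowercase letters.
    public static readonly Regex Original = new Regex("(&[a-z]{2,5};)");

    // Widened pattern from this answer: 2-11 letters of either case.
    public static readonly Regex Widened = new Regex("(&[a-zA-Z]{2,11};)");
}
```

For example, EntityPatterns.Original.IsMatch("&Agrave;") is false while EntityPatterns.Widened.IsMatch("&Agrave;") is true.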
