Notice: This page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/183907/

Time: 2020-08-03 17:00:18  Source: igfitidea

How do I convert Unicode escape sequences to Unicode characters in a .NET string?

c# .net unicode

Asked by jr.

Say you've loaded a text file into a string, and you'd like to convert all Unicode escapes into actual Unicode characters inside of the string.

Example:

"The following is the top half of an integral character in Unicode '\u2320', and this is the lower half '\U2321'."

Accepted answer by jr.

The answer is simple and works well with strings up to at least several thousand characters.

Example 1:

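// Requires: using System.Globalization; using System.Text.RegularExpressions;
// "\\" in the verbatim pattern matches one literal backslash in the input.
Regex rx = new Regex( @"\\[uU]([0-9A-F]{4})" );
result = rx.Replace( result, match => ((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString() );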

Example 2:

// Same pattern; "\\" matches one literal backslash in the input.
Regex rx = new Regex( @"\\[uU]([0-9A-F]{4})" );
result = rx.Replace( result, delegate (Match match) { return ((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString(); } );

The first example shows the replacement being made using a lambda expression (C# 3.0) and the second uses a delegate which should work with C# 2.0.
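
To see the two forms side by side, here is a minimal sketch of a complete program. The class and method names (EscapeDemo, DecodeWithLambda, DecodeWithDelegate) are made up for illustration, and the pattern assumes the escapes appear in the text as a literal backslash followed by u or U and four uppercase hex digits.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative, not from the original answer.
public static class EscapeDemo
{
    // Literal backslash, then 'u' or 'U', then four uppercase hex digits.
    private static readonly Regex Rx = new Regex(@"\\[uU]([0-9A-F]{4})");

    // Lambda form (C# 3.0 and later).
    public static string DecodeWithLambda(string input)
    {
        return Rx.Replace(input, match =>
            ((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString());
    }

    // Anonymous-delegate form (C# 2.0 and later).
    public static string DecodeWithDelegate(string input)
    {
        return Rx.Replace(input, delegate (Match match)
        {
            return ((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString();
        });
    }

    public static void Main()
    {
        // Verbatim string: the backslashes are literal characters, as in a loaded text file.
        string text = @"Top half '\u2320', lower half '\U2321'.";
        string a = DecodeWithLambda(text);
        string b = DecodeWithDelegate(text);
        Console.WriteLine(a == b);               // both forms produce the same string
        Console.WriteLine(a.Contains("\u2320")); // the escape is now the real character
    }
}
```

Both methods pass the same kind of callback to Replace(); the only difference is the syntax used to write it.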

To break down what's going on here, first we create a regular expression:

new Regex( @"\\[uU]([0-9A-F]{4})" );

Then we call Replace() with the string 'result' and an anonymous method (a lambda expression in the first example and a delegate in the second; the delegate could also be a regular method) that converts each match of the regular expression found in the string.
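
As a rough illustration of the "regular method" option, the sketch below passes an ordinary named method to Replace() through the MatchEvaluator delegate type. The names NamedEvaluatorDemo, DecodeEscape and Decode are invented for this example.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative, not from the original answer.
public static class NamedEvaluatorDemo
{
    // An ordinary method whose signature matches MatchEvaluator: string (Match).
    public static string DecodeEscape(Match match)
    {
        int codeUnit = Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber);
        return ((char) codeUnit).ToString();
    }

    public static string Decode(string input)
    {
        Regex rx = new Regex(@"\\[uU]([0-9A-F]{4})");
        // Replace() calls DecodeEscape once per match and substitutes the returned string.
        return rx.Replace(input, new MatchEvaluator(DecodeEscape));
    }

    public static void Main()
    {
        Console.WriteLine(Decode(@"A\u0042C")); // prints "ABC"
    }
}
```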

The Unicode escape is processed like this:

((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString()

Get the string representing the number part of the escape (skip the first two characters).

match.Value.Substring(2)

Parse that string with Int32.Parse(), which takes the string and the number format to expect, in this case a hex number:

NumberStyles.HexNumber

Then we cast the resulting number to a Unicode character:

(char)

Finally, we call ToString() on the Unicode character to get its string representation, which is the value returned to Replace():

.ToString()
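
The four steps above can be sketched as separate statements for a single escape. StepByStepDemo and DecodeOne are hypothetical names used only here, and the input is assumed to be exactly one escape such as \u2320.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class; the names are illustrative, not from the original answer.
public static class StepByStepDemo
{
    public static string DecodeOne(string escape)
    {
        string hex = escape.Substring(2);                      // "\u2320" -> "2320"
        int number = Int32.Parse(hex, NumberStyles.HexNumber); // "2320" -> 0x2320 (8992)
        char ch = (char) number;                               // 8992 -> the character U+2320
        return ch.ToString();                                  // one-character string
    }

    public static void Main()
    {
        string decoded = DecodeOne(@"\u2320");
        Console.WriteLine(decoded.Length);   // prints 1
        Console.WriteLine((int) decoded[0]); // prints 8992
    }
}
```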

Note: Instead of grabbing the text to be converted with a Substring call, you could use the match parameter's GroupCollection and a subexpression in the regular expression to capture just the number ('2320'), but that's more complicated and less readable.

Answer by George Tsiokos

Refactored a little more:

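// "\\U" matches a literal backslash plus U; IgnoreCase also accepts \u and lowercase hex digits.
Regex regex = new Regex (@"\\U([0-9A-F]{4})", RegexOptions.IgnoreCase);
string line = "...";
// Groups[1] holds only the four hex digits captured by the parentheses.
line = regex.Replace (line, match => ((char)int.Parse (match.Groups[1].Value,
  NumberStyles.HexNumber)).ToString ());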
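
If the conversion is needed in more than one place, one option is to wrap this version in a small helper so the Regex is built only once. This is a sketch under that assumption; UnicodeEscapes and Unescape are hypothetical names, not part of .NET.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative, not a .NET API.
public static class UnicodeEscapes
{
    // Built once and reused; IgnoreCase accepts \u or \U and hex digits in either case.
    private static readonly Regex EscapeRx =
        new Regex(@"\\u([0-9A-F]{4})", RegexOptions.IgnoreCase);

    public static string Unescape(string line)
    {
        // Groups[1] is the captured four-digit hex number without the "\u" prefix.
        return EscapeRx.Replace(line, match =>
            ((char) int.Parse(match.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }

    public static void Main()
    {
        string decoded = Unescape(@"caf\u00e9 \U0041");
        Console.WriteLine(decoded == "caf\u00e9 A"); // compares against the real characters
    }
}
```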

Answer by Baseem Najjar

I think you'd better add the lowercase letters to your regular expression. It worked better for me.

// a-f added so escapes written with lowercase hex digits (e.g. \u00e9) also match.
Regex rx = new Regex(@"\\[uU]([0-9A-Fa-f]{4})");
result = rx.Replace(result, match => ((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString());
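
A small comparison may make the difference concrete. In this sketch (CaseDemo and its Decode method are hypothetical names) the same input is run through the uppercase-only pattern and the pattern with a-f added.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names are illustrative, not from the original answer.
public static class CaseDemo
{
    public static string Decode(string input, string pattern)
    {
        return new Regex(pattern).Replace(input, match =>
            ((char) Int32.Parse(match.Value.Substring(2), NumberStyles.HexNumber)).ToString());
    }

    public static void Main()
    {
        string text = @"caf\u00e9"; // escape written with a lowercase hex digit
        string upperOnly = Decode(text, @"\\[uU]([0-9A-F]{4})");
        string bothCases = Decode(text, @"\\[uU]([0-9A-Fa-f]{4})");
        Console.WriteLine(upperOnly == text);        // True: "00e9" does not match [0-9A-F]{4}
        Console.WriteLine(bothCases == "caf\u00e9"); // True: the escape was decoded
    }
}
```

Int32.Parse with NumberStyles.HexNumber accepts hex digits in either case, so only the pattern needs to change.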

Answer by Tarık Özgün Güner

This is the VB.NET equivalent:

' VB.NET strings do not treat backslashes specially, so "\\" is the regex for one literal backslash.
Dim rx As New RegularExpressions.Regex("\\[uU]([0-9A-Fa-f]{4})")
result = rx.Replace(result, Function(match) CChar(ChrW(Int32.Parse(match.Value.Substring(2), Globalization.NumberStyles.HexNumber))).ToString())