C#: How to convert a Unicode character to its ASCII equivalent

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/138449/

Date: 2020-08-03 15:08:38  Source: igfitidea

How to convert a Unicode character to its ASCII equivalent

Asked by Huppie

Here's the problem:

In C# I'm getting information from a legacy ACCESS database. .NET converts the content of the database (in the case of this problem a string) to Unicode before handing the content to me.

How do I convert this Unicode string back to its ASCII equivalent?



Edit

Unicode char 710 is indeed MODIFIER LETTER CIRCUMFLEX ACCENT. Here is the problem stated a bit more precisely:

 -> (Extended) ASCII character ê (Extended ASCII 136) was inserted in the database.
 -> Either Access or the reading component in .NET converted this to U+02C6 U+0065
    (MODIFIER LETTER CIRCUMFLEX ACCENT + LATIN SMALL LETTER E)
 -> I need the (Extended) ASCII character 136 back.



Here's what I've tried (I do see now why this did not work...):

string myInput = Convert.ToString(Convert.ToChar(710));
byte[] asBytes = Encoding.ASCII.GetBytes(myInput);

But this does not result in 94 but a byte with value 63...
Here's a new try but it still does not work:

byte[] bytes = Encoding.ASCII.GetBytes("ê");



Solution


Thanks to both csgero and bzlm for pointing me in the right direction. I solved my problem; see the accepted answer.

Accepted answer by Huppie

Okay, let's elaborate. Both csgero and bzlm pointed in the right direction.

Because of bzlm's reply I looked up the Windows-1252 page on Wikipedia and found that it's called a code page. The Wikipedia article on code pages states the following:

No formal standard existed for these "extended character sets"; IBM merely referred to the variants as code pages, as it had always done for variants of EBCDIC encodings.

This led me to codepage 437:

In ASCII-compatible code pages, the lower 128 characters maintained their standard US-ASCII values, and different pages (or sets of characters) could be made available in the upper 128 characters. DOS computers built for the North American market, for example, used code page 437, which included accented characters needed for French, German, and a few other European languages, as well as some graphical line-drawing characters.

So code page 437 was the code page I had been calling 'extended ASCII'. It has ê as character 136, so I looked up some other characters as well, and they seem right.

csgero came up with the Encoding.GetEncoding() hint, which I used to create the following statement that solves my problem:

byte[] bytes = Encoding.GetEncoding(437).GetBytes("ê");
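
For readers who want to try the statement above, here is a minimal sketch of a complete program around it. The class name Cp437Demo is made up for illustration. One assumption to flag: on .NET Core and .NET 5+, legacy code pages such as 437 are not available until CodePagesEncodingProvider is registered, whereas on the .NET Framework (which the original question predates) GetEncoding(437) works without that call, so the registration line may need to be removed there.

```csharp
using System;
using System.Text;

class Cp437Demo // hypothetical name, for illustration only
{
    static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // must be registered before Encoding.GetEncoding(437) can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding cp437 = Encoding.GetEncoding(437);

        // U+00EA is LATIN SMALL LETTER E WITH CIRCUMFLEX ("ê").
        byte[] bytes = cp437.GetBytes("\u00EA");
        Console.WriteLine(bytes[0]); // 136

        // Decoding with the same code page gives the original character back.
        string roundTrip = cp437.GetString(bytes);
        Console.WriteLine(roundTrip == "\u00EA"); // True
    }
}
```

Writing the character as the escape "\u00EA" rather than a literal ê keeps the result independent of the encoding in which the source file itself is saved.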

Answer by Konrad Rudolph

Hmm?… I'm not sure which character you mean. The caret (“^”, CIRCUMFLEX ACCENT) has the same code in ASCII and Unicode (U+005E).

/EDIT: Damn, my fault. 710 (U+02C6) is actually the MODIFIER LETTER CIRCUMFLEX ACCENT. Unfortunately, this character isn't part of ASCII at all. It might look like the normal caret but it's a different character. Simple conversion won't help here. I'm not sure if .NET supports mapping of similar characters when converting from Unicode. Worth investigating, though.

Answer by Timbo

The value 63 is the question mark, AKA "I am not able to display this character in ASCII".
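
The silent substitution can be made visible with a small experiment. The sketch below is only illustrative (the class name AsciiFallbackDemo is made up): it first reproduces the 63, then builds an ASCII encoding with an exception fallback so that an unmappable character raises an error instead of quietly turning into a question mark.

```csharp
using System;
using System.Text;

class AsciiFallbackDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        string input = "\u02C6"; // MODIFIER LETTER CIRCUMFLEX ACCENT (char 710)

        // Default behaviour: characters outside ASCII are replaced by '?'.
        byte[] replaced = Encoding.ASCII.GetBytes(input);
        Console.WriteLine(replaced[0]); // 63

        // Strict variant: request an exception instead of a silent '?'.
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());

        try
        {
            strictAscii.GetBytes(input);
        }
        catch (EncoderFallbackException ex)
        {
            // CharUnknown holds the character that could not be encoded.
            Console.WriteLine($"Cannot encode U+{(int)ex.CharUnknown:X4} as ASCII");
        }
    }
}
```

Whether the strict variant is preferable depends on the application; it mainly helps when lossy conversions should be caught early rather than stored.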

Answer by csgero

You cannot use the default ASCII encoding (Encoding.ASCII) here, but must create the encoding with the appropriate code page using Encoding.GetEncoding(...). You might try to use code page 1252, which is a superset of ISO 8859-1.

Answer by bzlm

ASCII does not define ê; the number 136 comes from the number for the circumflex in 8-bit encodings such as Windows-1252.

Can you verify that a small e with a circumflex (ê) is actually what is supposed to be stored in the Access database in this case? Perhaps U+02C6 U+0065 is the result of a conversion error, where the input is actually an e followed by a circumflex, or something else entirely. Perhaps your Access database has corrupt data in the sense that the designated encoding does not match the contents, in which case the .NET client might incorrectly parse the data (using the wrong decoder).

If this error is indeed introduced during the reading from the database, perhaps pasting some code or configuration settings might help.

In Code page 437, character number 136 is an e with a circumflex.
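
One way to check which decoder produces which character is to decode the single byte 136 under both code pages mentioned in this thread. This is a rough diagnostic sketch, not a statement about what Access or the .NET data provider actually does; the class name Byte136Demo is made up, and the RegisterProvider call is again an assumption that only applies to .NET Core and .NET 5+.

```csharp
using System;
using System.Text;

class Byte136Demo // hypothetical name, for illustration only
{
    static void Main()
    {
        // Assumption: .NET Core / .NET 5+; not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] raw = { 136 };

        string as437 = Encoding.GetEncoding(437).GetString(raw);
        string as1252 = Encoding.GetEncoding(1252).GetString(raw);

        // Print code points rather than glyphs so the console font does not matter.
        Console.WriteLine($"CP437:        U+{(int)as437[0]:X4}");  // U+00EA (ê)
        Console.WriteLine($"Windows-1252: U+{(int)as1252[0]:X4}"); // U+02C6 (ˆ)
    }
}
```

The same byte yields ê under code page 437 and the modifier circumflex under Windows-1252, which fits the suspicion above that data written with one code page was read back with another.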
