Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/569725/

Date: 2020-09-09 14:04:34  Source: igfitidea

Converting UTF-8 to ASCII in VB.NET

Tags: vb.net, character-encoding

Asked by Leandro López

I am writing a console application that reads emails from different mailboxes and processes them. The emails are received from various automated systems. The messages are logged and/or forwarded.

The problem is that some emails are encoded in UTF-8 and transfer-encoded in quoted-printable, which messes up special characters (mainly ä, å and ö). I have not found any way to convert them into a readable format.

For example, "ä" in quoted-printable is "=C3=A4". Using the normal conversion methods, the result is "Ã¤" (gibberish).

I shamelessly ripped this example conversion table from here: http://forums.sun.com/thread.jspa?threadID=5315363

char   codepoint          UTF-8 encoding                 as Latin-1

ä      11100100 = E4      11000011 10100100 = C3 A4      Ã¤ = \u00C3\u00A4
å      11100101 = E5      11000011 10100101 = C3 A5      Ã¥ = \u00C3\u00A5
ö      11110110 = F6      11000011 10110110 = C3 B6      Ã¶ = \u00C3\u00B6

Ä      11000100 = C4      11000011 10000100 = C3 84      Ã<U+0084> = \u00C3\u0084
Å      11000101 = C5      11000011 10000101 = C3 85      Ã<U+0085> = \u00C3\u0085
Ö      11010110 = D6      11000011 10010110 = C3 96      Ã<U+0096> = \u00C3\u0096
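
The first row of the table can be reproduced in a few lines. This is only a minimal sketch, not code from the original thread: the module name is made up, and it assumes the "iso-8859-1" encoding name is available on your runtime.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module MojibakeDemo
    Sub Main()
        ' U+00E4 written via ChrW so the source file's own encoding does not matter.
        Dim original As String = ChrW(&HE4)

        ' UTF-8 encodes U+00E4 as the two bytes C3 A4.
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(original)
        Console.WriteLine(BitConverter.ToString(utf8Bytes))

        ' Reading those two bytes as Latin-1 gives two characters: U+00C3 and U+00A4.
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")
        For Each c As Char In latin1.GetString(utf8Bytes)
            Console.WriteLine("U+" & AscW(c).ToString("X4"))
        Next

        ' Reading them as UTF-8 gives back the single code point U+00E4.
        Dim decoded As String = Encoding.UTF8.GetString(utf8Bytes)
        Console.WriteLine("U+" & AscW(decoded(0)).ToString("X4"))
    End Sub
End Module
```

In other words, the bytes are fine; the gibberish appears only when they are decoded with the wrong encoding.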

So how do I get the real codepoint from the UTF-8 value? I'd rather not use any external libraries. Besides, I've tried a couple already and they failed.

Accepted answer by Douglas Leeder

From the effects you describe, I guess you get the emails by connecting directly to POP3 mailboxes? If so, you get the emails in their raw form, and most of them will probably be in MIME format.

MIME (Wikipedia has a good overview) is a rather large and complex standard, and implementing a MIME parser that reliably handles all the cases you want covered could very well take you a few weeks.

I'd therefore consider using a third-party MIME library that does the job for you.

Answer by Leandro López

I'm not completely sure, but this might do the trick:

Encoding.ASCII.GetString(Encoding.UTF8.GetBytes(yourString))

I'm not on my computer right now so I can't test it, but I'll try it later.
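
For anyone trying the one-liner above, here is a small check of what it does to a non-ASCII character. It is only a sketch with a made-up module name. ASCII is a 7-bit encoding, so it has no representation for "ä" at all, and the ASCII decoder substitutes a replacement character for every byte above 7F.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module AsciiRoundTripDemo
    Sub Main()
        Dim original As String = ChrW(&HE4)   ' U+00E4, "ä"

        ' GetBytes produces C3 A4. Neither byte is in the 7-bit ASCII range,
        ' so the ASCII decoder cannot map either of them to a real character.
        Dim roundTripped As String =
            Encoding.ASCII.GetString(Encoding.UTF8.GetBytes(original))

        Console.WriteLine(roundTripped.Length)       ' two characters, not one
        Console.WriteLine(roundTripped = original)   ' False
    End Sub
End Module
```

So the call is harmless for plain ASCII text but loses exactly the characters the question is about.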

Answer by Douglas Leeder

You need to convert from UTF-8 to Latin-1, after doing the quoted-printable conversion.

http://msdn.microsoft.com/en-us/library/66sschk1.aspx looks promising.
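
The order of those two steps might look roughly like the sketch below. It is not code from the thread. `DecodeQuotedPrintable` is a hypothetical helper written for this illustration, it assumes the input contains only 7-bit ASCII characters, and it ignores most of the finer points of the quoted-printable rules, so treat it as a starting point rather than a MIME parser.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports Microsoft.VisualBasic

Module QuotedPrintableDemo
    ' Hypothetical helper: turns quoted-printable text into the raw bytes it stands for.
    Function DecodeQuotedPrintable(input As String) As Byte()
        Dim bytes As New List(Of Byte)()
        Dim i As Integer = 0
        While i < input.Length
            Dim c As Char = input(i)
            If c = "="c AndAlso i + 2 < input.Length AndAlso
               Uri.IsHexDigit(input(i + 1)) AndAlso Uri.IsHexDigit(input(i + 2)) Then
                ' "=XX" is one byte written as two hex digits.
                bytes.Add(Convert.ToByte(input.Substring(i + 1, 2), 16))
                i += 3
            ElseIf c = "="c Then
                ' Simplification: any other "=" is treated as a soft line break
                ' and dropped together with the line ending that follows it.
                i += 1
                If i < input.Length AndAlso input(i) = ControlChars.Cr Then i += 1
                If i < input.Length AndAlso input(i) = ControlChars.Lf Then i += 1
            Else
                ' Assumes plain ASCII; Convert.ToByte throws for characters above 255.
                bytes.Add(Convert.ToByte(c))
                i += 1
            End If
        End While
        Return bytes.ToArray()
    End Function

    Sub Main()
        ' Step 1: undo the transfer encoding to get the raw UTF-8 bytes.
        Dim raw As Byte() = DecodeQuotedPrintable("=C3=A4=C3=A5=C3=B6")

        ' Step 2: decode the bytes as UTF-8 to get a normal .NET string.
        Dim text As String = Encoding.UTF8.GetString(raw)
        Console.WriteLine(text = ChrW(&HE4) & ChrW(&HE5) & ChrW(&HF6))   ' True

        ' Optional: if Latin-1 bytes are needed, convert the UTF-8 bytes directly.
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")
        Dim latin1Bytes As Byte() = Encoding.Convert(Encoding.UTF8, latin1, raw)
        Console.WriteLine(BitConverter.ToString(latin1Bytes))   ' E4-E5-F6
    End Sub
End Module
```

Since .NET strings are UTF-16 internally, the result of `Encoding.UTF8.GetString` is usually all you need for logging; the Latin-1 bytes matter only if the text has to be written out or forwarded in that encoding.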
