Converting UTF-8 to ASCII in VB.NET

Question

Asked by Leandro López

I am writing a console application that reads emails from different mailboxes and processes them. The emails are received from various automated systems. The messages are logged and/or forwarded.

The problem is that some emails are encoded in UTF-8 and transfer-encoded as quoted-printable, which messes up special characters (mainly ä, å and ö). I have not found any way to convert them into a readable format.

For example, "ä" in quoted-printable is "=C3=A4". Using the normal conversion methods, the result is "Ã¤" (gibberish).

I shamelessly ripped this example conversion table from here: http://forums.sun.com/thread.jspa?threadID=5315363

char   codepoint          UTF-8 encoding                 as Latin-1

ä      11100100 = E4      11000011 10100100 = C3 A4      Ã¤ = \u00C3\u00A4
å      11100101 = E5      11000011 10100101 = C3 A5      Ã¥ = \u00C3\u00A5
ö      11110110 = F6      11000011 10110110 = C3 B6      Ã¶ = \u00C3\u00B6

Ä      11000100 = C4      11000011 10000100 = C3 84      Ã. = \u00C3\u0084
Å      11000101 = C5      11000011 10000101 = C3 85      Ã. = \u00C3\u0085
Ö      11010110 = D6      11000011 10010110 = C3 96      Ã. = \u00C3\u0096

(In the last three rows the second Latin-1 character is a non-printing C1 control character, shown here as ".")
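
The effect in the table can be reproduced in a few lines. This is only an illustrative sketch and not part of the original post: the module name `MojibakeDemo` and the helper `DecodeBoth` are made up for the example. It decodes the same two bytes once as ISO-8859-1 (Latin-1) and once as UTF-8.

```vbnet
Imports System
Imports System.Text

Module MojibakeDemo
    ' Hypothetical helper: decodes the same bytes under two encodings.
    ' Returns {Latin-1 reading, UTF-8 reading}.
    Function DecodeBoth(ByVal raw As Byte()) As String()
        Dim latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")
        Return New String() {latin1.GetString(raw), Encoding.UTF8.GetString(raw)}
    End Function

    Sub Main()
        ' C3 A4 is the UTF-8 encoding of U+00E4 (first row of the table).
        Dim readings As String() = DecodeBoth(New Byte() {&HC3, &HA4})

        ' Latin-1 maps every byte to one character, so this is 2 (U+00C3, U+00A4).
        Console.WriteLine(readings(0).Length)

        ' UTF-8 combines both bytes into one character, so this is 1.
        Console.WriteLine(readings(1).Length)

        ' Code point of the UTF-8 reading, in hex.
        Console.WriteLine(Convert.ToInt32(readings(1).Chars(0)).ToString("X4"))
    End Sub
End Module
```

The bytes are the same in both cases; only the encoding used to interpret them differs, which is why the "gibberish" always comes in pairs starting with Ã.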

So how do I get the real codepoint from the UTF-8 value? I'd rather not use any external libraries. Besides, I've already tried a couple and they failed.

Answer 1

Accepted answer by Douglas Leeder

From the effects you describe, I guess you get the emails by connecting directly to POP3 mailboxes? If so, then you get the emails in their raw form, and most of those mails will most probably be in the MIME format.

MIME (Wikipedia has a good overview) is a rather large and complex standard, and implementing a MIME parser that reliably handles all the cases you want covered could very well take you a few weeks.

I'd therefore consider using a third-party MIME library that does the job for you.

Answer 2

Answered by Leandro López

I'm not completely sure, but this might do the trick:

Encoding.ASCII.GetString(Encoding.UTF8.GetBytes(yourString))

I'm not on my computer right now so I can't test it, but I'll try it later.
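
For anyone who wants to check it, here is a minimal sketch of what that expression does with a non-ASCII character. The module and function names are hypothetical. To my understanding, .NET's ASCII decoder substitutes "?" for every byte above 0x7F, so the call would lose the accented letters rather than convert them.

```vbnet
Imports System
Imports System.Text

Module AsciiRoundTripDemo
    ' Hypothetical wrapper around the expression from the answer.
    Function Utf8BytesAsAscii(ByVal value As String) As String
        Return Encoding.ASCII.GetString(Encoding.UTF8.GetBytes(value))
    End Function

    Sub Main()
        ' Pure ASCII text has identical bytes in UTF-8, so it passes through unchanged.
        Console.WriteLine(Utf8BytesAsAscii("abc"))

        ' U+00E4 becomes the two UTF-8 bytes C3 A4; neither is valid ASCII,
        ' so each byte is expected to come back as "?".
        Console.WriteLine(Utf8BytesAsAscii(Convert.ToChar(&HE4).ToString()))
    End Sub
End Module
```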

Answer 3

Answered by Douglas Leeder

You need to convert from UTF-8 to Latin1 - after doing the quoted-printable conversion.

http://msdn.microsoft.com/en-us/library/66sschk1.aspx looks promising.
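
The order of operations described here (undo quoted-printable first, then interpret the resulting bytes as UTF-8) might be sketched as follows. This is an illustration under stated assumptions, not a MIME parser: `DecodeQuotedPrintable` is a hypothetical helper, the charset is assumed to be known already, and headers, encoded-words and multipart bodies are not handled.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports Microsoft.VisualBasic

Module QuotedPrintableDemo
    ' Hypothetical helper: turns a quoted-printable body into text.
    ' Assumes the body contains only 8-bit characters (normally 7-bit ASCII)
    ' and that the caller already knows the charset.
    Function DecodeQuotedPrintable(ByVal body As String, ByVal charset As Encoding) As String
        Dim bytes As New List(Of Byte)()
        Dim i As Integer = 0
        While i < body.Length
            Dim remaining As Integer = body.Length - i - 1 ' characters after position i
            If body(i) <> "="c Then
                ' Literal character: copy its byte value through unchanged.
                bytes.Add(Convert.ToByte(body(i)))
                i += 1
            ElseIf remaining >= 2 AndAlso body(i + 1) = ControlChars.Cr AndAlso body(i + 2) = ControlChars.Lf Then
                i += 3 ' soft line break "=" CR LF: emits nothing
            ElseIf remaining >= 1 AndAlso body(i + 1) = ControlChars.Lf Then
                i += 2 ' soft line break with a bare LF
            ElseIf remaining >= 2 AndAlso Uri.IsHexDigit(body(i + 1)) AndAlso Uri.IsHexDigit(body(i + 2)) Then
                ' "=XX" escape: two hex digits give one raw byte.
                bytes.Add(Convert.ToByte(body.Substring(i + 1, 2), 16))
                i += 3
            Else
                ' Malformed escape: keep the "=" as a literal character.
                bytes.Add(Convert.ToByte(body(i)))
                i += 1
            End If
        End While
        ' Only now are the raw bytes interpreted in the declared charset.
        Return charset.GetString(bytes.ToArray())
    End Function

    Sub Main()
        ' "=C3=A4" is the example from the question.
        Dim decoded As String = DecodeQuotedPrintable("=C3=A4", Encoding.UTF8)
        Console.WriteLine(decoded.Length)
        Console.WriteLine(Convert.ToInt32(decoded.Chars(0)).ToString("X4"))
    End Sub
End Module
```

.NET strings are UTF-16 internally, so once the bytes have been decoded as UTF-8 the string already holds the real code point. A further step to Latin-1 is only needed if the output has to be written as Latin-1 bytes, for example with `Encoding.GetEncoding("ISO-8859-1").GetBytes(decoded)`.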
