Original question: http://stackoverflow.com/questions/2226554/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
C#: Class for decoding Quoted-Printable encoding?
Asked by Lopper
Is there an existing class in C# that can convert Quoted-Printable encoding to String? Click on the above link to get more information on the encoding.
The following is quoted from the above link for your convenience.
Any 8-bit byte value may be encoded with 3 characters, an "=" followed by two hexadecimal digits (0–9 or A–F) representing the byte's numeric value. For example, a US-ASCII form feed character (decimal value 12) can be represented by "=0C", and a US-ASCII equal sign (decimal value 61) is represented by "=3D". All characters except printable ASCII characters or end of line characters must be encoded in this fashion.
All printable ASCII characters (decimal values between 33 and 126) may be represented by themselves, except "=" (decimal 61).
ASCII tab and space characters, decimal values 9 and 32, may be represented by themselves, except if these characters appear at the end of a line. If one of these characters appears at the end of a line it must be encoded as "=09" (tab) or "=20" (space).
If the data being encoded contains meaningful line breaks, they must be encoded as an ASCII CR LF sequence, not as their original byte values. Conversely if byte values 13 and 10 have meanings other than end of line then they must be encoded as =0D and =0A.
Lines of quoted-printable encoded data must not be longer than 76 characters. To satisfy this requirement without altering the encoded text, soft line breaks may be added as desired. A soft line break consists of an "=" at the end of an encoded line, and does not cause a line break in the decoded text.
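To make the quoted rules concrete, here is a minimal encoder sketch. It is not from the original question; the class name QuotedPrintableSketch and the method Encode are made up for illustration, and it assumes the input has no meaningful line breaks, so CR and LF are escaped like any other non-printable byte.

```csharp
using System;
using System.Text;

public static class QuotedPrintableSketch
{
    // Hypothetical helper: encodes raw bytes following the rules quoted above.
    // Assumption: the input contains no meaningful line breaks, so CR and LF
    // are escaped as =0D and =0A like any other non-printable byte.
    public static string Encode(byte[] data)
    {
        const int MaxLineLength = 76;
        var output = new StringBuilder();
        int lineLength = 0;

        for (int i = 0; i < data.Length; i++)
        {
            byte b = data[i];
            bool isLast = i == data.Length - 1;

            // Printable ASCII except "=" is written as-is. Tab and space are
            // written as-is unless they would be the last character of the data.
            bool literal = (b >= 33 && b <= 126 && b != (byte)'=')
                           || ((b == 9 || b == 32) && !isLast);
            string token = literal ? ((char)b).ToString() : "=" + b.ToString("X2");

            // Keep one column free for the "=" of a soft line break.
            if (lineLength + token.Length > MaxLineLength - 1)
            {
                output.Append("=\r\n");
                lineLength = 0;
            }

            output.Append(token);
            lineLength += token.Length;
        }
        return output.ToString();
    }
}
```

For example, QuotedPrintableSketch.Encode(Encoding.ASCII.GetBytes("a=b")) yields "a=3Db", and a single form feed byte (12) yields "=0C".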
Accepted answer by Dave
There is functionality in the framework libraries to do this, but it doesn't appear to be cleanly exposed. The implementation is in the internal class System.Net.Mime.QuotedPrintableStream. This class defines a method called DecodeBytes which does what you want. The method appears to be used by only one method, which is used to decode MIME headers. This method is also internal, but is called fairly directly in a couple of places, e.g., the Attachment.Name setter. A demonstration:
using System;
using System.Net.Mail;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Attachment attachment = Attachment.CreateAttachmentFromString("", "=?iso-8859-1?Q?=A1Hola,_se=F1or!?=");
            Console.WriteLine(attachment.Name);
        }
    }
}
Produces the output:
¡Hola,_señor!
You may have to do some testing to ensure carriage returns, etc. are treated correctly, although in a quick test I did they seemed to be. However, it may not be wise to rely on this functionality unless your use case is close enough to decoding a MIME header string that you don't think it will be broken by any changes made to the library. You might be better off writing your own quoted-printable decoder.
Answer by Martin Murphy
I wrote this up real quick.
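// Requires: using System; using System.Collections.Generic; using System.Text.RegularExpressions;
public static string DecodeQuotedPrintables(string input)
{
    // Match "=XX" where XX is two uppercase hex digits (A-F, not A-H).
    var occurences = new Regex(@"=[0-9A-F]{2}", RegexOptions.Multiline);
    var matches = occurences.Matches(input);
    // MatchCollection holds Match objects, so collect the matched text explicitly.
    var uniqueMatches = new HashSet<string>();
    foreach (Match match in matches)
    {
        uniqueMatches.Add(match.Value);
    }
    foreach (string match in uniqueMatches)
    {
        // Each byte maps directly to one char, so this only suits single-byte (Latin-1) text.
        char hexChar = (char) Convert.ToInt32(match.Substring(1), 16);
        input = input.Replace(match, hexChar.ToString());
    }
    // Remove soft line breaks.
    return input.Replace("=\r\n", "");
}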
Answer by Igor Semkiv
I extended the solution of Martin Murphy and I hope it will work in every case.
private static string DecodeQuotedPrintables(string input, string charSet)
{
    // If no charset was supplied, try to read it from a "=?charset?Q?" prefix.
    if (string.IsNullOrEmpty(charSet))
    {
        var charSetOccurences = new Regex(@"=\?.*\?Q\?", RegexOptions.IgnoreCase);
        var charSetMatches = charSetOccurences.Matches(input);
        foreach (Match match in charSetMatches)
        {
            charSet = match.Groups[0].Value.Replace("=?", "").Replace("?Q?", "");
            input = input.Replace(match.Groups[0].Value, "").Replace("?=", "");
        }
    }
    Encoding enc = new ASCIIEncoding();
    if (!string.IsNullOrEmpty(charSet))
    {
        try
        {
            enc = Encoding.GetEncoding(charSet);
        }
        catch
        {
            // Unknown charset: fall back to ASCII.
            enc = new ASCIIEncoding();
        }
    }
    // Decode iso-8859-[0-9]: each "=XX" (two hex digits) is decoded as a single byte.
    var occurences = new Regex(@"=[0-9A-F]{2}", RegexOptions.Multiline);
    var matches = occurences.Matches(input);
    foreach (Match match in matches)
    {
        try
        {
            byte[] b = new byte[] { byte.Parse(match.Groups[0].Value.Substring(1), System.Globalization.NumberStyles.AllowHexSpecifier) };
            char[] hexChar = enc.GetChars(b);
            input = input.Replace(match.Groups[0].Value, hexChar[0].ToString());
        }
        catch { }
    }
    // Decode base64String (utf-8?B?)
    occurences = new Regex(@"\?utf-8\?B\?.*\?", RegexOptions.IgnoreCase);
    matches = occurences.Matches(input);
    foreach (Match match in matches)
    {
        byte[] b = Convert.FromBase64String(match.Groups[0].Value.Replace("?utf-8?B?", "").Replace("?UTF-8?B?", "").Replace("?", ""));
        string temp = Encoding.UTF8.GetString(b);
        input = input.Replace(match.Groups[0].Value, temp);
    }
    // Remove soft line breaks.
    input = input.Replace("=\r\n", "");
    return input;
}
Answer by Demented Devil
If you are decoding quoted-printable text that uses UTF-8, be aware that you cannot decode each quoted-printable sequence one at a time, as the other answers do, when several encoded sequences appear together in a run.
For example, if you have the sequence =E2=80=99 and decode it with UTF-8 one byte at a time, you get three "weird" characters. If you instead build an array of the three bytes and convert it with the UTF-8 encoding, you get a single apostrophe.
Obviously, if you are using ASCII encoding then decoding one at a time is no problem; however, decoding runs means your code will work regardless of the text encoding used.
Oh and don't forget =3D is a special case that means you need to decode whatever you have one more time... That is a crazy gotcha!
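A rough sketch of the run-based approach, not taken from any answer here: collect every decoded byte in one left-to-right pass and hand the whole array to the encoding at the end. The names QuotedPrintableRunSketch and Decode are made up for illustration, and the sketch assumes the encoded input itself is plain ASCII. Because each =XX is consumed exactly once, =3D simply becomes a literal "=" and is never looked at again.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedPrintableRunSketch
{
    // Hypothetical helper: decodes quoted-printable text in a single pass.
    // Assumption: the encoded input is ASCII, as quoted-printable data should be.
    public static string Decode(string input, Encoding encoding)
    {
        var bytes = new List<byte>(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (c == '=' && i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n')
            {
                i += 3; // soft line break "=\r\n": emit nothing
            }
            else if (c == '=' && i + 1 < input.Length && input[i + 1] == '\n')
            {
                i += 2; // soft line break with a bare LF: emit nothing
            }
            else if (c == '=' && i + 2 < input.Length
                     && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
            {
                // "=XX": store the raw byte; text decoding happens once at the end.
                bytes.Add(Convert.ToByte(input.Substring(i + 1, 2), 16));
                i += 3;
            }
            else
            {
                bytes.Add((byte)c); // literal character, or a malformed "=" left as-is
                i++;
            }
        }
        // All bytes of a multi-byte character reach the encoding together.
        return encoding.GetString(bytes.ToArray());
    }
}
```

With this sketch, QuotedPrintableRunSketch.Decode("=E2=80=99", Encoding.UTF8) returns the single character U+2019, and "a=3D41" decodes to "a=41" rather than "aA".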
Hope that helps
Answer by Igor Semkiv
Better solution
private static string DecodeQuotedPrintables(string input, string charSet)
{
    Encoding enc;
    try
    {
        enc = Encoding.GetEncoding(charSet);
    }
    catch
    {
        // Unknown or missing charset: fall back to UTF-8.
        enc = new UTF8Encoding();
    }
    // Match whole runs of "=XX" so multi-byte characters are decoded together.
    var occurences = new Regex(@"(=[0-9A-F]{2}){1,}", RegexOptions.Multiline);
    var matches = occurences.Matches(input);
    foreach (Match match in matches)
    {
        try
        {
            byte[] b = new byte[match.Groups[0].Value.Length / 3];
            for (int i = 0; i < match.Groups[0].Value.Length / 3; i++)
            {
                b[i] = byte.Parse(match.Groups[0].Value.Substring(i * 3 + 1, 2), System.Globalization.NumberStyles.AllowHexSpecifier);
            }
            // Keep every decoded character of the run, not just the first one.
            input = input.Replace(match.Groups[0].Value, enc.GetString(b));
        }
        catch
        {
        }
    }
    // Remove soft line breaks and any trailing encoded-word terminator.
    input = input.Replace("=\r\n", "").Replace("=\n", "").Replace("?=", "");
    return input;
}
Answer by Gonzalo Gallotti
This Quoted Printable Decoder works great!
public static byte[] FromHex(byte[] hexData)
{
    if (hexData == null)
    {
        throw new ArgumentNullException("hexData");
    }
    if (hexData.Length < 2 || hexData.Length % 2 != 0)
    {
        throw new Exception("Illegal hex data, hex data must be in two bytes pairs, for example: 0F,FF,A3,... .");
    }
    MemoryStream retVal = new MemoryStream(hexData.Length / 2);
    // Loop hex value pairs
    for (int i = 0; i < hexData.Length; i += 2)
    {
        // Join the high 4 bits (left hex char) and the low 4 bits (right hex char) into one byte
        retVal.WriteByte((byte)((HexValue(hexData[i]) << 4) | HexValue(hexData[i + 1])));
    }
    return retVal.ToArray();
}

// Converts one ASCII hex digit to its numeric value, for example F = 15.
// This helper replaces the original if/else chain, which silently treated
// any non-hex character as 0; it throws instead so the caller can react.
private static int HexValue(byte c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    throw new FormatException("Illegal hex character.");
}

public static byte[] QuotedPrintableDecode(byte[] data)
{
    if (data == null)
    {
        throw new ArgumentNullException("data");
    }
    MemoryStream msRetVal = new MemoryStream();
    MemoryStream msSourceStream = new MemoryStream(data);
    int b = msSourceStream.ReadByte();
    while (b > -1)
    {
        // Encoded 8-bit byte (=XX) or soft line break (=CRLF)
        if (b == '=')
        {
            byte[] buffer = new byte[2];
            int nCount = msSourceStream.Read(buffer, 0, 2);
            if (nCount == 2)
            {
                // Soft line break, line was split, just skip CRLF
                if (buffer[0] == '\r' && buffer[1] == '\n')
                {
                }
                // This must be an encoded 8-bit byte
                else
                {
                    try
                    {
                        msRetVal.Write(FromHex(buffer), 0, 1);
                    }
                    catch
                    {
                        // Illegal value after =, just leave it as is
                        msRetVal.WriteByte((byte)'=');
                        msRetVal.Write(buffer, 0, 2);
                    }
                }
            }
            // Illegal =, just leave it as is (including the = itself)
            else
            {
                msRetVal.WriteByte((byte)'=');
                msRetVal.Write(buffer, 0, nCount);
            }
        }
        // Just write back all other bytes
        else
        {
            msRetVal.WriteByte((byte)b);
        }
        // Read next byte
        b = msSourceStream.ReadByte();
    }
    return msRetVal.ToArray();
}
Answer by Pizzaboy
The only one that worked for me.
http://sourceforge.net/apps/trac/syncmldotnet/wiki/Quoted%20Printable
If you just need to decode the QPs, pull these three functions from the link above into your code:
HexDecoderEvaluator(Match m)
HexDecoder(string line)
Decode(string encodedText)
And then just:
var humanReadable = Decode(myQPString);
Enjoy
Answer by Kachalov Sergey
public static string DecodeQuotedPrintables(string input, Encoding encoding)
{
    // Match runs of =XX so that literal text between them is preserved
    // and multi-byte characters are decoded from all of their bytes.
    var regex = new Regex(@"(\=(?<Symbol>[0-9A-F]{2}))+", RegexOptions.Multiline);
    return regex.Replace(input, match =>
    {
        var symbols = match.Groups["Symbol"].Captures;
        var bytes = new byte[symbols.Count];
        for (var i = 0; i < symbols.Count; i++)
        {
            bytes[i] = Convert.ToByte(symbols[i].Value, 16);
        }
        return encoding.GetString(bytes);
    });
}
Answer by iaceian
private string quotedprintable(string data, string encoding)
{
    // Remove soft line breaks
    data = data.Replace("=\r\n", "");
    for (int position = -1; (position = data.IndexOf("=", position + 1)) != -1;)
    {
        string leftpart = data.Substring(0, position);
        // Collect the whole run of consecutive =XX sequences
        var hex = new System.Collections.Generic.List<string>();
        hex.Add(data.Substring(1 + position, 2));
        while (position + 3 < data.Length && data.Substring(position + 3, 1) == "=")
        {
            position = position + 3;
            hex.Add(data.Substring(1 + position, 2));
        }
        byte[] bytes = new byte[hex.Count];
        for (int i = 0; i < hex.Count; i++)
        {
            bytes[i] = System.Convert.ToByte(hex[i], 16);
        }
        string equivalent = System.Text.Encoding.GetEncoding(encoding).GetString(bytes);
        string rightpart = data.Substring(position + 3);
        data = leftpart + equivalent + rightpart;
        // Resume right after the decoded text; the string is now shorter,
        // so the old index would skip part of the remaining data.
        position = leftpart.Length + equivalent.Length - 1;
    }
    return data;
}
Answer by Lee Harris
I was looking for a dynamic solution and spent 2 days trying different solutions. This solution will support Japanese characters and other standard character sets.
private static string Decode(string input, string bodycharset) {
    var i = 0;
    var output = new List<byte>();
    while (i < input.Length) {
        if (input[i] == '=' && i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n') {
            // Skip soft line break
            i += 3;
        } else if (input[i] == '=' && i + 2 < input.Length) {
            // Decode =XX into a single byte
            string sHex = input.Substring(i + 1, 2);
            int hex = Convert.ToInt32(sHex, 16);
            byte b = Convert.ToByte(hex);
            output.Add(b);
            i += 3;
        } else {
            // Literal character (or an "=" too close to the end to be an escape)
            output.Add((byte)input[i]);
            i++;
        }
    }
    if (String.IsNullOrEmpty(bodycharset))
        return Encoding.UTF8.GetString(output.ToArray());
    else {
        if (String.Compare(bodycharset, "ISO-2022-JP", true) == 0)
            return Encoding.GetEncoding("Shift_JIS").GetString(output.ToArray());
        else
            return Encoding.GetEncoding(bodycharset).GetString(output.ToArray());
    }
}
Then you can call the function with:
Decode("=E3=82=AB=E3=82=B9=E3", "utf-8")
This was originally found here