
Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/4389676/

Time: 2020-08-25 12:55:00  Source: igfitidea

Email from PHP has broken Subject header encoding

Tags: php, encoding, mime, email-headers

Asked by daza166

My PHP script sends email to users, and when the email arrives in their mailboxes, the subject line ($subject) has characters like a^£ added to the end of my subject text. This is obviously an encoding problem. The email message content itself is fine; only the subject line is broken.

I have searched all over but can't find how to encode my subject properly.


These are my headers. Notice that I'm using Content-Type with charset=utf-8 and Content-Transfer-Encoding: 8bit.

//set all necessary headers
$headers = "From: $sender_name<$from>\n";
$headers .= "Reply-To: $sender_name<$from>\n";
$headers .= "X-Sender: $sender_name<$from>\n";
$headers .= "X-Mailer: PHP4\n"; //mailer
$headers .= "X-Priority: 3\n"; //1 UrgentMessage, 3 Normal
$headers .= "MIME-Version: 1.0\n";
$headers .= "X-MSMail-Priority: High\n";
$headers .= "Importance: 3\n";
$headers .= "Date: $date\n";
$headers .= "Delivered-to: $to\n";
$headers .= "Return-Path: $sender_name<$from>\n";
$headers .= "Envelope-from: $sender_name<$from>\n";
$headers .= "Content-Transfer-Encoding: 8bit\n";
$headers .= "Content-Type: text/plain; charset=UTF-8\n";

Answered by Gumbo

Update: For a more practical and up-to-date answer, have a look at Palec's answer.



The character encoding specified in Content-Type describes only the character encoding of the message body, not that of the headers. You need to use the encoded-word syntax with either the quoted-printable encoding or the Base64 encoding:

encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
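As a rough illustration of the grammar above, the following sketch builds a single Base64 encoded-word and splits it back into its three parts. The helper names make_encoded_word() and parse_encoded_word() are made up for this example, and the sketch assumes the text is short enough to fit into one encoded-word (it does no folding).

```php
<?php
// Hypothetical helper: wrap UTF-8 text in a single Base64 ("B") encoded-word.
function make_encoded_word(string $text, string $charset = 'UTF-8'): string
{
    return '=?' . $charset . '?B?' . base64_encode($text) . '?=';
}

// Hypothetical helper: split an encoded-word into charset, encoding and
// encoded-text, or return null if the string does not match the grammar.
function parse_encoded_word(string $word): ?array
{
    if (!preg_match('/^=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=$/', $word, $m)) {
        return null;
    }
    return ['charset' => $m[1], 'encoding' => strtoupper($m[2]), 'text' => $m[3]];
}

$word  = make_encoded_word('Ahoj, světe');
$parts = parse_encoded_word($word);

// Decoding the payload should give back the original UTF-8 text.
var_dump(base64_decode($parts['text']) === 'Ahoj, světe');
```

This only shows the shape of an encoded-word. For real subjects of arbitrary length, a library function that also handles splitting and folding is the safer choice.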

You can use imap_8bit for the quoted-printable encoding and base64_encode for the Base64 encoding:

"Subject: =?UTF-8?B?".base64_encode($subject)."?="
"Subject: =?UTF-8?Q?".imap_8bit($subject)."?="

Answered by Palec

TL;DR


$preferences = ['input-charset' => 'UTF-8', 'output-charset' => 'UTF-8'];
$encoded_subject = iconv_mime_encode('Subject', $subject, $preferences);
$encoded_subject = substr($encoded_subject, strlen('Subject: '));
mail($to, $encoded_subject, $message, $headers);

or


mb_internal_encoding('UTF-8');
$encoded_subject = mb_encode_mimeheader($subject, 'UTF-8', 'B', "\r\n", strlen('Subject: '));
mail($to, $encoded_subject, $message, $headers);

Problem and solution


The Content-Type and Content-Transfer-Encoding headers apply only to the body of your message. For headers, RFC 2047 specifies a separate mechanism for declaring their encoding.

You should encode your Subject via iconv_mime_encode(), which exists as of PHP 5:

$preferences = ["input-charset" => "UTF-8", "output-charset" => "UTF-8"];
$encoded_subject = iconv_mime_encode("Subject", $subject, $preferences);

Change input-charset to match the encoding of your string $subject. You should leave output-charset as UTF-8. Before PHP 5.4, use array() instead of [].

Now $encoded_subject is (without trailing newline)

Subject: =?UTF-8?B?VmVyeSBsb25nIHRleHQgY29udGFpbmluZyBzcGVjaWFsIGM=?=
 =?UTF-8?B?aGFyYWN0ZXJzIGxpa2UgxJvFocSNxZnFvsO9w6HDrcOpPD4/PSsqIHA=?=
 =?UTF-8?B?cm9kdWNlcyBzZXZlcmFsIGVuY29kZWQtd29yZHMsIHNwYW5uaW5nIG0=?=
 =?UTF-8?B?dWx0aXBsZSBsaW5lcw==?=

for $subject containing:

Very long text containing special characters like ěščřžýáíé<>?=+* produces several encoded-words, spanning multiple lines

How does it work?


The iconv_mime_encode() function splits the text, encodes each piece separately into an <encoded-word> token and folds the whitespace between them. An encoded word has the form =?<charset>?<encoding>?<encoded-text>?= where <charset> is the character encoding of the original text, <encoding> is either B (Base64) or Q (a variant of quoted-printable), and <encoded-text> is the text after that encoding has been applied.

You can decode =?CP1250?B?QWhvaiwgc3bsdGU=?= into the UTF-8 string Ahoj, světe (Hello, world in Czech) via iconv("CP1250", "UTF-8", base64_decode("QWhvaiwgc3bsdGU=")) or directly via iconv_mime_decode("=?CP1250?B?QWhvaiwgc3bsdGU=?=", 0, "UTF-8").
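The two decoding routes just mentioned can be put side by side as a small runnable sketch. It assumes the iconv extension is available and that the underlying iconv library knows the CP1250 charset name; the variable names are only illustrative.

```php
<?php
$encoded = '=?CP1250?B?QWhvaiwgc3bsdGU=?=';

// Manual route: Base64-decode the payload, then convert CP1250 to UTF-8.
$manual = iconv('CP1250', 'UTF-8', base64_decode('QWhvaiwgc3bsdGU='));

// Library route: let iconv parse the encoded-word and convert in one call.
$direct = iconv_mime_decode($encoded, 0, 'UTF-8');

// Both routes should produce the same UTF-8 string.
var_dump($manual === $direct);
echo $direct, "\n";
```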

Encoding into encoded words is more complicated, because the spec requires each encoded-word token to be at most 75 bytes long, and each line containing any encoded-word token must be at most 76 bytes long (including the blank at the start of a continuation line). Don't implement the encoding yourself. All you really need to know is that iconv_mime_encode() respects the spec.
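If you want to sanity-check the output of an encoder against these two limits, something along the following lines might do. The function name header_within_limits() is invented for this sketch, and it is a simplification: it applies the 76-byte limit to every line, not only to lines that contain an encoded-word, and it does not validate the encoded-words themselves.

```php
<?php
// Hypothetical helper: true if every line is at most 76 bytes long and
// every encoded-word token is at most 75 bytes long.
function header_within_limits(string $header): bool
{
    foreach (preg_split('/\r?\n/', $header) as $line) {
        if (strlen($line) > 76) {
            return false;
        }
        // Encoded-words contain no whitespace, so split the line on blanks.
        foreach (preg_split('/[ \t]+/', $line, -1, PREG_SPLIT_NO_EMPTY) as $token) {
            $isEncodedWord = substr($token, 0, 2) === '=?' && substr($token, -2) === '?=';
            if ($isEncodedWord && strlen($token) > 75) {
                return false;
            }
        }
    }
    return true;
}

// Example: check what iconv_mime_encode() produces for a long subject.
$preferences = ['input-charset' => 'UTF-8', 'output-charset' => 'UTF-8'];
$header = iconv_mime_encode('Subject', str_repeat('Žluťoučký kůň ', 10), $preferences);
var_dump(header_within_limits($header));
```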

Interesting related reading is the Wikipedia article Unicode and email.


Alternatives


A rudimentary option is to use only a restricted set of characters. ASCII is guaranteed to work. ISO Latin 1 (ISO-8859-1), as user2250504 suggested, will probably work too, because it is often used as fallback when no encoding is specified. But those character sets are very small and you'll probably be unable to encode all the characters you'll want. Moreover, the RFCs say nothing about whether Latin 1 should work or not.


You can also use mb_encode_mimeheader(), as Paul Norman answered, but it's easy to use it incorrectly.


  1. You must use mb_internal_encoding() to set the mbstring functions' internally used encoding. The mb_* functions expect input strings to be in this encoding. Beware: The second parameter of mb_encode_mimeheader() has nothing to do with the input string (despite what the manual says). It corresponds to the <charset> in the encoded word (see How does it work? above). The input string is recoded from the internal encoding to this one before being passed to the B or Q encoding.

    Setting the internal encoding might not be needed since PHP 5.6, because the underlying mbstring.internal_encoding configuration option was deprecated in favor of the default_charset option, which has defaulted to UTF-8 since then. Note that this is just a default, and it may be inappropriate to rely on defaults in your code.

  2. You must include the header name and colon in the input string. The RFC imposes a strong limit on line length and it must hold for the first line, too! An alternative is to fiddle with the fifth parameter ($indent; last one as of September 2015), but this is even less convenient.

  3. The implementation might have bugs. Even if used correctly, you might get broken output. At least this is what many comments on the manual page say. I have not managed to find any problem, but I know implementation of encoded words is tricky. If you find potential or actual bugs in mb_encode_mimeheader() or iconv_mime_encode(), please let me know in the comments.

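The first two points can be tried out with a short sketch like the one below. It assumes the mbstring extension is loaded; the ISO-8859-2 target charset is chosen only to make the role of the second parameter visible, and the exact output may differ between PHP versions.

```php
<?php
// Point 1: state the encoding of the input strings explicitly.
mb_internal_encoding('UTF-8');

$subject = 'Ahoj, světe';

// The second argument names the charset used inside the encoded-word,
// so the UTF-8 input is converted to ISO-8859-2 before Base64 encoding.
$latin2 = mb_encode_mimeheader($subject, 'ISO-8859-2', 'B');

// Point 2: include the header name so the first line is folded correctly.
$header = mb_encode_mimeheader("Subject: $subject", 'UTF-8', 'B');

echo $latin2, "\n", $header, "\n";
```

Comparing the two printed lines should show the different <charset> labels, while the ASCII-only header name stays readable.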

There is also at least one upside to using mb_encode_mimeheader(): it does not always encode all the header contents, which saves space and leaves the text human-readable. The encoding is required only for the non-ASCII parts. The output analogous to the iconv_mime_encode() example above is:

Subject: Very long text containing special characters like
 =?UTF-8?B?xJvFocSNxZnFvsO9w6HDrcOpPD4/PSsqIHByb2R1Y2VzIHNldmVyYWwgZW5j?=
 =?UTF-8?B?b2RlZC13b3Jkcywgc3Bhbm5pbmcgbXVsdGlwbGUgbGluZXM=?=

Usage example of mb_encode_mimeheader():


mb_internal_encoding('UTF-8');
$encoded_subject = mb_encode_mimeheader("Subject: $subject", 'UTF-8');
$encoded_subject = substr($encoded_subject, strlen('Subject: '));
mail($to, $encoded_subject, $message, $headers);

This is an alternative to the snippet in the TL;DR at the top of this post. Instead of just reserving the space for Subject:, it actually puts it there and then removes it in order to be able to use it with mail()'s stupid interface.

If you like the mbstring functions better than the iconv ones, you might want to use mb_send_mail(). It uses mail() internally, but encodes the subject and body of the message automatically. Again, use with care.

Headers other than Subject need different treatment


Note that you must not assume that encoding the whole contents of a header is OK for all headers that may contain non-ASCII characters. E.g. From, To, Cc, Bcc and Reply-To may contain names for the addresses they contain, but only the names may be encoded, not the addresses. The reason is that an <encoded-word> token may replace only <text>, <ctext> and <word> tokens, and only under certain circumstances (see §5 of RFC 2047).
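One possible way to follow that rule for an address header is sketched below: only the display name goes through iconv_mime_encode(), and the address is appended untouched. The function name format_mailbox() and the example address are made up, the input is assumed to be UTF-8, and the sketch ignores the fact that the appended address is not counted when the name is folded, so very long names may still produce overlong lines.

```php
<?php
// Hypothetical helper: encode only the display name, never the address.
function format_mailbox(string $name, string $address): string
{
    $preferences = [
        'input-charset'  => 'UTF-8',
        'output-charset' => 'UTF-8',
        'scheme'         => 'B',
    ];
    // iconv_mime_encode() returns "From: <encoded name>".
    $encoded = iconv_mime_encode('From', $name, $preferences);
    if ($encoded === false) {
        throw new RuntimeException('Could not encode display name');
    }
    // Strip the field name again and append the plain address.
    return substr($encoded, strlen('From: ')) . ' <' . $address . '>';
}

$mailbox = format_mailbox('Jan Novák', 'jan@example.com');
$headers = 'From: ' . $mailbox . "\r\n";
echo $headers;
```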

Encoding of non-ASCII text in other headers is a related but different question. If you wish to know more about this topic, search. If you find no answer, ask another question and point me to it in the comments.


Answered by Paul Norman

mb_encode_mimeheader() for UTF-8 strings can be useful here, e.g.

$subject = mb_encode_mimeheader($subjectText,"UTF-8");

Answered by user2250504

Save the php file with the appropriate charset.


In my case, in Sublime Text, I used the following option:


File > Save with Encoding > Western (ISO-8859-1) [for Brazilian Portuguese]


Doing this, you don't need to use any command.
