UTF-8 BOM signature in PHP files

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2558172/

Date: 2020-08-25 06:56:12  Source: igfitidea

Tags: php, utf-8, character-encoding, byte-order-mark

Asked by treznik

I was writing some commented PHP classes and I stumbled upon a problem. My name (for the @author tag) ends with a non-ASCII character (which is a UTF-8 character, ...and a strange name, I know).

Even though I save the file as UTF-8, some friends reported that they see that character totally messed up (è?). This problem goes away by adding the BOM signature. But that thing troubles me a bit, since I don't know that much about it, except from what I saw on Wikipedia and on some other similar questions here on SO.

I know that it adds some things at the beginning of the file, and from what I understood it's not that bad, but I'm concerned because the only problematic scenarios I read about involved PHP files. And since I'm writing PHP classes to share them, being 100% compatible is more important than having my name in the comments.

I'm trying to understand the implications: should I use it without worrying, or are there cases when it might cause damage? When?

Answered by skrebbel

Indeed, the BOM is actual data sent to the browser. The browser will happily ignore it, but because output has then already started, you can no longer send headers.

I believe the problem really is your and your friend's editor settings. Without a BOM, your friend's editor may not automatically recognize the file as UTF-8. He can try to set up his editor so that it expects a file to be in UTF-8 (if you use a real IDE such as NetBeans, this can even be made a project setting that you can transfer along with the code).

An alternative is to try some tricks: some editors try to determine the encoding using some heuristics based on the entered text. You could try to start each file with

<?php //úτ?-8 encoded

and maybe the heuristic will get it. There's probably better stuff to put there, and you can either google for what kind of encoding detection heuristics are common, or just try some out :-)
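
To give an idea of what such a heuristic might look like, here is a minimal sketch. It is not what any particular editor does; `looks_like_utf8` is a made-up name, and the rule (valid UTF-8 plus at least one non-ASCII byte) is just one plausible guess at how detection could work.

```php
<?php
// Hypothetical helper: guesses whether raw bytes are UTF-8 text when there is no BOM to rely on.
function looks_like_utf8(string $bytes): bool
{
    // Pure ASCII is valid in almost every encoding, so it proves nothing either way.
    $hasHighBytes = preg_match('/[\x80-\xFF]/', $bytes) === 1;
    // With the /u modifier, preg_match() fails on input that is not valid UTF-8.
    $isValidUtf8 = preg_match('//u', $bytes) === 1;
    return $hasHighBytes && $isValidUtf8;
}
```

A comment with a few multi-byte characters near the top of the file gives a detector like this something to work with, which is the idea behind the trick above.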

All in all, I recommend just fixing the editor settings.

Oh wait, I misread the last part: for spreading the code anywhere, I guess you're safest either making all files contain only 7-bit characters, i.e. plain ASCII, or just accepting that some people with ancient editors will see your name written funny. There is no fail-safe way. The BOM is definitely bad because of the "headers already sent" problem. On the other hand, as long as you only put UTF-8 characters in comments and the like, the only impact of an editor misunderstanding the encoding is weird characters. I'd go for spelling your name correctly and adding a comment targeted at heuristics so that most editors will get it, but there will always be people who'll see bogus chars instead.

Answered by Your Common Sense

A BOM would cause a "Headers already sent" error, so you can't use a BOM in PHP files.
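
If you suspect this is what is happening, PHP can tell you where output started. The sketch below uses the built-in `headers_sent()` with its two by-reference arguments; `describe_header_state` is a hypothetical wrapper name. With a BOM in front of the opening tag, I would expect it to point at line 1 of the offending file, but treat that as an assumption to check on your setup.

```php
<?php
// Hypothetical helper: says whether headers can still be sent and, if not, where output began.
function describe_header_state(): string
{
    // headers_sent() fills $file and $line with the place where output started.
    if (headers_sent($file, $line)) {
        return "Output already started in $file on line $line";
    }
    return "Headers can still be sent";
}
```

Call it right before the `header()` call that fails and log the result.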

Answered by omabena

This is an old post and has already been answered, but I can leave you some other resources that I found when I faced this BOM issue.

http://people.w3.org/rishida/utils/bomtester/index.php : with this page you can check whether a specific file contains a BOM.

There is also a handy script that lists all files with a BOM in your current directory.

<?php
// Returns true if the file starts with the UTF-8 BOM (bytes EF BB BF).
function fopen_utf8($filename) {
    $file = @fopen($filename, "rb");
    if ($file === false) {
        return false; // unreadable file: report it as having no BOM
    }
    $bom = fread($file, 3);
    fclose($file);
    return $bom === "\xEF\xBB\xBF";
}

// Walks $path (recursively by default) and prints every file that starts with a BOM.
function file_array($path, $exclude = ".|..|design", $recursive = true) {
    $path = rtrim($path, "/") . "/";
    $folder_handle = opendir($path);
    if ($folder_handle === false) {
        return array(); // directory could not be opened
    }
    $exclude_array = explode("|", $exclude);
    $result = array();
    while (false !== ($filename = readdir($folder_handle))) {
        if (!in_array(strtolower($filename), $exclude_array)) {
            if (is_dir($path . $filename . "/")) {
                // Need to include full "path" or it's an infinite loop
                if ($recursive) $result[] = file_array($path . $filename . "/", $exclude, true);
            } else {
                if (fopen_utf8($path . $filename)) {
                    //$result[] = $filename;
                    echo ($path . $filename . "<br>");
                }
            }
        }
    }
    closedir($folder_handle);
    return $result;
}

$files = file_array(".");
?>

I found that code at php.net
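
Once you know which files are affected, removing the BOM is a matter of dropping the first three bytes. This is only a sketch and not part of the php.net script: `remove_bom_from_file` is a name I made up, and it rewrites the file in place, so try it on a copy first.

```php
<?php
// Hypothetical helper: rewrites $filename without its leading UTF-8 BOM.
// Returns true if a BOM was found and removed, false otherwise.
function remove_bom_from_file(string $filename): bool
{
    $contents = @file_get_contents($filename);
    if ($contents === false || strncmp($contents, "\xEF\xBB\xBF", 3) !== 0) {
        return false; // unreadable, or there is no BOM to remove
    }
    // Keep everything after the three BOM bytes.
    return file_put_contents($filename, (string) substr($contents, 3)) !== false;
}
```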

Dreamweaver also helps with this: it gives you the option to save the file without the BOM.

It's a late answer, but I still hope it helps. Bye

Answered by solarc

Just so you know, there's an option in PHP, zend.multibyte, which allows PHP to read files with a BOM without giving the "Headers already sent" error.

From the php.ini file:

; If enabled, scripts may be written in encodings that are incompatible with
; the scanner.  CP936, Big5, CP949 and Shift_JIS are the examples of such
; encodings.  To use this feature, mbstring extension must be enabled.
; Default: Off
;zend.multibyte = Off
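
To see what a given server is actually running with, you can read both settings back at runtime. A small sketch, where `multibyte_status` is a hypothetical helper name; it only reports the values and does not try to change them.

```php
<?php
// Hypothetical helper: reports the two settings the php.ini comment above mentions.
function multibyte_status(): array
{
    return [
        // ini_get() returns the value as a string, or false if the option is unknown.
        'zend.multibyte' => filter_var(ini_get('zend.multibyte'), FILTER_VALIDATE_BOOLEAN),
        // The php.ini comment says the mbstring extension must be enabled as well.
        'mbstring' => extension_loaded('mbstring'),
    ];
}
```

`var_dump(multibyte_status());` prints both flags as booleans.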

Answered by matthewv789

In PHP, in addition to the "headers already sent" error, the presence of a BOM can also screw up the HTML in the browser in more subtle ways.

See this link for an outline of the problem.

When this occurs, not only is there usually a noticeable space at the top of the rendered page, but if you inspect the HTML in Firefox or Chrome, you may notice that the head section is empty and its elements appear to be in the body. Of course viewing source will show everything where it should be, but somehow the browser is interpreting it wrong.

Answered by peufeu

Or you could activate output buffering in php.ini which will solve the "headers already sent" problem. It is also very important to use output buffering for performance if your site has significant load.
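
The same effect can be had per script with `ob_start()`. The sketch below shows the idea under that assumption; `capture_output` is a hypothetical helper, and the echoed BOM stands in for a stray one coming from an included file. Note that buffering only delays the BOM: it is still part of the output unless you strip it.

```php
<?php
// Hypothetical helper: runs $render with output buffering on and returns what it printed.
function capture_output(callable $render): string
{
    ob_start();                     // from here on, echo writes to a buffer, not to the client
    $render();
    return (string) ob_get_clean(); // hand back the buffered bytes and stop buffering
}

$body = capture_output(function () {
    echo "\xEF\xBB\xBF";            // stands in for a stray BOM from an included file
    echo "<p>Hello</p>";
});
// Nothing has reached the client yet, so header() could still be called here.
```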

Answered by Szabolcs Páll

The BOM is actually the most efficient way of identifying a UTF-8 file, and both modern browsers and standards support and encourage its use in HTTP response bodies.

With PHP files, it's not the file itself but the generated output that gets sent as the response, so it's obviously not a good idea to save all PHP files with a BOM at the beginning. That doesn't mean you shouldn't use the BOM in your response, though.

You can in fact safely inject the following code right before your doctype declaration (in case you are generating HTML as response):

<?="\xEF\xBB\xBF"?>
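
If the response is assembled as a string before it is sent, the same idea can be wrapped in a helper so the BOM is never emitted twice. This is just a sketch; `with_bom` and `UTF8_BOM` are names I made up for illustration.

```php
<?php
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: prepends the UTF-8 BOM to generated output unless it is already there.
function with_bom(string $output): string
{
    if (strncmp($output, UTF8_BOM, 3) === 0) {
        return $output; // already starts with a BOM, leave it alone
    }
    return UTF8_BOM . $output;
}
```

Used as `echo with_bom($html);`, after any headers have been sent.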

Further reading: https://www.w3.org/International/questions/qa-byte-order-mark#transcoding
