PHP: How to remove multiple UTF-8 BOM sequences
Original question: http://stackoverflow.com/questions/10290849/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by sheppardzw
I'm using PHP5 (CGI) to output template files from the filesystem and am having issues spitting out raw HTML.
private function fetch($name) {
    $path = $this->j->config['template_path'] . $name . '.html';
    if (!file_exists($path)) {
        dbgerror('Could not find the template "' . $name . '" in ' . $path);
    }
    $f = fopen($path, 'r');
    $t = fread($f, filesize($path));
    fclose($f);
    if (substr($t, 0, 3) == b'\xef\xbb\xbf') {
        $t = substr($t, 3);
    }
    return $t;
}
Even though I've added the BOM fix, I'm still having problems with Firefox accepting it. You can see a live copy here: http://ircb.in/jisti/ (and the template file is at http://ircb.in/jisti/home.html if you want to check it out).
Any idea how to fix this? o_o
Answered by jasonhao
You can use the following code to remove a UTF-8 BOM:
// Remove UTF-8 BOM
function remove_utf8_bom($text)
{
    $bom = pack('H*','EFBBBF');
    $text = preg_replace("/^$bom/", '', $text);
    return $text;
}
Answered by o1max
Try:
// -------- read the file-content ----
$str = file_get_contents($source_file);
// -------- remove the utf-8 BOM ----
$str = str_replace("\xEF\xBB\xBF",'',$str);
// -------- get the Object from JSON ----
$obj = json_decode($str);
:)
Answered by Dean Or
Another way to remove the BOM, which is Unicode code point U+FEFF:
$str = preg_replace('/\x{FEFF}/u', '', $file);
Answered by deceze
b'\xef\xbb\xbf' stands for the literal 12-character string \xef\xbb\xbf, because single quotes do not interpret escape sequences. If you want to check for a BOM, you need to use double quotes, so the \x sequences are actually interpreted as bytes:
"\xef\xbb\xbf"
Your files also seem to contain a lot more garbage than just a single leading BOM:
$ curl http://ircb.in/jisti/ | xxd
0000000: efbb bfef bbbf efbb bfef bbbf efbb bfef ................
0000010: bbbf efbb bf3c 2144 4f43 5459 5045 2068 .....<!DOCTYPE h
0000020: 746d 6c3e 0a3c 6874 6d6c 3e0a 3c68 6561 tml>.<html>.<hea
...
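The difference between the two quoting styles is easy to check. A minimal sketch (the variable names are illustrative and not part of the question's code):

```php
<?php
// Single quotes: \x.. is NOT interpreted, so this is the
// 12 literal characters \xef\xbb\xbf.
$single = '\xef\xbb\xbf';

// Double quotes: each \x.. escape becomes one byte, so this is
// the 3-byte UTF-8 BOM.
$double = "\xef\xbb\xbf";

var_dump(strlen($single));   // int(12)
var_dump(strlen($double));   // int(3)
var_dump(bin2hex($double));  // string(6) "efbbbf"
```

This is why the substr($t, 0, 3) comparison in the question can never be true: a 3-byte slice is being compared against a 12-character string.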
Answered by phvish
If anybody is using CSV import, the code below is useful:
$header = fgetcsv($handle);
foreach ($header as $key => $val) {
    $bom = pack('H*','EFBBBF');
    $val = preg_replace("/^$bom/", '', $val);
    $header[$key] = $val;
}
Answered by Patrick Otto
This global function solves it for a system whose base charset is UTF-8. Thanks!
function prepareCharset($str) {
    // set default encode
    mb_internal_encoding('UTF-8');
    // pre filter
    if (empty($str)) {
        return $str;
    }
    // get charset
    $charset = mb_detect_encoding($str, array('ISO-8859-1', 'UTF-8', 'ASCII'));
    if (stristr($charset, 'utf') || stristr($charset, 'iso')) {
        $str = iconv('ISO-8859-1', 'UTF-8//TRANSLIT', utf8_decode($str));
    } else {
        $str = mb_convert_encoding($str, 'UTF-8', 'UTF-8');
    }
    // remove BOM
    $str = urldecode(str_replace("%C2%81", '', urlencode($str)));
    // prepare string
    return $str;
}
Answered by trank
A solution without the pack function:
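$a = "\xEF\xBB\xBF1"; // "1" preceded by a BOM, written as escapes so it is visible
var_dump($a); // string(4) "1" (the BOM is invisible in the output)
function deleteBom($text)
{
    return preg_replace("/^\xEF\xBB\xBF/", '', $text);
}
var_dump(deleteBom($a)); // string(1) "1"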
Answered by Alfred Huang
An extra method to do the same job:
function remove_utf8_bom_head($text) {
    if (substr(bin2hex($text), 0, 6) === 'efbbbf') {
        $text = substr($text, 3);
    }
    return $text;
}
The other methods I found did not work in my case.
Hope it helps in some special cases.
Answered by Paulo Scardine
If you are reading some API using file_get_contents and get an inexplicable NULL from json_decode, check the value of json_last_error(): sometimes the value returned from file_get_contents will have an extraneous BOM that is almost invisible when you inspect the string, but will make json_last_error() return JSON_ERROR_SYNTAX (4).
>>> $json = file_get_contents("http://api-guiaserv.seade.gov.br/v1/orgao/all");
=> "?\t{"orgao":[{"Nome":"Tribunal de Justi\u00e7a","ID_Orgao":"59","Condicao":"1"}, ...]}"
>>> json_decode($json);
=> null
>>>
In this case, check the first 3 bytes. Echoing them is not very useful because the BOM is invisible in most settings:
>>> substr($json, 0, 3)
=> " ?"
>>> substr($json, 0, 3) == pack('H*','EFBBBF');
=> true
>>>
If the comparison above returns true for you, then a simple test may fix the problem:
>>> json_decode($json[0] == "{" ? $json : substr($json, 3))
=> {#204
     +"orgao": [
       {#203
         +"Nome": "Tribunal de Justiça",
         +"ID_Orgao": "59",
         +"Condicao": "1",
       },
     ],
     ...
   }
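The $json[0] == "{" test above only recognises payloads that start with an object. A slightly more general sketch checks for the BOM bytes themselves; the helper name json_decode_without_bom and the sample payload are made up for illustration:

```php
<?php
// Hypothetical helper: decode JSON that may start with a UTF-8 BOM.
function json_decode_without_bom($json)
{
    $bom = "\xEF\xBB\xBF"; // double quotes, so these are 3 real bytes
    if (strncmp($json, $bom, 3) === 0) {
        $json = substr($json, 3);
    }
    return json_decode($json);
}

// Sample payload with a leading BOM (illustrative data only).
$raw = "\xEF\xBB\xBF" . '{"ID_Orgao":"59"}';

var_dump(json_decode($raw));                        // NULL
var_dump(json_last_error() === JSON_ERROR_SYNTAX);  // bool(true)

$obj = json_decode_without_bom($raw);
var_dump($obj->ID_Orgao);                           // string(2) "59"
```

Because it looks at the first three bytes rather than the first character, the same helper also works when the response is a JSON array.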
Answered by Juergen
When working with faulty software, it can happen that the BOM gets duplicated with every save.
So I am using this to get rid of it:
function remove_utf8_bom($text) {
    $bom = pack('H*','EFBBBF');
    while (preg_match("/^$bom/", $text)) {
        $text = preg_replace("/^$bom/", '', $text);
    }
    return $text;
}
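The loop can probably be folded into the pattern itself by letting the regex match one or more consecutive BOMs. A minimal sketch, assuming the repeated BOMs all sit at the very start of the string as in the xxd dump from the question (the function name strip_leading_boms is hypothetical):

```php
<?php
// Hypothetical helper: strip every leading UTF-8 BOM in a single pass.
function strip_leading_boms($text)
{
    // The pattern is single-quoted, so PCRE itself interprets \xEF etc.
    // as bytes; (?:...)+ matches one or more BOMs anchored at the start.
    return preg_replace('/^(?:\xEF\xBB\xBF)+/', '', $text);
}

// Five stacked BOMs, as in the page served at the question's URL.
$html = str_repeat("\xEF\xBB\xBF", 5) . '<!DOCTYPE html>';
var_dump(strlen($html));               // int(30)
var_dump(strip_leading_boms($html));   // string(15) "<!DOCTYPE html>"
```

A string without any BOM passes through unchanged, and BOM bytes that occur later in the text are left alone because the pattern is anchored with ^.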

