PHP: How to remove multiple UTF-8 BOM sequences

Question

Asked by sheppardzw

I'm using PHP5 (CGI) to output template files from the filesystem and am having issues spitting out raw HTML:

private function fetch($name) {
    $path = $this->j->config['template_path'] . $name . '.html';
    if (!file_exists($path)) {
        dbgerror('Could not find the template "' . $name . '" in ' . $path);
    }
    $f = fopen($path, 'r');
    $t = fread($f, filesize($path));
    fclose($f);
    if (substr($t, 0, 3) == b'\xef\xbb\xbf') {
        $t = substr($t, 3);
    }
    return $t;
}

Even though I've added the BOM fix, I'm still having problems with Firefox accepting the output. You can see a live copy here: http://ircb.in/jisti/ (and the template file I threw at http://ircb.in/jisti/home.html, if you want to check it out).

Any idea how to fix this? o_o

Answer 1

Answered by jasonhao

You can use the following code to remove a UTF-8 BOM:

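// Remove a single leading UTF-8 BOM

function remove_utf8_bom($text)
{
    // pack('H*', 'EFBBBF') converts the hex digits into the 3 raw BOM bytes
    $bom = pack('H*','EFBBBF');
    // Anchored with ^, so only one BOM at the very start is removed
    $text = preg_replace("/^$bom/", '', $text);
    return $text;
}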

Answer 2

Answered by o1max

Try:

// -------- read the file-content ----
$str = file_get_contents($source_file); 

// -------- remove every UTF-8 BOM (anywhere in the string) ----
$str = str_replace("\xEF\xBB\xBF",'',$str);

// -------- get the Object from JSON ---- 
$obj = json_decode($str);

:)

Answer 3

Answered by Dean Or

Another way to remove the BOM, which is the Unicode code point U+FEFF:

// $file holds the file's contents; with the /u modifier it must be valid UTF-8
$str = preg_replace('/\x{FEFF}/u', '', $file);
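This pattern removes U+FEFF anywhere in the string, not only at the start, and the /u modifier only works on valid UTF-8. A minimal sketch that makes both choices explicit; `strip_feff()` and its `$leadingOnly` flag are hypothetical names, not part of the answer:

```php
<?php
// Hypothetical helper: strip U+FEFF from a UTF-8 string.
// $leadingOnly = true removes only the run of BOMs at the very start;
// false removes every U+FEFF, wherever it appears.
function strip_feff(string $text, bool $leadingOnly = true): ?string
{
    $pattern = $leadingOnly ? '/^\x{FEFF}+/u' : '/\x{FEFF}/u';
    // With the /u modifier, preg_replace() returns null if $text is not valid UTF-8
    return preg_replace($pattern, '', $text);
}
```

With the default, a U+FEFF in the middle of the text survives, since Unicode also uses that code point as a zero-width no-break space. Calling it on malformed input such as `"\xFF broken"` gives `null`, so check the result before using it.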

Answer 4

Answered by deceze

b'\xef\xbb\xbf' stands for the literal string "\xef\xbb\xbf" (backslashes included), because escape sequences are not interpreted inside single quotes. If you want to check for a BOM, you need to use double quotes, so that the \x sequences are actually interpreted as bytes:

"\xef\xbb\xbf"

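One way to see the difference is to compare string lengths. A minimal sketch using only `strlen()` and `var_dump()`:

```php
<?php
// Single quotes: \x is not an escape here, so this is 12 literal characters
$single = '\xef\xbb\xbf';
// Double quotes: each \xhh becomes one byte, giving the 3-byte BOM
$double = "\xef\xbb\xbf";

var_dump(strlen($single)); // int(12)
var_dump(strlen($double)); // int(3)

// The b prefix marks a binary string but does not change how escapes are parsed
var_dump(b'\xef\xbb\xbf' === $single); // bool(true)
```
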
Your files also seem to contain a lot more garbage than just a single leading BOM:

$ curl http://ircb.in/jisti/ | xxd

0000000: efbb bfef bbbf efbb bfef bbbf efbb bfef  ................
0000010: bbbf efbb bf3c 2144 4f43 5459 5045 2068  .....<!DOCTYPE h
0000020: 746d 6c3e 0a3c 6874 6d6c 3e0a 3c68 6561  tml>.<html>.<hea
...
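The dump starts with seven stacked BOMs before `<!DOCTYPE`, so checking for a single one is not enough. A minimal sketch of a standalone fetch, assuming PHP 8+ (for `str_starts_with()`); `fetch_template()` is a hypothetical name, and it throws an exception where the question's code calls its own `dbgerror()`:

```php
<?php
// Hypothetical standalone variant of the question's fetch(), assuming PHP 8+.
// Reads a template and drops every leading UTF-8 BOM, however many are stacked.
function fetch_template(string $path): string
{
    if (!is_file($path)) {
        throw new RuntimeException("Could not find the template in $path");
    }
    $t = file_get_contents($path);
    if ($t === false) {
        throw new RuntimeException("Could not read $path");
    }
    // Double quotes, so this is the 3-byte BOM rather than 12 literal characters
    $bom = "\xEF\xBB\xBF";
    while (str_starts_with($t, $bom)) {
        $t = substr($t, 3);
    }
    return $t;
}
```

Stripping on read only hides the problem; re-saving the templates without a BOM in the editor would fix the source files themselves.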

Answer 5

Answered by phvish

If you are importing a CSV file, the code below is useful:

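// $handle is a file handle opened earlier with fopen()
$header = fgetcsv($handle);
// Build the raw BOM bytes once, outside the loop
$bom = pack('H*','EFBBBF');
foreach($header as $key => $val) {
    // Strip a leading BOM from each header cell (in practice only the first can have one)
    $header[$key] = preg_replace("/^$bom/", '', $val);
}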
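A BOM can only appear at the very start of the file, and if the first header cell is quoted, a BOM in front of the opening quote may leave the quotes inside the parsed value. A minimal sketch that skips the BOM on the stream before parsing, assuming a freshly opened, seekable handle; `skip_utf8_bom()` is a hypothetical helper:

```php
<?php
// Hypothetical helper: advance past a UTF-8 BOM at the start of a stream.
// Assumes $handle was just opened (position 0) and is seekable.
function skip_utf8_bom($handle): void
{
    $first = fread($handle, 3);
    if ($first !== "\xEF\xBB\xBF") {
        // Not a BOM: go back so fgetcsv() sees those bytes
        rewind($handle);
    }
}

// Usage with an in-memory CSV that starts with a BOM and a quoted header cell
$handle = fopen('php://memory', 'r+');
fwrite($handle, "\xEF\xBB\xBF\"id\",name\n1,Alice\n");
rewind($handle);

skip_utf8_bom($handle);
// Pass the optional arguments explicitly (length 0 = no line-length limit)
$header = fgetcsv($handle, 0, ',', '"', '\\');
```
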

Answer 6

Answered by Patrick Otto

This global function solves the problem for systems whose base charset is UTF-8. Thanks!

function prepareCharset($str) {

    // set default encode
    mb_internal_encoding('UTF-8');

    // pre filter
    if (empty($str)) {
        return $str;
    }

    // get charset
    $charset = mb_detect_encoding($str, array('ISO-8859-1', 'UTF-8', 'ASCII'));

    if (stristr($charset, 'utf') || stristr($charset, 'iso')) {
        // note: utf8_decode() is deprecated as of PHP 8.2
        $str = iconv('ISO-8859-1', 'UTF-8//TRANSLIT', utf8_decode($str));
    } else {
        $str = mb_convert_encoding($str, 'UTF-8', 'UTF-8');
    }

    // remove U+0081 (%C2%81 is its UTF-8 encoding; a UTF-8 BOM would URL-encode as %EF%BB%BF)
    $str = urldecode(str_replace("%C2%81", '', urlencode($str)));

    // prepare string
    return $str;
}
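The function above routes text through `utf8_decode()` and `iconv()`. A smaller sketch aimed at the same goal, under the assumption that the input is either UTF-8 (possibly with leading BOMs) or ISO-8859-1; it needs neither mbstring nor iconv, and `to_utf8_without_bom()` is a hypothetical name:

```php
<?php
// Hypothetical helper, not the answer's original function.
function to_utf8_without_bom(string $str): string
{
    // Drop any number of leading UTF-8 BOMs (bytes EF BB BF)
    $str = preg_replace('/^(?:\xEF\xBB\xBF)+/', '', $str);

    // preg_match() with the /u modifier returns false on invalid UTF-8
    if (preg_match('//u', $str) === 1) {
        return $str; // already valid UTF-8
    }

    // Assume ISO-8859-1: byte 0x80-0xFF is the code point of the same value,
    // which UTF-8 encodes as two bytes: 110xxxxx 10xxxxxx
    return preg_replace_callback('/[\x80-\xFF]/', function (array $m): string {
        $b = ord($m[0]);
        return chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }, $str);
}
```

The Latin-1 fallback is a guess about the input: text in any other single-byte encoding (such as Windows-1252) would be converted incorrectly.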

Answer 7

Answered by trank

A solution without the pack() function:

$a = "\xEF\xBB\xBF1"; // a BOM followed by "1"
var_dump($a); // string(4) "1"

function deleteBom($text)
{
    return preg_replace("/^\xEF\xBB\xBF/", '', $text);
}

var_dump(deleteBom($a)); // string(1) "1"

Answer 8

Answered by Alfred Huang

An extra method to do the same job:

function remove_utf8_bom_head($text) {
    if (bin2hex(substr($text, 0, 3)) === 'efbbbf') {
        $text = substr($text, 3);
    }
    return $text;
}

The other methods I found did not work in my case.

Hope it helps in some special cases.

Answer 9

Answered by Paulo Scardine

If you are reading some API using file_get_contents and get an inexplicable NULL from json_decode, check the value of json_last_error(): sometimes the value returned from file_get_contents has an extraneous BOM that is almost invisible when you inspect the string, but makes json_last_error() return JSON_ERROR_SYNTAX (4).

>>> $json = file_get_contents("http://api-guiaserv.seade.gov.br/v1/orgao/all");
=> "?\t{"orgao":[{"Nome":"Tribunal de Justi\u00e7a","ID_Orgao":"59","Condicao":"1"}, ...]}"
>>> json_decode($json);
=> null
>>>

In this case, check the first 3 bytes. Echoing them is not very useful, because the BOM is invisible in most settings:

>>> substr($json, 0, 3)
=> "  ?"
>>> substr($json, 0, 3) == pack('H*','EFBBBF');
=> true
>>>

If the line above returns TRUE for you, then a simple test may fix the problem:

>>> json_decode($json[0] == "{" ? $json : substr($json, 3))
=> {#204
     +"orgao": [
       {#203
         +"Nome": "Tribunal de Justiça",
         +"ID_Orgao": "59",
         +"Condicao": "1",
       },
     ],
     ...
   }
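Testing `$json[0]` handles exactly one BOM. A minimal sketch that strips any number of leading BOMs and makes decoding errors visible instead of returning a silent `null`, assuming PHP 7.3+ for `JSON_THROW_ON_ERROR`; `json_decode_bom_safe()` is a hypothetical name:

```php
<?php
// Hypothetical helper: decode JSON that may start with one or more UTF-8 BOMs.
// Assumes PHP 7.3+ for JSON_THROW_ON_ERROR.
function json_decode_bom_safe(string $json, bool $assoc = false)
{
    // Strip only leading BOMs; a U+FEFF inside a JSON string value is left alone
    $json = preg_replace('/^(?:\xEF\xBB\xBF)+/', '', $json);
    // Throw a JsonException on a syntax error instead of returning null
    return json_decode($json, $assoc, 512, JSON_THROW_ON_ERROR);
}
```

Wrap the call in `try { ... } catch (JsonException $e) { ... }` if the API can also return malformed JSON for reasons other than a BOM.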

Answer 10

Answered by Juergen

When you work with faulty software, the BOM can get duplicated with every save.

So I use this to get rid of them:

function remove_utf8_bom($text) {
    $bom = pack('H*','EFBBBF');
    while (preg_match("/^$bom/", $text)) {
        $text = preg_replace("/^$bom/", '', $text);
    }
    return $text;
}
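The loop above calls the regex engine twice per BOM. As an alternative rather than a correction, a single-pass sketch in which the `(?:...)+` group matches the whole run of leading BOMs at once; `remove_utf8_boms()` is a hypothetical name:

```php
<?php
// Hypothetical single-pass variant: one anchored regex removes every
// leading UTF-8 BOM, so no preg_match()/preg_replace() loop is needed.
function remove_utf8_boms(string $text): string
{
    // \xEF\xBB\xBF are matched as raw bytes (no /u modifier)
    return preg_replace('/^(?:\xEF\xBB\xBF)+/', '', $text);
}
```

Because the pattern is anchored with `^`, a BOM byte sequence later in the string is left untouched.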
