How to remove %EF%BB%BF in a PHP string
Original question: http://stackoverflow.com/questions/4057742/
Notice: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep it under the same license.
Asked by bbnn
I am trying to use the Microsoft Bing API.
$data = file_get_contents("http://api.microsofttranslator.com/V2/Ajax.svc/Speak?appId=APPID&text={$text}&language=ja&format=audio/wav");
$data = stripslashes(trim($data));
The returned data has a stray ' ' character as the first character of the string. It is not a space, because I trimmed the data before returning it.
The stray character turned out to be %EF%BB%BF.
I wonder why this happened, maybe a bug from Microsoft?
How can I remove this %EF%BB%BF in PHP?
Accepted answer by Gumbo
Answer by Lee
You should not simply discard the BOM unless you're 100% sure that the stream will: (a) always be UTF-8, and (b) always have a UTF-8 BOM.
The reasons:
- In UTF-8, a BOM is optional, so if the service stops sending it at some future point you'll be throwing away the first three bytes of your actual response instead.
- The whole purpose of the BOM is to identify unambiguously which UTF encoding the stream uses (UTF-8, UTF-16, or UTF-32), and also to indicate the 'endian-ness' (byte order) of the encoded information. If you just throw it away, you're assuming that you're always getting UTF-8; this may not be a very good assumption.
- Not all BOMs are three bytes long; only the UTF-8 one is. The UTF-16 BOM is two bytes, and the UTF-32 BOM is four bytes. So if the service switches to a wider UTF encoding in the future, your code will break.
I think a more appropriate way to handle this would be something like:
/* Detect the encoding, then convert from detected encoding to ASCII */
$enc = mb_detect_encoding($data);
$data = mb_convert_encoding($data, "ASCII", $enc);
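As a sketch of the points above, one option is to compare the leading bytes against the known BOM signatures and strip only what is actually there. The function name `strip_bom` and the shape of its return value are assumptions made for illustration; they are not part of the original answer or of any library.

```php
<?php
/**
 * Hypothetical helper: detect a leading BOM and strip it.
 * Returns [encoding name or null, data without the BOM].
 */
function strip_bom(string $data): array
{
    // Longer signatures go first: the UTF-32LE BOM begins with the UTF-16LE BOM.
    $signatures = [
        'UTF-32BE' => "\x00\x00\xFE\xFF",
        'UTF-32LE' => "\xFF\xFE\x00\x00",
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
    ];
    foreach ($signatures as $encoding => $bom) {
        if (strncmp($data, $bom, strlen($bom)) === 0) {
            return [$encoding, substr($data, strlen($bom))];
        }
    }
    // No BOM found: leave the data untouched instead of dropping three bytes.
    return [null, $data];
}

[$encoding, $body] = strip_bom("\xEF\xBB\xBFhello");
// $encoding is 'UTF-8', $body is 'hello'
```

One known limitation of this approach: a UTF-16LE stream whose first character after the BOM is U+0000 would be reported as UTF-32LE. That case is rare, but worth keeping in mind.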
Answer by D3F4ULT
$data = file_get_contents("http://api.microsofttranslator.com/V2/Ajax.svc/Speak?appId=APPID&text={$text}&language=ja&format=audio/wav");
$data = stripslashes(trim($data));
// Strip a leading UTF-8 BOM (bytes EF BB BF) only if it is actually present
if (substr($data, 0, 3) == "\xef\xbb\xbf") {
    $data = substr($data, 3);
}
Answer by Eric Bowman - abstracto -
It's a byte order mark (BOM), indicating the response is encoded as UTF-8. You can safely remove it, but you should parse the remainder as UTF-8.
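One way to read this advice, as a sketch: drop the three BOM bytes if they are present, then confirm that what is left really is valid UTF-8 before using it. The function name `utf8_body` is hypothetical, and throwing an exception on invalid input is just one possible choice.

```php
<?php
/**
 * Hypothetical helper: remove a leading UTF-8 BOM, then verify the rest is UTF-8.
 */
function utf8_body(string $data): string
{
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        $data = substr($data, 3);
    }
    // With the /u modifier, preg_match() fails on subjects that are not valid UTF-8.
    if (preg_match('//u', $data) !== 1) {
        throw new UnexpectedValueException('Response is not valid UTF-8');
    }
    return $data;
}

// "\xE3\x81\x82" is the UTF-8 encoding of the hiragana letter "a".
$text = utf8_body("\xEF\xBB\xBF" . "\xE3\x81\x82");
// $text now holds the three bytes E3 81 82, without the BOM
```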
Answer by a coder
I had the same problem today, and fixed it by ensuring the string was set to UTF-8:
http://php.net/manual/en/function.utf8-encode.php
$content = utf8_encode($content);
Answer by Amy B
$data = str_replace('%EF%BB%BF', '', $data);
You probably shouldn't be using stripslashes. Unless the API returns backslash-escaped data (and there is a 99.99% chance it doesn't), take that call out.
Answer by enobrev
To remove it from the beginning of the string only:
$data = preg_replace('/^%EF%BB%BF/', '', $data);
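Note that `%EF%BB%BF` is the URL-encoded form of the three BOM bytes, so a pattern written with that literal text only matches if `$data` has itself been URL-encoded. The sketch below assumes instead that `$data` holds the raw response bytes, and anchors the match at the start of the string. The function name `remove_leading_bom` is made up for this example.

```php
<?php
// Hypothetical helper: strip a UTF-8 BOM from the start of a raw byte string.
function remove_leading_bom(string $data): string
{
    // Inside single quotes, \xEF\xBB\xBF reaches PCRE as hex escapes for the raw bytes;
    // ^ anchors the match to the start. Fall back to the input if preg_replace() errors.
    return preg_replace('/^\xEF\xBB\xBF/', '', $data) ?? $data;
}

$raw = "\xEF\xBB\xBFhello";
echo urlencode($raw), "\n";                      // %EF%BB%BFhello
echo urlencode(remove_leading_bom($raw)), "\n";  // hello
```

Passing the value through `urlencode()` is only a way to make the otherwise invisible bytes visible while debugging.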