How to remove %EF%BB%BF from a PHP string

Question

Asked by bbnn

I am trying to use the Microsoft Bing API.

$data = file_get_contents("http://api.microsofttranslator.com/V2/Ajax.svc/Speak?appId=APPID&text={$text}&language=ja&format=audio/wav");
$data = stripslashes(trim($data));

The returned data has a ' ' character as the first character of the string. It is not a space, because I trimmed the data before returning it.

The ' ' character turned out to be %EF%BB%BF.

I wonder why this happened, maybe a bug from Microsoft?

How can I remove this %EF%BB%BF in PHP?

Answer 1

Accepted answer by Gumbo

You could use substr to get only the rest, without the UTF-8 BOM:

// if it's binary UTF-8
$data = substr($data, 3);
// if it's percent-encoded UTF-8
$data = substr($data, 9);
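
The two offsets above correspond to two representations of the same BOM: three raw bytes, or nine characters once percent-encoded. A minimal sketch that checks which form is actually present before cutting anything is shown below. The function name strip_utf8_bom is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Removes a leading UTF-8 BOM in either raw or percent-encoded form.
function strip_utf8_bom(string $data): string
{
    $rawBom = "\xEF\xBB\xBF";             // the three BOM bytes
    $encodedBom = rawurlencode($rawBom);  // "%EF%BB%BF"

    // strncmp() is binary safe and compares at most the first 3 bytes,
    // so short or empty strings are handled without extra checks.
    if (strncmp($data, $rawBom, 3) === 0) {
        return substr($data, 3);
    }
    // Hex digits in percent-encoding may be upper or lower case.
    if (strncasecmp($data, $encodedBom, 9) === 0) {
        return substr($data, 9);
    }
    // No BOM found: return the input unchanged.
    return $data;
}
```

With this guard, a response that arrives without a BOM is returned as-is instead of losing its first bytes.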

Answer 2

Answer by Lee

You should not simply discard the BOM unless you're 100% sure that the stream will: (a) always be UTF-8, and (b) always have a UTF-8 BOM.

The reasons:

In UTF-8, a BOM is optional, so if the service stops sending it at some future point, you'll be throwing away the first three bytes of your response instead.
The whole purpose of the BOM is to identify unambiguously the type of UTF stream being interpreted (UTF-8, -16, or -32), and also to indicate the 'endian-ness' (byte order) of the encoded information. If you just throw it away, you're assuming that you're always getting UTF-8; this may not be a very good assumption.
Not all BOMs are three bytes long; only the UTF-8 one is. The UTF-16 BOM is two bytes, and the UTF-32 BOM is four bytes. So if the service switches to a wider UTF encoding in the future, your code will break.

I think a more appropriate way to handle this would be something like:

/* Detect the encoding, then convert from detected encoding to ASCII */
$enc = mb_detect_encoding($data);
$data = mb_convert_encoding($data, "ASCII", $enc);
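
The BOM lengths listed in this answer can also be checked explicitly. The following is a rough sketch, assuming the mbstring extension is available and that UTF-8 output is wanted. The function names detect_bom and to_utf8_without_bom are hypothetical, and the approach is an illustration rather than the answerer's own method.

```php
<?php
// Hypothetical helper: returns [encoding name, BOM length in bytes],
// or [null, 0] when the data does not start with a known BOM.
function detect_bom(string $data): array
{
    // UTF-32 is tested before UTF-16 because the UTF-32LE BOM
    // (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
    $boms = [
        'UTF-32BE' => "\x00\x00\xFE\xFF",
        'UTF-32LE' => "\xFF\xFE\x00\x00",
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
    ];
    foreach ($boms as $encoding => $bom) {
        if (strncmp($data, $bom, strlen($bom)) === 0) {
            return [$encoding, strlen($bom)];
        }
    }
    return [null, 0];
}

// Hypothetical helper: strips the BOM, if any, and converts the rest to UTF-8.
function to_utf8_without_bom(string $data): string
{
    [$encoding, $length] = detect_bom($data);
    if ($encoding === null) {
        return $data; // no BOM: leave the bytes untouched
    }
    $body = substr($data, $length);
    if ($encoding === 'UTF-8') {
        return $body; // already UTF-8, nothing to convert
    }
    return mb_convert_encoding($body, 'UTF-8', $encoding);
}
```

Because nothing is removed unless a known BOM is found, the code keeps working if the service drops the BOM or switches to a wider encoding.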

Answer 3

Answer by D3F4ULT

$data = file_get_contents("http://api.microsofttranslator.com/V2/Ajax.svc/Speak?appId=APPID&text={$text}&language=ja&format=audio/wav");
$data = stripslashes(trim($data));

if (substr($data, 0, 3) == "\xef\xbb\xbf") {
    $data = substr($data, 3);
}

Answer 4

Answer by Eric Bowman - abstracto -

It's a byte order mark (BOM), indicating the response is encoded as UTF-8. You can safely remove it, but you should parse the remainder as UTF-8.

Answer 5

Answer by a coder

I had the same problem today, and fixed it by ensuring the string was encoded as UTF-8:

http://php.net/manual/en/function.utf8-encode.php

$content = utf8_encode($content);

Answer 6

Answer by Amy B

$data = str_replace('%EF%BB%BF', '', $data);

You probably shouldn't be using stripslashes. Unless the API returns backslash-escaped data (and there's a 99.99% chance it doesn't), take that call out.

Answer 7

Answer by enobrev

To remove it from the beginning of the string (only):

$data = preg_replace('/^%EF%BB%BF/', '', $data);
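
The pattern above matches the percent-encoded text. If the string still holds the raw bytes, the same anchored approach might look like the sketch below. The sample values and the example.invalid URL are made up for illustration. The second half contrasts the anchored pattern with an unanchored str_replace, which removes every occurrence.

```php
<?php
// Illustrative input: a raw UTF-8 BOM followed by a quoted, made-up URL.
$data = "\xEF\xBB\xBF\"http://example.invalid/audio.wav\"";

// In a single-quoted PHP string, \xEF is passed through unchanged and
// PCRE interprets it as the byte 0xEF. '^' anchors the match to the start.
$clean = preg_replace('/^\xEF\xBB\xBF/', '', $data);

// Anchored versus unanchored removal of the percent-encoded form.
$text = "%EF%BB%BFa%EF%BB%BFb";
$everywhere = str_replace('%EF%BB%BF', '', $text);     // removes both occurrences
$startOnly  = preg_replace('/^%EF%BB%BF/', '', $text); // removes only the leading one
```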
