Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/7381900/

Date: 2020-08-26 02:35:15  Source: igfitidea

PHP decoding and encoding json with unicode characters

php json unicode character-encoding

Asked by Keyo

I have some json I need to decode, alter and then encode without messing up any characters.

If I have a unicode character in a json string it will not decode. I'm not sure why, since json.org says a string can contain: any Unicode character except " or \ or control character. But it doesn't work in Python either.

{"Tag":"Odómetro"}

I can use utf8_encode which will allow the string to be decoded with json_decode, however the character gets mangled into something else. This is the result from a print_r of the result array. Two characters.

[Tag] => OdÃ³metro

When I encode the array again, I get the character escaped to ASCII, which is correct according to the json spec:

"Tag"=>"Od\u00f3metro"

Is there some way I can un-escape this? json_encode gives no such option, utf8_encode does not seem to work either.

Edit: I see there is a JSON_UNESCAPED_UNICODE option for json_encode. However, it's not working as expected. Oh damn, it's only in PHP 5.4. I will have to use some regex as I only have 5.3.

$json = json_encode($array, JSON_UNESCAPED_UNICODE);
Warning: json_encode() expects parameter 2 to be long, string ...

Accepted answer by John Flatness

Judging from everything you've said, it seems like the original Odómetro string you're dealing with is encoded with ISO 8859-1, not UTF-8.

Here's why I think so:

  • json_encode produced parseable output after you ran the input string through utf8_encode, which converts from ISO 8859-1 to UTF-8.
  • You did say that you got "mangled" output when using print_r after doing utf8_encode, but the mangled output you got is exactly what would happen when UTF-8 text is parsed as ISO 8859-1 (ó is \xc3\xb3 in UTF-8, but that sequence is Ã³ in ISO 8859-1).
  • Your htmlentities hackaround solution worked. htmlentities needs to know the encoding of the input string to work correctly. If you don't specify one, it assumes ISO 8859-1. (html_entity_decode, confusingly, defaults to UTF-8, so your method had the effect of converting from ISO 8859-1 to UTF-8.)
  • You said you had the same problem in Python, which would seem to exclude PHP from being the issue.
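
The diagnosis above can be reproduced with a minimal sketch. It assumes the input really is ISO 8859-1 (so ó is the single byte 0xF3) and that the mbstring extension is available; the variable names are illustrative and not from the original post.

```php
<?php
// Assumption: the raw input is ISO 8859-1, so "ó" is the single byte 0xF3.
$latin1Json = "{\"Tag\":\"Od\xF3metro\"}";

// json_decode() expects UTF-8; a malformed byte sequence makes it return null.
$failed = json_decode($latin1Json, true);
$error  = json_last_error();   // JSON_ERROR_UTF8

// Convert ISO 8859-1 to UTF-8 (needs the mbstring extension), then decode.
$utf8Json = mb_convert_encoding($latin1Json, 'UTF-8', 'ISO-8859-1');
$decoded  = json_decode($utf8Json, true);

// In UTF-8, "ó" is the two bytes 0xC3 0xB3. A terminal or browser that
// interprets them as ISO 8859-1 shows them as two separate characters.
$hex = bin2hex($decoded['Tag']);   // 4f64c3b36d6574726f
```

If json_last_error() reports JSON_ERROR_UTF8 on your real data, that points to an encoding problem in the input rather than to malformed JSON.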

PHP will use the \uXXXX escaping, but as you noted, this is valid JSON.

So, it seems like you need to configure your connection to Postgres so that it will give you UTF-8 strings. The PHP manual indicates you'd do this by appending options='--client_encoding=UTF8' to the connection string. There's also the possibility that the data currently stored in the database is in the wrong encoding. (You could simply use utf8_encode, but this will only support characters that are part of ISO 8859-1.)
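
As a rough sketch of that connection setup: the helper name, host, database, and user below are placeholders invented for illustration, and the actual connection call is left commented out because it needs the pgsql extension and a reachable server.

```php
<?php
// Hypothetical helper: builds a pg_connect() connection string that asks
// the server to send UTF-8. Host, database, and user are placeholders.
function buildUtf8ConnectionString(string $host, string $dbname, string $user): string
{
    return sprintf(
        "host=%s dbname=%s user=%s options='--client_encoding=UTF8'",
        $host,
        $dbname,
        $user
    );
}

$connString = buildUtf8ConnectionString('localhost', 'mydb', 'myuser');

// Connecting requires the pgsql extension and a running server:
// $conn = pg_connect($connString);
// pg_set_client_encoding($conn, 'UTF8'); // alternative once connected
```

Setting the client encoding only changes what the server sends; it does not repair rows that were stored with the wrong encoding in the first place.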

Finally, as another answer noted, you do need to make sure that you're declaring the proper charset, with an HTTP header or otherwise (of course, this particular issue might have just been an artifact of the environment where you did your print_r testing).

Answer by Sunny S.M

I found the following way to fix this issue. I hope it helps.

json_encode($data,JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES);
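
To make the effect of those two flags concrete, here is a small comparison, assuming PHP 5.4 or later and UTF-8 input. The sample array (including the example.com URL) is made up for illustration, and the ó is written as explicit bytes so the result does not depend on the encoding of the source file.

```php
<?php
// Sample data; "\xC3\xB3" is the UTF-8 encoding of "ó".
$data = array(
    'Tag' => "Od\xC3\xB3metro",
    'url' => 'http://example.com/a',
);

// Default behaviour: non-ASCII is escaped as \uXXXX and "/" as "\/".
$escaped = json_encode($data);
// {"Tag":"Od\u00f3metro","url":"http:\/\/example.com\/a"}

// With both flags: UTF-8 characters and slashes are emitted unchanged.
$plain = json_encode($data, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
// {"Tag":"Odómetro","url":"http://example.com/a"}
```

Both strings decode to the same array, so the flags only affect readability and size, not the data.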

Answer by Treffynnon

JSON_UNESCAPED_UNICODE was added in PHP 5.4, so it looks like you need to upgrade your version of PHP to take advantage of it. 5.4 is not released yet though! :(

There is a 5.4 alpha release candidate on QA, though, if you want to try it on your development machine.

Answer by Keyo

A hacky way of doing JSON_UNESCAPED_UNICODE in PHP 5.3. Really disappointed by PHP json support. Maybe this will help someone else.

$array = some_json();
// Encode all string children in the array to html entities.
array_walk_recursive($array, function(&$item, $key) {
    if(is_string($item)) {
        $item = htmlentities($item);
    }
});
$json = json_encode($array);

// Decode the html entities and end up with unicode again.
$json = html_entity_decode($json);
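
The regex route mentioned in the question could look roughly like the sketch below. This is not the code from this answer: unescapeUnicode is a hypothetical helper, it assumes the mbstring extension, and it deliberately leaves control characters (\u0000 to \u007f) and any escape preceded by a backslash untouched so the output stays valid JSON.

```php
<?php
// Hypothetical helper: turns \uXXXX escapes produced by json_encode() back
// into raw UTF-8, for PHP versions without JSON_UNESCAPED_UNICODE.
function unescapeUnicode($json)
{
    return preg_replace_callback(
        // A run of \uXXXX escapes, skipping \u0000-\u007f and anything
        // directly preceded by a backslash (which may be an escaped "\").
        '/(?<!\\\\)(?:\\\\u(?!00[0-7])[0-9a-fA-F]{4})+/',
        function ($m) {
            // Strip the "\u" markers, leaving UTF-16BE code units in hex.
            // Converting the whole run keeps surrogate pairs together.
            $hex = str_replace('\\u', '', $m[0]);
            return mb_convert_encoding(pack('H*', $hex), 'UTF-8', 'UTF-16BE');
        },
        $json
    );
}

$json = unescapeUnicode('{"Tag":"Od\u00f3metro"}');
// $json now holds {"Tag":"Odómetro"} as UTF-8 bytes.
```

Unlike the htmlentities round trip, this does not touch quotes or ampersands inside the strings, but it is still a workaround; on PHP 5.4 or later the JSON_UNESCAPED_UNICODE flag is the simpler choice.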

Answer by The Mask

Try setting the utf-8 encoding in your page:

header('content-type:text/html;charset=utf-8');

this works for me:

$arr = array('tag' => 'Odómetro');
$encoded = json_encode($arr);
$decoded = json_decode($encoded);
echo $decoded->{'tag'};

Answer by Fernando R.

$json = array('tag' => 'Odómetro'); // Original array
$json = json_encode($json); // {"tag":"Od\u00f3metro"}
$json = json_decode($json); // Od\u00f3metro becomes OdÃ³metro
echo $json->{'tag'}; // OdÃ³metro
echo utf8_decode($json->{'tag'}); // Odómetro

You were close, just use utf8_decode.

Answer by Jonathan Edgardo

Try using:

utf8_decode() and utf8_encode()

Answer by Navaneeth Mohan

To encode an array that contains special characters, convert it from ISO 8859-1 to UTF-8 first. (If utf8_encode and utf8_decode are not working for you, this might be an option.)

Everything that is in ISO-8859-1 should be converted to UTF8:

$utf8 = utf8_encode('? ??? ??? ????!'); //contains UTF8 & ISO 8859-1 characters;    
$iso88591 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
$data = $iso88591;

Encode should work after this:

$encoded_data = json_encode($data);
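
For a whole array rather than a single string, the same conversion can be applied recursively before encoding. This is a sketch under the assumption that every string value is ISO 8859-1 and that mbstring is installed; the sample data is invented, and array keys are not converted.

```php
<?php
// Assumption: all string values are ISO 8859-1 ("\xF3" is ó, "\xE9" is é).
$data = array(
    'Tag'    => "Od\xF3metro",
    'nested' => array("caf\xE9"),
);

// Convert every string value to UTF-8 in place; other types are left alone.
array_walk_recursive($data, function (&$item) {
    if (is_string($item)) {
        $item = mb_convert_encoding($item, 'UTF-8', 'ISO-8859-1');
    }
});

$encoded = json_encode($data);
// {"Tag":"Od\u00f3metro","nested":["caf\u00e9"]}
```

Running this on data that is already UTF-8 would double-encode it, so it is worth confirming the source encoding before applying the conversion.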

Convert UTF-8 to & from ISO 8859-1
