PHP json_encode() with non-UTF-8 strings?
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original address and author information, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/6606713/
json_encode() non utf-8 strings?
Asked by Josh
So I have an array of strings, and all of the strings use the system default ANSI encoding and were pulled from a SQL database. So there are 256 different possible character byte values (single-byte encoding).
Is there a way I can get json_encode() to work and display these characters instead of having to use utf8_encode() on all of my strings and ending up with stuff like \u0082?
Or is that the standard for JSON?
Answered by hakre
Is there a way I can get json_encode() to work and display these characters instead of having to use utf8_encode() on all of my strings and ending up with stuff like "\u0082"?
If you have an ANSI-encoded string, utf8_encode() is the wrong function to deal with this. You need to properly convert it from ANSI to UTF-8 first. That will certainly reduce the number of Unicode escape sequences like \u0082 in the JSON output, but technically these sequences are valid JSON, so you need not fear them.
Converting ANSI to UTF-8 with PHP
json_encode works with UTF-8 encoded strings only. If you need to create valid JSON successfully from an ANSI-encoded string, you need to re-encode/convert it to UTF-8 first. Then json_encode will just work as documented.
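As a rough illustration of that point, the sketch below feeds json_encode() a raw single-byte string and then the converted one. It assumes the bytes are Windows-1252 and PHP 5.5 or later (where json_encode() returns false on invalid UTF-8); the sample string is made up.

```php
<?php
// Assumption: the input bytes are Windows-1252.
// "\xE9" is the single byte for "é" there; on its own it is not valid UTF-8.
$raw = "caf\xE9";

// Without conversion, json_encode() rejects the string.
$failed = json_encode($raw);   // false
$error  = json_last_error();   // JSON_ERROR_UTF8

// Convert first, then encode.
$utf8 = mb_convert_encoding($raw, "UTF-8", "Windows-1252");
$json = json_encode($utf8);    // "caf\u00e9"

var_dump($failed, $error === JSON_ERROR_UTF8);
echo $json, "\n";
```

Checking json_last_error() right after the failing call matters, because the next successful json_encode() call resets it.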
To convert an encoding from ANSI (more correctly, I assume you have a Windows-1252 encoded string, which is popular but wrongly referred to as ANSI) to UTF-8, you can make use of the mb_convert_encoding() function:
$str = mb_convert_encoding($str, "UTF-8", "Windows-1252");
Another function in PHP that can convert the encoding/charset of a string is iconv(), which is based on libiconv. You can use it as well:
$str = iconv("CP1252", "UTF-8", $str);
Note on utf8_encode()
utf8_encode() only works for Latin-1, not for ANSI. So you will destroy some of the characters in that string when you run it through that function.
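A small sketch of where the \u0082 from the question likely comes from, assuming the data really is Windows-1252. Because utf8_encode() is deprecated as of PHP 8.2, the Latin-1 interpretation is reproduced here with mb_convert_encoding() and "ISO-8859-1" as a stand-in.

```php
<?php
// The single byte that shows up as \u0082 in the question.
$byte = "\x82";

// Read as Latin-1 (what utf8_encode() assumes), it becomes the control character U+0082.
$asLatin1 = mb_convert_encoding($byte, "UTF-8", "ISO-8859-1");

// Read as Windows-1252, it becomes U+201A (single low-9 quotation mark).
$asCp1252 = mb_convert_encoding($byte, "UTF-8", "Windows-1252");

echo json_encode($asLatin1), "\n"; // "\u0082"
echo json_encode($asCp1252), "\n"; // "\u201a"
```

Both results are valid JSON, but only the second one names the character the Windows-1252 byte was meant to be.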
Related: What is ANSI format?
For more fine-grained control of what json_encode() returns, see the list of predefined constants (PHP version dependent, incl. PHP 5.4; some constants remain undocumented and so far are available in the source code only).
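To give a feel for what those constants change, here is a brief sketch using a few of them. It assumes PHP 7.2 or later (the two JSON_INVALID_UTF8_* flags were added in 7.2, and the "\u{...}" string escape needs PHP 7); the sample strings are made up.

```php
<?php
$valid   = "caf\u{e9}"; // valid UTF-8
$invalid = "caf\xE9";   // a lone Windows-1252 byte, invalid as UTF-8

// Default: non-ASCII characters are escaped as \uXXXX.
$escaped = json_encode($valid); // "caf\u00e9"

// JSON_UNESCAPED_UNICODE: emit the UTF-8 bytes literally.
$literal = json_encode($valid, JSON_UNESCAPED_UNICODE);

// JSON_INVALID_UTF8_SUBSTITUTE: replace bad bytes with U+FFFD instead of failing.
$substituted = json_encode($invalid, JSON_INVALID_UTF8_SUBSTITUTE);

// JSON_INVALID_UTF8_IGNORE: drop bad bytes instead of failing.
$ignored = json_encode($invalid, JSON_INVALID_UTF8_IGNORE); // "caf"

echo $escaped, "\n", $literal, "\n", $substituted, "\n", $ignored, "\n";
```

The last two flags hide the encoding problem rather than fix it, so converting the input to UTF-8 first remains the cleaner route.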
Changing the encoding of an array/iteratively (PDO comment)
As you wrote in a comment that you have problems applying the function to an array, here is a code example. You always need to change the encoding first, before using json_encode. That's just a standard array operation; for the simpler case of PDO::fetch(), a foreach iteration:
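while ($row = $q->fetch(PDO::FETCH_ASSOC))
{
    foreach ($row as &$value)
    {
        if (is_string($value)) # leave NULL and non-string columns untouched
        {
            $value = mb_convert_encoding($value, "UTF-8", "Windows-1252");
        }
    }
    unset($value); # safety: remove reference
    $items[] = $row; # already UTF-8; running utf8_encode() here would double-encode
}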
Answered by Andrew Moore
The JSON standard ENFORCES Unicode encoding. From RFC4627:
3. Encoding
JSON text SHALL be encoded in Unicode. The default encoding is
UTF-8.
Since the first two characters of a JSON text will always be ASCII
characters [RFC0020], it is possible to determine whether an octet
stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
at the pattern of nulls in the first four octets.
00 00 00 xx UTF-32BE
00 xx 00 xx UTF-16BE
xx 00 00 00 UTF-32LE
xx 00 xx 00 UTF-16LE
xx xx xx xx UTF-8
Therefore, in the strictest sense, ANSI-encoded JSON wouldn't be valid JSON; this is why PHP enforces Unicode encoding when using json_encode().
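The null-pattern table in the RFC excerpt above can be sketched as a small lookup. The function name detect_json_encoding() is hypothetical (it is not part of PHP), and treating inputs shorter than four bytes as UTF-8 is an assumption of this sketch, not something the RFC excerpt states. Scalar type declarations require PHP 7.

```php
<?php
// Hypothetical helper: guess the Unicode encoding of a JSON text from the
// pattern of null bytes in its first four octets (RFC 4627, section 3).
function detect_json_encoding(string $json): string
{
    if (strlen($json) < 4) {
        return "UTF-8"; // assumption: too short to show a pattern
    }

    // Map each of the first four bytes to "0" (null) or "x" (non-null).
    $pattern = "";
    for ($i = 0; $i < 4; $i++) {
        $pattern .= ($json[$i] === "\x00") ? "0" : "x";
    }

    switch ($pattern) {
        case "000x": return "UTF-32BE";
        case "0x0x": return "UTF-16BE";
        case "x000": return "UTF-32LE";
        case "x0x0": return "UTF-16LE";
        default:     return "UTF-8";
    }
}

$sample = '{"a":1}';
echo detect_json_encoding($sample), "\n";
echo detect_json_encoding(mb_convert_encoding($sample, "UTF-16LE", "UTF-8")), "\n";
```

A Windows-1252 document also lands in the last row of the table, which is why a single-byte ANSI text gets read as (possibly invalid) UTF-8.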
As for "default ANSI", I'm pretty sure that your strings are encoded in Windows-1252. It is incorrectly referred to as ANSI.
Answered by Jenyok
<?php
$array = array('first word' => array('Слово','Кириллица'),'second word' => 'Кириллица','last word' => 'Кириллица');
echo json_encode($array);
/*
Output: {"first word":["\u0421\u043b\u043e\u0432\u043e","\u041a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430"],"second word":"\u041a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430","last word":"\u041a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430"}
*/
echo json_encode($array, JSON_UNESCAPED_UNICODE); // the constant's integer value is 256
/*
Output: {"first word":["Слово","Кириллица"],"second word":"Кириллица","last word":"Кириллица"}
*/
?>
JSON_UNESCAPED_UNICODE (integer) Encode multibyte Unicode characters literally (default is to escape as \uXXXX). Available since PHP 5.4.0.
http://php.net/manual/en/json.constants.php#constant.json-unescaped-unicode
Answered by caiofior
I found the following answer to an analogous problem, where I had to JSON-encode a nested array that was not UTF-8 encoded:
$inputArray = array(
    'a'=>'First item - à',
    'c'=>'Third item - é'
);
$inputArray['b']= array (
    'a'=>'First subitem - ù',
    'b'=>'Second subitem - ì'
);
if (!function_exists('recursive_utf8')) {
    function recursive_utf8 ($data) {
        if (!is_array($data)) {
            return utf8_encode($data);
        }
        $result = array();
        foreach ($data as $index=>$item) {
            if (is_array($item)) {
                $result[$index] = array();
                foreach($item as $key=>$value) {
                    $result[$index][$key] = recursive_utf8($value);
                }
            }
            else if (is_object($item)) {
                $result[$index] = array();
                foreach(get_object_vars($item) as $key=>$value) {
                    $result[$index][$key] = recursive_utf8($value);
                }
            }
            else {
                $result[$index] = recursive_utf8($item);
            }
        }
        return $result;
    }
}
$outputArray = json_encode(array_map('recursive_utf8', $inputArray ));
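A shorter variant of the same idea, offered only as a sketch: it uses array_walk_recursive() and mb_convert_encoding() in place of utf8_encode(). The helper name convert_strings_to_utf8() is hypothetical, the Windows-1252 default is an assumption about the data, and objects and array keys are outside the scope of this sketch.

```php
<?php
// Hypothetical helper: return a copy of a nested array with every string
// value converted to UTF-8. Non-string values (integers, NULL) are untouched.
// Assumption: the source encoding is Windows-1252 unless another is passed.
function convert_strings_to_utf8(array $data, string $from = "Windows-1252"): array
{
    array_walk_recursive($data, function (&$value) use ($from) {
        if (is_string($value)) {
            $value = mb_convert_encoding($value, "UTF-8", $from);
        }
    });
    return $data;
}

// Bytes are written as hex escapes so the example does not depend on
// how this source file itself is saved.
$input = array(
    'a' => "First item - \xE0",
    'b' => array('a' => "First subitem - \xF9", 'n' => 42),
);

echo json_encode(convert_strings_to_utf8($input)), "\n";
// {"a":"First item - \u00e0","b":{"a":"First subitem - \u00f9","n":42}}
```

Because only strings are converted, the integer 42 stays an integer in the JSON output.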
Answered by user3750780
json_encode($str,JSON_HEX_TAG|JSON_HEX_AMP|JSON_HEX_APOS|JSON_HEX_QUOT);
That will convert Windows-based ANSI to UTF-8 and the error will be no more.
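For reference, a small sketch of what these four flags do to a string that is already valid UTF-8: the characters <, >, &, ' and " are written as \uXXXX escapes in the output. The sample string is made up for illustration.

```php
<?php
$str  = "<a href='x'>Tom & \"Jerry\"</a>";
$json = json_encode($str, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

echo $json, "\n";

// Decoding gives the original string back unchanged.
var_dump(json_decode($json) === $str);
```

This kind of output is mainly useful when the JSON is embedded directly in an HTML page.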
Answered by Raptor
Use this instead:
<?php
//$return_arr = the array of data to json encode
//$out = the output of the function
//don't forget to escape the data before use it!
$out = '["' . implode('","', $return_arr) . '"]';
?>
Copied from the comments on the json_encode page of the PHP manual. Always read the comments. They are useful.