Original question: http://stackoverflow.com/questions/2853454/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
PHP unserialize fails with non-encoded characters?
Asked by FFish
$ser = 'a:2:{i:0;s:5:"héllö";i:1;s:5:"wörld";}'; // fails
$ser2 = 'a:2:{i:0;s:5:"hello";i:1;s:5:"world";}'; // works
$out = unserialize($ser);
$out2 = unserialize($ser2);
print_r($out);
print_r($out2);
echo "<hr>";
But why?
Should I encode before serializing, then? How?
I am using JavaScript to write the serialized string to a hidden field, then reading it in PHP from $_POST.
In JS I have something like:
function writeImgData() {
    var caption_arr = new Array();
    $('.album img').each(function(index) {
        caption_arr.push($(this).attr('alt'));
    });
    $("#hidden-field").attr("value", serializeArray(caption_arr));
};
Answered by Alix Axel
The reason why unserialize() fails with:
$ser = 'a:2:{i:0;s:5:"héllö";i:1;s:5:"wörld";}';
is that the lengths given for héllö and wörld are wrong. The s:N: prefix must hold the length in bytes, and é and ö each take two bytes in UTF-8; PHP's strlen() counts bytes rather than characters:
echo strlen('héllö'); // 7
echo strlen('wörld'); // 6
However, if you try to unserialize() the following corrected string:
$ser = 'a:2:{i:0;s:7:"héllö";i:1;s:6:"wörld";}';
echo '<pre>';
print_r(unserialize($ser));
echo '</pre>';
It works:
Array
(
    [0] => héllö
    [1] => wörld
)
If you use PHP's serialize(), it will correctly compute the byte lengths of multi-byte strings.
On the other hand, if you want to work with serialized data in multiple (programming) languages, you should forget it and move to something like JSON, which is far more standardized.
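The byte-versus-character distinction described above can be checked directly. This is a minimal sketch, assuming UTF-8 strings and PHP 7 or later; the \u{...} escapes are used only so that the byte counts do not depend on the encoding of the source file, and the variable names are illustrative.

```php
<?php
// "\u{e9}" is é and "\u{f6}" is ö; each takes two bytes in UTF-8.
$words = ["h\u{e9}ll\u{f6}", "w\u{f6}rld"];

// strlen() counts bytes, which is what the s:N: prefix must contain.
$byteLengths = array_map('strlen', $words);

// Counting code points instead gives the lengths used in the question.
$charLengths = array_map(function ($w) {
    return preg_match_all('/./us', $w);
}, $words);

// serialize() writes the byte lengths, so its output round-trips.
$serialized = serialize($words);
$roundTrip  = unserialize($serialized);

echo $serialized, "\n";
```

Here $byteLengths holds 7 and 6 while $charLengths holds 5 and 5, which is exactly the mismatch that makes the hand-built string fail.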
Answered by Lionel Chan
I know this was posted about a year ago, but I just had this issue and came across this question, and in fact I found a solution for it. This piece of code works like a charm!
The idea behind it is simple: it recalculates the lengths of the multi-byte strings, as described by @Alix above.
A few modifications should suit your code:
/**
 * Multi-byte Unserialize
 *
 * UTF-8 will screw up a serialized string
 *
 * @access private
 * @param string
 * @return string
 */
function mb_unserialize($string) {
    // Relies on the /e modifier, which is deprecated as of PHP 5.5 and removed in PHP 7.0.
    $string = preg_replace('!s:(\d+):"(.*?)";!se', "'s:'.strlen('$2').':\"$2\";'", $string);
    return unserialize($string);
}
Source: http://snippets.dzone.com/posts/show/6592
Tested on my machine, and it works like a charm!
Answered by David
Lionel Chan's answer, modified to work with PHP >= 5.5:
function mb_unserialize($string) {
    $string2 = preg_replace_callback(
        '!s:(\d+):"(.*?)";!s',
        function($m) {
            $len = strlen($m[2]);
            $result = "s:$len:\"{$m[2]}\";";
            return $result;
        },
        $string);
    return unserialize($string2);
}
This code uses preg_replace_callback because preg_replace with the /e modifier has been deprecated since PHP 5.5.
Answered by lafka
The issue is, as pointed out by Alix, related to encoding.
Until PHP 5.4 the internal encoding for PHP was ISO-8859-1; this encoding uses a single byte for some characters that are multi-byte in UTF-8. As a result, multi-byte values serialized on a UTF-8 system will not be readable on ISO-8859-1 systems.
To avoid problems like this, make sure all systems use the same encoding:
mb_internal_encoding('utf-8');
$arr = array('foo' => 'bár');
$buf = serialize($arr);
You can use utf8_encode() or utf8_decode() to clean up:
// Set system encoding to iso-8859-1
mb_internal_encoding('iso-8859-1');
$arr = unserialize(utf8_encode($serialized));
print_r($arr);
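To see why an encoding change breaks unserialize(), it may help to look at the bytes. The sketch below is an illustration under stated assumptions, not part of the answer: it simulates the UTF-8 to ISO-8859-1 conversion with str_replace() on the raw bytes of á, so that it does not depend on any extension, and the variable names are hypothetical.

```php
<?php
// "b\xC3\xA1r" is "bár" in UTF-8 (4 bytes); "b\xE1r" is the same text in
// ISO-8859-1 (3 bytes).
$utf8Serialized = serialize(['foo' => "b\xC3\xA1r"]);

// Simulate the stored string passing through a component that converts it
// from UTF-8 to ISO-8859-1 (assumed scenario).
$latin1Serialized = str_replace("\xC3\xA1", "\xE1", $utf8Serialized);

// The declared length is still 4, but only 3 bytes remain, so this fails.
$failed = @unserialize($latin1Serialized);

// Converting back to the original encoding restores the byte lengths.
$restored = unserialize(str_replace("\xE1", "\xC3\xA1", $latin1Serialized));
```

The serialized string is only valid in the encoding it was produced in, which is why any conversion has to be undone before calling unserialize().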
Answered by Joe Hong
In reply to @Lionel above: the function mb_unserialize() as you proposed won't work if the serialized string itself contains the character sequence "; (a quote followed by a semicolon).
Use with caution. For example:
$test = 'test";string';
$test = serialize($test);
// $test is now 's:12:"test";string";'
$string = preg_replace('!s:(\d+):"(.*?)";!se', "'s:'.strlen('$2').':\"$2\";'", $test);
print $string;
// output: s:4:"test";string"; (Wrong!!)
JSON is the way to go, as mentioned by others, IMHO.
Note: I am posting this as a new answer because I don't know how to reply directly (I'm new here).
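The same limitation applies to the preg_replace_callback variant. The sketch below reproduces David's function from the answer above so that it runs on its own (the @ operator and the variable names are additions for illustration) and compares a string with wrong lengths against a valid string that contains ";.

```php
<?php
// David's preg_replace_callback variant, reproduced so the sketch is
// self-contained.
function mb_unserialize($string) {
    $fixed = preg_replace_callback(
        '!s:(\d+):"(.*?)";!s',
        function ($m) {
            return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
        },
        $string
    );
    return @unserialize($fixed); // @ hides the notice raised on failure
}

// Wrong lengths (5 instead of 7 and 6): repaired successfully.
$broken   = "a:2:{i:0;s:5:\"h\u{e9}ll\u{f6}\";i:1;s:5:\"w\u{f6}rld\";}";
$repaired = mb_unserialize($broken);

// A valid string whose value contains ";: the non-greedy match stops at the
// first "; and rewrites the length to 4, so a valid input now fails.
$valid   = serialize(['test";string']);
$plain   = unserialize($valid);
$mangled = mb_unserialize($valid);
```

In other words, the regex repair is only safe when the stored values are known not to contain the "; sequence.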
Answered by Mike
One more slight variation here, which will hopefully help someone: I was serializing an array and then writing it to a database. On retrieving the data, the unserialize operation was failing.
It turns out that the database longtext field I was writing into was using latin1, not UTF-8. When I switched it to UTF-8, everything worked as planned.
Thanks to all above who mentioned character encoding and got me on the right track!
Answered by ThiefMaster
Do not use PHP serialization/unserialization when the other end is not PHP. It is not meant to be a portable format. For example, it even includes NUL bytes (ASCII 0) around the names of protected properties, which is nothing you want to deal with in JavaScript (even though it would work perfectly fine, it's just extremely ugly).
Instead, use a portable format like JSON. XML would do the job, too, but JSON has less overhead and is more programmer-friendly, as you can easily parse it into a simple data structure instead of having to deal with XPath, DOM trees, etc.
Answered by Vittorio Zamparella
In my case the problem was with line endings (most likely some editor had changed my file from DOS to Unix).
I put together these adaptive wrappers:
function unserialize_fetchError($original, &$unserialized, &$errorMsg) {
    $unserialized = @unserialize($original);
    // error_get_last() returns null when no error has been raised yet
    $lastError = error_get_last();
    $errorMsg = $lastError ? $lastError['message'] : null;
    // serialize(false) is 'b:0;', so a false result is still valid for that input
    return ( $unserialized !== false || $original == 'b:0;' );
}
function unserialize_checkAllLineEndings($original, &$unserialized, &$errorMsg, &$lineEndings) {
    if ( unserialize_fetchError($original, $unserialized, $errorMsg) ) {
        $lineEndings = 'unchanged';
        return true;
    } elseif ( unserialize_fetchError(str_replace("\n", "\r\n", $original), $unserialized, $errorMsg) ) {
        // Unix (\n) back to DOS (\r\n)
        $lineEndings = '\n to \r\n';
        return true;
    } elseif ( unserialize_fetchError(str_replace("\n\r", "\n", $original), $unserialized, $errorMsg) ) {
        $lineEndings = '\n\r to \n';
        return true;
    } elseif ( unserialize_fetchError(str_replace("\r\n", "\n", $original), $unserialized, $errorMsg) ) {
        // DOS (\r\n) to Unix (\n)
        $lineEndings = '\r\n to \n';
        return true;
    }
    return false;
}
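The reason line endings matter is the same byte-length rule: \r\n is two bytes and \n is one. A minimal sketch, with hypothetical sample text, of how a DOS-to-Unix conversion invalidates a serialized string and how reversing it restores the data:

```php
<?php
// Serialized while the text still used DOS line endings (\r\n): 18 bytes.
$serialized = serialize("line one\r\nline two");

// An editor or transfer tool rewrites \r\n as \n: one byte is lost per line
// break, but the declared length stays at 18.
$converted = str_replace("\r\n", "\n", $serialized);

// The length no longer matches, so unserialize() returns false.
$failed = @unserialize($converted);

// Putting the \r back restores the declared length and the original text.
$restored = unserialize(str_replace("\n", "\r\n", $converted));
```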
Answered by Artefacto
I would advise you to use JavaScript to encode as JSON and then use json_decode() to unserialize.
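On the PHP side this could look roughly like the sketch below. The payload is a hypothetical stand-in for what JSON.stringify() would place in the hidden field; in the question's setup it would be read from $_POST instead, and the fallback to an empty array is just one possible way to handle bad input.

```php
<?php
// What JSON.stringify(["héllö", "wörld"]) would produce (hypothetical
// payload; \u{...} escapes keep the bytes independent of file encoding).
$posted = "[\"h\u{e9}ll\u{f6}\",\"w\u{f6}rld\"]";

// true => decode JSON objects as associative arrays.
$captions = json_decode($posted, true);

// json_decode() returns null on malformed input; fall back to an empty list.
if ($captions === null && json_last_error() !== JSON_ERROR_NONE) {
    $captions = [];
}

// PHP can produce the same format; JSON_UNESCAPED_UNICODE keeps é and ö
// as literal characters instead of \u00e9 and \u00f6.
$encoded = json_encode($captions, JSON_UNESCAPED_UNICODE);
```

JSON strings are delimited by quotes rather than length prefixes, so multi-byte characters need no special handling.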
Answered by Rondip
We can break the string down into an array:
$finalArray = array();
$nodeArr = explode('&', $_POST['formData']);
foreach($nodeArr as $value){
    $childArr = explode('=', $value);
    $finalArray[$childArr[0]] = $childArr[1];
}
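PHP's built-in parse_str() performs the same split and additionally URL-decodes names and values, which the loop above does not. A short sketch, assuming the form data arrives as a URL-encoded query string (the sample string and variable names are hypothetical):

```php
<?php
// Hypothetical form data, as a URL-encoded query string.
$formData = 'title=h%C3%A9ll%C3%B6&note=a%3Db';

// parse_str() splits on & and =, then URL-decodes each name and value,
// writing the result into $fields.
parse_str($formData, $fields);
```

After the call, $fields['title'] holds the decoded UTF-8 text and $fields['note'] holds a=b, whereas the manual explode() approach would leave both values percent-encoded.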

