Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/4663743/

Date: 2020-08-25 13:53:54  Source: igfitidea

How to keep json_encode() from dropping strings with invalid characters

Tags: php, utf-8, json

Asked by Pekka

Is there a way to keep json_encode() from returning null for a string that contains an invalid (non-UTF-8) character?

It can be a pain in the ass to debug in a complex system. It would be much more fitting to actually see the invalid character, or at least have it omitted. As it stands, json_encode() will silently drop the entire string.

Example (in UTF-8):

$string = array(
    utf8_decode("Düsseldorf"), // Deliberately produce a broken (non-UTF-8) string
    "Washington",
    "Nairobi",
);

print_r(json_encode($string));

Results in

[null,"Washington","Nairobi"]

Desired result:

["D?sseldorf","Washington","Nairobi"]

Note: I am not looking to make broken strings work in json_encode(). I am looking for ways to make it easier to diagnose encoding errors. A null string isn't helpful for that.

Answered by goat

php does try to spew an error, but only if you turn display_errors off. This is odd because the display_errors setting is only meant to control whether or not errors are printed to standard output, not whether or not an error is triggered. I want to emphasize that when you have display_errors on, even though you may see all kinds of other php errors, php doesn't just hide this error, it will not even trigger it. That means it will not show up in any error logs, nor will any custom error handlers get called. The error just never occurs.

Here's some code that demonstrates this:

error_reporting(-1);//report all errors
$invalid_utf8_char = chr(193);

ini_set('display_errors', 1);//display errors to standard output
var_dump(json_encode($invalid_utf8_char));
var_dump(error_get_last());//nothing

ini_set('display_errors', 0);//do not display errors to standard output
var_dump(json_encode($invalid_utf8_char));
var_dump(error_get_last());// json_encode(): Invalid UTF-8 sequence in argument

That bizarre and unfortunate behavior is related to this bug https://bugs.php.net/bug.php?id=47494 and a few others, and doesn't look like it will ever be fixed.
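
Independently of display_errors, the failure can usually be inspected through json_last_error(). The following is a minimal sketch, not part of the original answer: it assumes PHP 5.5 or later (for json_last_error_msg() and JSON_PARTIAL_OUTPUT_ON_ERROR), and the $cities array simply reuses the broken input from the question.

```php
<?php
// "D\xFCsseldorf" is "Düsseldorf" in ISO-8859-1 bytes, which is not valid
// UTF-8 (the same kind of broken input the question produces).
$cities = ["D\xFCsseldorf", "Washington", "Nairobi"];

$json = json_encode($cities);
if ($json === false) {
    // These calls report the failure even when no PHP error is triggered.
    $code = json_last_error();        // JSON_ERROR_UTF8 for this input
    $message = json_last_error_msg(); // human-readable description
    error_log("json_encode failed ($code): $message");
}

// Keep the encodable values and emit null for the ones that cannot be encoded.
$partial = json_encode($cities, JSON_PARTIAL_OUTPUT_ON_ERROR);
echo $partial, "\n"; // [null,"Washington","Nairobi"]
```

This does not show the offending character, but it does turn a silent failure into something that can be logged.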

workaround:

Cleaning the string before passing it to json_encode may be a workable solution.

$stripped_of_invalid_utf8_chars_string = iconv('UTF-8', 'UTF-8//IGNORE', $orig_string);
if ($stripped_of_invalid_utf8_chars_string !== $orig_string) {
    // one or more chars were invalid, and so they were stripped out.
    // if you need to know where in the string the first stripped character was, 
    // then see http://stackoverflow.com/questions/7475437/find-first-character-that-is-different-between-two-strings
}
$json = json_encode($stripped_of_invalid_utf8_chars_string);

http://php.net/manual/en/function.iconv.php

The manual says

//IGNORE silently discards characters that are illegal in the target charset.

So by first removing the problematic characters, in theory json_encode() shouldn't get anything it will choke on and fail with. I haven't verified that the output of iconv with the //IGNORE flag is perfectly compatible with json_encode's notion of what valid UTF-8 characters are, so buyer beware... as there may be edge cases where it still fails. Ugh, I hate character set issues.
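
Since the compatibility between iconv and json_encode() is unverified, one cautious option is to check the cleaned string before encoding it. The sketch below is only an illustration: cleanForJson() is a hypothetical helper name, it assumes PHP 7+ and that the mbstring extension is loaded, and the fallback through mb_convert_encoding() is my own addition rather than something the answer proposes.

```php
<?php
/**
 * Hypothetical helper (not from the original answer): strip invalid bytes
 * with iconv, then verify the result before handing it to json_encode().
 */
function cleanForJson(string $input): string
{
    // @ suppresses the notice some iconv implementations raise on bad input.
    $clean = @iconv('UTF-8', 'UTF-8//IGNORE', $input);

    // Assumption: the mbstring extension is available.
    if ($clean === false || !mb_check_encoding($clean, 'UTF-8')) {
        // Fallback: mbstring substitutes each invalid sequence
        // instead of failing.
        $clean = mb_convert_encoding($input, 'UTF-8', 'UTF-8');
    }
    return $clean;
}

$clean = cleanForJson("D\xFCsseldorf"); // ISO-8859-1 bytes, invalid as UTF-8
$json = json_encode($clean);
var_dump($json !== false);
```

The exact cleaned string may differ between iconv implementations, so the sketch only checks that the result is valid UTF-8 and encodable.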

Edit
In PHP 7.2+, there are some new flags for json_encode: JSON_INVALID_UTF8_IGNORE and JSON_INVALID_UTF8_SUBSTITUTE.
There's not much documentation yet, but for now, this test should help you understand the expected behavior: https://github.com/php/php-src/blob/master/ext/json/tests/json_encode_invalid_utf8.phpt


And, in php 7.3+ there's the new flag JSON_THROW_ON_ERROR. See http://php.net/manual/en/class.jsonexception.php
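
A short sketch of how these flags behave on the question's input, assuming PHP 7.3 or later so that all three flags are available. The $cities array is reused from the question; the variable names are only illustrative.

```php
<?php
// The first entry is "Düsseldorf" in ISO-8859-1 bytes, i.e. invalid UTF-8.
$cities = ["D\xFCsseldorf", "Washington", "Nairobi"];

// PHP 7.2+: replace each invalid sequence with U+FFFD (escaped as \ufffd).
$substituted = json_encode($cities, JSON_INVALID_UTF8_SUBSTITUTE);
echo $substituted, "\n"; // ["D\ufffdsseldorf","Washington","Nairobi"]

// PHP 7.2+: silently drop the invalid bytes.
$ignored = json_encode($cities, JSON_INVALID_UTF8_IGNORE);
echo $ignored, "\n"; // ["Dsseldorf","Washington","Nairobi"]

// PHP 7.3+: throw a JsonException instead of returning false.
$caught = null;
try {
    json_encode($cities, JSON_THROW_ON_ERROR);
} catch (JsonException $e) {
    $caught = $e;
    echo $e->getMessage(), "\n";
}
```

JSON_INVALID_UTF8_SUBSTITUTE comes closest to the "D?sseldorf" output the question asks for, since the position of the bad byte stays visible.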

Answered by moubi

$s = iconv('UTF-8', 'UTF-8//IGNORE', $s);

This solved the problem. I am not sure why the guys from php haven't made life easier by fixing json_encode().

Anyway, using the above allows json_encode() to create an object even if the data contains special characters (Swedish letters, for example).

You can then use the result in JavaScript without needing to decode the data back to its original encoding (with escape(), unescape(), encodeURIComponent(), decodeURIComponent()).

I am using it like this in php (smarty):

$template = iconv('UTF-8', 'UTF-8//IGNORE', $screen->fetch("my_template.tpl"));

Then I send the result to JavaScript and just innerHTML the ready template (HTML piece) into my document.

Simply put, the above line should somehow be implemented in json_encode() in order to allow it to work with any encoding.

Answered by Danack

This function will remove all invalid UTF8 chars from a string:

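function removeInvalidChars($text) {
    // Group 1 matches byte sequences shaped like UTF-8 characters;
    // the trailing "." matches any other single byte.
    $regex = '/( [\x00-\x7F] | [\xC0-\xDF][\x80-\xBF] | [\xE0-\xEF][\x80-\xBF]{2} | [\xF0-\xF7][\x80-\xBF]{3} ) | ./x';
    // Keep group 1 and drop everything else. An empty replacement here
    // would delete the whole string, since every byte matches one branch.
    return preg_replace($regex, '$1', $text);
}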

I use it after converting an Excel document to json, as Excel docs aren't guaranteed to be in UTF8.

I don't think there's a particularly sensible way of converting invalid chars to a visible but valid character. You could replace invalid chars with U+FFFD, which is the Unicode replacement character, by turning the regex above around, but that really doesn't provide a better user experience than just dropping invalid chars.
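
For completeness, the U+FFFD variant could look roughly like this. It is a sketch under assumptions, not the author's code: replaceInvalidChars() is a hypothetical name, it reuses the same pattern as removeInvalidChars(), and it assumes PHP 7+ for the type declarations. The pattern is lenient (it accepts some overlong and out-of-range sequences), so json_encode() may still reject certain inputs.

```php
<?php
/**
 * Hypothetical variant of removeInvalidChars(): same pattern, but any byte
 * that falls through to the final "." becomes U+FFFD instead of being dropped.
 */
function replaceInvalidChars(string $text): string
{
    $regex = '/( [\x00-\x7F] | [\xC0-\xDF][\x80-\xBF] | [\xE0-\xEF][\x80-\xBF]{2} | [\xF0-\xF7][\x80-\xBF]{3} ) | ./x';

    return preg_replace_callback($regex, function (array $m): string {
        // Group 1 is present only when a UTF-8-shaped sequence matched.
        if (isset($m[1]) && $m[1] !== '') {
            return $m[1];
        }
        return "\xEF\xBF\xBD"; // U+FFFD encoded as UTF-8
    }, $text);
}

$fixed = replaceInvalidChars("D\xFCsseldorf"); // ISO-8859-1 bytes
echo json_encode($fixed), "\n"; // "D\ufffdsseldorf"
```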

Answered by metamatt

You need to know the encoding of all strings you're dealing with, or you're entering a world of pain.

UTF-8 is an easy encoding to use. Also, JSON is defined to use UTF-8 (http://www.json.org/JSONRequest.html). So why not use it?

Short answer: the way to avoid json_encode() dropping your strings is to make sure they are valid UTF-8.
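
When the source encoding is known, converting explicitly is one way to follow this advice. A minimal sketch, assuming the mbstring extension is loaded and that the input really is ISO-8859-1 (as it is for the utf8_decode() output in the question); the variable names are illustrative.

```php
<?php
// Assumption: this string is known to be ISO-8859-1.
$latin1 = "D\xFCsseldorf";

if (!mb_check_encoding($latin1, 'UTF-8')) {
    // Convert from the known source encoding rather than guessing.
    $utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
} else {
    $utf8 = $latin1;
}

echo json_encode($utf8), "\n"; // "D\u00fcsseldorf"
```

The validity check cannot tell which encoding the bytes came from; it only says they are not UTF-8, so the source encoding still has to be known from elsewhere.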

Answered by CR7

Instead of using the iconv function, you can directly use json_encode with the JSON_UNESCAPED_UNICODE option (PHP >= 5.4.0).

Make sure you put "charset=utf-8" in the header of your php file:

header('Content-Type: application/json; charset=utf-8');
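
A small sketch of what JSON_UNESCAPED_UNICODE does in practice. As far as I can tell, the flag only changes how valid multibyte characters are written and does not by itself rescue invalid bytes, so the input still has to be valid UTF-8. The example assumes PHP 5.5 or later, and the variable names are illustrative.

```php
<?php
$valid = "D\xC3\xBCsseldorf"; // "Düsseldorf" as valid UTF-8
$broken = "D\xFCsseldorf";    // the same word as ISO-8859-1 bytes

// Default: non-ASCII characters are escaped as \uXXXX.
$escaped = json_encode($valid);

// With the flag: the UTF-8 bytes are written through unchanged.
$unescaped = json_encode($valid, JSON_UNESCAPED_UNICODE);

// Invalid UTF-8 still fails with this flag alone.
$stillFails = json_encode($broken, JSON_UNESCAPED_UNICODE);

var_dump($escaped, $unescaped, $stillFails);
```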

Answered by Grain

To get an informative error notification on JSON failures, we use this helper:

  • temporarily installs a custom error handler to catch JSON errors during encoding/decoding
  • throws RuntimeException on error

<?php

/**
 * usage:
 * $json = HelperJson::encode(['bla'=>'foo']);
 * $array = HelperJson::decode('{"bla":"foo"}');
 * 
 * throws exception on failure
 * 
 */
class HelperJson {

    /**
     * @var array
     */
    static private $jsonErrors = [
            JSON_ERROR_NONE => '',
            JSON_ERROR_UTF8 => 'Malformed UTF-8 characters, possibly incorrectly encoded',
            JSON_ERROR_DEPTH => 'Maximum stack depth exceeded',
            JSON_ERROR_STATE_MISMATCH => 'Underflow or the modes mismatch',
            JSON_ERROR_CTRL_CHAR => 'Unexpected control character found',
            JSON_ERROR_SYNTAX => 'Syntax error, malformed JSON',
    ];

    /**
     * ! assoc ! (reverse logic to php function)
     * @param string $jsonString
     * @param bool $assoc
     * @throws RuntimeException
     * @return array|null
     */
    static public function decode($jsonString, $assoc=true){

        HelperJson_ErrorHandler::reset(); // see below
        set_error_handler('HelperJson_ErrorHandler::handleError');

        $result = json_decode($jsonString, $assoc);

        $errStr = HelperJson_ErrorHandler::getErrstr();
        restore_error_handler();

        $jsonError = json_last_error();
        if( $jsonError!=JSON_ERROR_NONE ) {
            $errorMsg = isset(self::$jsonErrors[$jsonError]) ? self::$jsonErrors[$jsonError] : 'unknown error code: '.$jsonError;
            throw new \RuntimeException('json decoding error: '.$errorMsg.' JSON: '.substr($jsonString,0, 250));
        }
        if( $errStr!='' ){
            throw new \RuntimeException('json decoding problem: '.$errStr.' JSON: '.substr($jsonString,0, 250));
        }
        return $result;
    }

    /**
     * encode with error "throwing"
     * @param mixed $data
     * @param int $options   $options=JSON_PRESERVE_ZERO_FRACTION+JSON_UNESCAPED_SLASHES : 1024 + 64 = 1088
     * @return string
     * @throws \RuntimeException
     */
    static public function encode($data, $options=1088){

        // Seems necessary: otherwise UTF-8 problems only raised a warning that did not surface here.
        // Suspicion: the error handler itself does something with JSON and thereby resets json_last_error.
        HelperJson_ErrorHandler::reset();
        set_error_handler('HelperJson_ErrorHandler::handleError');

        $result = json_encode($data, $options);

        $errStr = HelperJson_ErrorHandler::getErrstr();
        restore_error_handler();

        $jsonError = json_last_error();
        if( $jsonError!=JSON_ERROR_NONE ){
            $errorMsg = isset(self::$jsonErrors[$jsonError]) ? self::$jsonErrors[$jsonError] : 'unknown error code: '.$jsonError;
            throw new \RuntimeException('json encoding error: '.$errorMsg);
        }
        if( $errStr!='' ){
            throw new \RuntimeException('json encoding problem: '.$errStr);
        }
        return $result;
    }

}

/**

HelperJson_ErrorHandler::install();
preg_match('~a','');
$errStr = HelperJson_ErrorHandler::getErrstr();
HelperJson_ErrorHandler::remove();

 *
 */
class HelperJson_ErrorHandler {

    static protected  $errno = 0;
    static protected  $errstr = '';
    static protected  $errfile = '';
    static protected  $errline = 0;
    static protected  $errcontext = array();

    /**
     * @param int $errno
     * @param string $errstr
     * @param string $errfile
     * @param int $errline
     * @param array $errcontext
     * @return bool
     */
    // $errcontext is optional because PHP 8 no longer passes it to error handlers.
    static public function handleError($errno, $errstr, $errfile, $errline, $errcontext = array()){
        self::$errno = $errno;
        self::$errstr = $errstr;
        self::$errfile = $errfile;
        self::$errline = $errline;
        self::$errcontext = $errcontext;
        return true;
    }

    /**
     * @return int
     */
    static public function getErrno(){
        return self::$errno;
    }
    /**
     * @return string
     */
    static public function getErrstr(){
        return self::$errstr;
    }
    /**
     * @return string
     */
    static public function getErrfile(){
        return self::$errfile;
    }
    /**
     * @return int
     */
    static public function getErrline(){
        return self::$errline;
    }
    /**
     * @return array
     */
    static public function getErrcontext(){
        return self::$errcontext;
    }
    /**
     * reset last error
     */
    static public function reset(){
        self::$errno = 0;
        self::$errstr = '';
        self::$errfile = '';
        self::$errline = 0;
        self::$errcontext = array();
    }

    /**
     * set black-hole error handler
     */
    static public function install(){
        self::reset();
        set_error_handler('HelperJson_ErrorHandler::handleError');
    }

    /**
     * restore previous error handler
     */
    static function remove(){
        restore_error_handler();
    }
}