Original source: http://stackoverflow.com/questions/21284228/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Removing control characters from a UTF-8 string in PHP
Asked by MirrorMirror
I am removing control characters (tab, CR, LF, \v and all other invisible characters) on the client side (after input), but since the client cannot be trusted, I have to remove them on the server too.
According to this link, http://www.utf8-chartable.de/, the control characters run from x00 to x1F and from x7F to x9F. Thus my client-side (JavaScript) control character removal function is:
return s.replace(/[\x00-\x1F\x7F-\x9F]/g, "");
and my PHP (server-side) control character removal function is:
$s = preg_replace('/[\x00-\x1F\x7F-\x9F]/', '', $s);
Now this seems to create problems with international UTF-8 characters such as ς (xCF x82) in PHP only (because x82 falls inside the second range); the JavaScript equivalent does not create any problems.
Now my question is: should I remove the control characters from 7F to 9F? To my understanding, the byte values from 127 to 159 (7F to 9F) can obviously be part of a valid UTF-8 string?
Also, maybe I shouldn't even filter the control characters from 0 to 31 (00 to 1F), because some of those bytes might also appear in some weird (Japanese? Chinese?) but valid UTF-8 characters?
Accepted answer by MirrorMirror
It seems that I just need to add the u flag to the regex, so it becomes:
$s = preg_replace('/[\x00-\x1F\x7F-\x9F]/u', '', $s);
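Why the u flag matters: without it, PHP's preg functions treat both the pattern and the subject as raw bytes, so the range \x7F-\x9F also matches continuation bytes inside multi-byte UTF-8 characters. JavaScript strings are compared by UTF-16 code unit, which is why the client-side version never touched ς. The sketch below, in JavaScript, only imitates the byte-wise behaviour to make the difference visible; the helper names stripControlChars and stripControlBytes are made up for illustration and are not part of the original post.

```javascript
// Same character class as in the question.
const CONTROL_CHARS = /[\x00-\x1F\x7F-\x9F]/g;

// Character-wise removal, as in the question's JavaScript snippet.
function stripControlChars(s) {
  return s.replace(CONTROL_CHARS, "");
}

// Byte-wise removal (hypothetical helper): drops every UTF-8 byte in
// 0x00-0x1F or 0x7F-0x9F, roughly what the PHP pattern does without /u.
function stripControlBytes(s) {
  const bytes = new TextEncoder().encode(s);
  const kept = bytes.filter((b) => !(b <= 0x1f || (b >= 0x7f && b <= 0x9f)));
  // Decoding replaces the now-broken sequences with U+FFFD.
  return new TextDecoder("utf-8").decode(kept);
}

// A tab, the Greek final sigma, and the C1 control U+0085 between "a" and "b".
const input = "a\tς\u0085b";

// UTF-8 bytes of "ς" in hex: the second byte lies inside 0x7F-0x9F.
const sigmaBytes = Array.from(new TextEncoder().encode("ς"), (b) => b.toString(16));
console.log(sigmaBytes.join(" ")); // cf 82

console.log(stripControlChars(input) === "aςb"); // true
console.log(stripControlBytes(input) === "aςb"); // false: ς was corrupted
```

So the 00-1F and 7F-9F ranges are safe to strip as long as they are matched against code points rather than bytes. In UTF-8, bytes below 0x80 never occur inside a multi-byte sequence, so the 00-1F range and 7F are harmless even byte-wise; only 80-9F collides with continuation bytes. One caveat with the u flag: if the subject is not valid UTF-8, preg_replace returns null, so it may be worth checking the return value.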