Original source: http://stackoverflow.com/questions/7061339/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
How to convert 'u00e9' into a utf8 char, in mysql or php?
Asked by carpii
I'm doing some data cleansing on some messy data which is being imported into MySQL.
The data contains 'pseudo' Unicode characters, which are actually embedded into the strings as 'u00e9' etc.
So one field might be 'Jalostotitlu00e1n'. I need to rip out that clumsy 'u00e1' and replace it with the corresponding UTF-8 character.
I can do this in MySQL, maybe using SUBSTRING and CHAR(), but I'm preprocessing the data via PHP, so I could do it there as well.
I already know all about how to configure MySQL and PHP to work with UTF-8 data. The problem is really just in the source data I'm importing.
Thanks
Accepted answer by rabudde
There's a way. Replace all uXXXX sequences with their HTML representation and run html_entity_decode() on the result.
I.e. echo html_entity_decode("Jalostotitl&#x00e1;n");
Every UTF character in the form u1234 can be printed in HTML as &#x1234;. But doing the replacement is quite hard, because there can be many false positives if no other character marks the beginning of an escape sequence. A simple regex could be
preg_replace('/u([\da-fA-F]{4})/', '&#x\1;', $str)
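The two steps described above can be combined into one helper. This is only a minimal sketch, not rabudde's exact code: the function name decode_u_escapes is hypothetical, and the flags and the 'UTF-8' charset passed to html_entity_decode() are my assumption about the target encoding. The last lines show the false-positive risk on an ordinary word.

```php
<?php
// Hypothetical helper: turn bare uXXXX sequences into UTF-8 characters.
function decode_u_escapes(string $str): string
{
    // Step 1: rewrite uXXXX as the numeric HTML entity &#xXXXX;
    $entities = preg_replace('/u([\da-fA-F]{4})/', '&#x\1;', $str);
    // Step 2: decode the entities into UTF-8 (charset is an assumption).
    return html_entity_decode($entities, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decode_u_escapes('Jalostotitlu00e1n'), "\n"; // Jalostotitlán

// False positive: "ubacc" inside "subaccount" is also "u" plus 4 hex digits.
var_dump(decode_u_escapes('subaccount') === 'subaccount'); // bool(false)
```

Note that html_entity_decode() also decodes any genuine entities already present in the data (such as &amp;), so this approach probably suits only fields that are not expected to contain HTML.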
Answer by Sergio-MA-Brazil
/* PHP function that converts \uXXXX escape sequences to the accented characters they stand for */
// Declared as a static method, so it is meant to live inside a class.
public static function Utf8_ansi($valor='') {
// Keys are the literal text \uXXXX (backslash included); single quotes keep PHP from interpreting them.
$utf8_ansi2 = array(
'\u00c0' =>"À",
'\u00c1' =>"Á",
'\u00c2' =>"Â",
'\u00c3' =>"Ã",
'\u00c4' =>"Ä",
'\u00c5' =>"Å",
'\u00c6' =>"Æ",
'\u00c7' =>"Ç",
'\u00c8' =>"È",
'\u00c9' =>"É",
'\u00ca' =>"Ê",
'\u00cb' =>"Ë",
'\u00cc' =>"Ì",
'\u00cd' =>"Í",
'\u00ce' =>"Î",
'\u00cf' =>"Ï",
'\u00d1' =>"Ñ",
'\u00d2' =>"Ò",
'\u00d3' =>"Ó",
'\u00d4' =>"Ô",
'\u00d5' =>"Õ",
'\u00d6' =>"Ö",
'\u00d8' =>"Ø",
'\u00d9' =>"Ù",
'\u00da' =>"Ú",
'\u00db' =>"Û",
'\u00dc' =>"Ü",
'\u00dd' =>"Ý",
'\u00df' =>"ß",
'\u00e0' =>"à",
'\u00e1' =>"á",
'\u00e2' =>"â",
'\u00e3' =>"ã",
'\u00e4' =>"ä",
'\u00e5' =>"å",
'\u00e6' =>"æ",
'\u00e7' =>"ç",
'\u00e8' =>"è",
'\u00e9' =>"é",
'\u00ea' =>"ê",
'\u00eb' =>"ë",
'\u00ec' =>"ì",
'\u00ed' =>"í",
'\u00ee' =>"î",
'\u00ef' =>"ï",
'\u00f0' =>"ð",
'\u00f1' =>"ñ",
'\u00f2' =>"ò",
'\u00f3' =>"ó",
'\u00f4' =>"ô",
'\u00f5' =>"õ",
'\u00f6' =>"ö",
'\u00f8' =>"ø",
'\u00f9' =>"ù",
'\u00fa' =>"ú",
'\u00fb' =>"û",
'\u00fc' =>"ü",
'\u00fd' =>"ý",
'\u00ff' =>"ÿ");
// strtr() replaces every key found in $valor with its mapped character.
return strtr($valor, $utf8_ansi2);
}
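A lookup table like the one above can also be generated rather than typed by hand. The following is a sketch under stated assumptions: build_escape_map is a hypothetical name, it relies on mb_chr() from the mbstring extension (PHP 7.2 or later), and the default range 0x00C0 to 0x00FF is simply chosen to match the table in this answer.

```php
<?php
// Hypothetical helper: map the literal text \u00c0 .. \u00ff to UTF-8 characters.
function build_escape_map(int $from = 0x00C0, int $to = 0x00FF): array
{
    $map = [];
    for ($cp = $from; $cp <= $to; $cp++) {
        // Key is the six-character text \uXXXX with lowercase hex digits.
        $map[sprintf('\u%04x', $cp)] = mb_chr($cp, 'UTF-8');
    }
    return $map;
}

$map = build_escape_map();
echo strtr('Jalostotitl\u00e1n', $map), "\n"; // Jalostotitlán
```

Because the keys use lowercase hex digits, an input written as \u00E1 would not match; normalizing the case first would be one way around that.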
Answer by Theo
My Twitter timeline script returns special characters like é as \u00e9, so I stripped the backslash and used @rabudde's preg_replace.
// Convert \uXXXX escape sequences to HTML numeric entities
$text = 'De #Haarstichting is h\u00e9t medium voor alles';
// Strip the backslash so the sequence reads uXXXX
$str = str_replace('\u','u',$text);
// \1 inserts the four captured hex digits into the entity
$str_replaced = preg_replace('/u([\da-fA-F]{4})/', '&#x\1;', $str);
echo $str_replaced;
It works for me and turns:
De #Haarstichting is h\u00e9t medium voor alles
into (once a browser renders the resulting entity):
De #Haarstichting is hét medium voor alles
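When the backslash is still present, as in this answer's input, the sequences are ordinary JSON string escapes, so another option is to leave the backslash in place and let json_decode() do the conversion. This is a sketch of that alternative, not what Theo's script does: decode_backslash_u is a hypothetical name, and it assumes the input should end up as UTF-8 text rather than HTML entities.

```php
<?php
// Hypothetical helper: decode \uXXXX escapes while the backslash is still present.
function decode_backslash_u(string $text): string
{
    $result = preg_replace_callback(
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        function (array $m): string {
            // A run of \uXXXX escapes is a valid JSON string body.
            $decoded = json_decode('"' . $m[0] . '"');
            // Keep the original text if JSON rejects it (e.g. a lone surrogate).
            return is_string($decoded) ? $decoded : $m[0];
        },
        $text
    );
    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $text;
}

echo decode_backslash_u('De #Haarstichting is h\u00e9t medium voor alles'), "\n";
// De #Haarstichting is hét medium voor alles
```

Matching the backslash explicitly avoids the false positives of the bare uXXXX pattern, since a word such as "subaccount" is left alone.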