How to convert 'u00e9' into a UTF-8 character in MySQL or PHP?

Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/7061339/

Time: 2020-08-26 01:55:08  Source: igfitidea


php mysql unicode utf

Asked by carpii

I'm doing some data cleansing on messy data that is being imported into MySQL.

The data contains 'pseudo' Unicode characters, which are actually embedded in the strings as 'u00e9' etc.

So one field might be 'Jalostotitlu00e1n'. I need to rip out that clumsy 'u00e1' and replace it with the corresponding UTF-8 character.

I could do this in MySQL, maybe using SUBSTRING and CHAR, but I'm preprocessing the data via PHP, so I could do it there as well.

I already know all about how to configure MySQL and PHP to work with UTF-8 data. The problem is really just in the source data I'm importing.

Thanks


Accepted answer by rabudde

There's a way. Replace all uXXXX sequences with their HTML representation and run html_entity_decode() on the result.

I.e. echo html_entity_decode("Jalostotitl&#x00e1;n");

Every UTF character in the form u1234 can be written in HTML as &#x1234;. But doing the replacement reliably is quite hard, because there can be many false positives if no other character marks the beginning of a UTF sequence. A simple regex could be

preg_replace('/u([\da-fA-F]{4})/', '&#x\1;', $str)
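The two steps above can be combined into one helper. This is only a minimal sketch: the function name decodeBareUnicode is made up for illustration, and the explicit ENT_QUOTES | ENT_HTML5 flags and 'UTF-8' charset are assumptions added so the result does not depend on the PHP version's defaults. The second call shows the false-positive problem described above.

```php
<?php
// Hypothetical helper (not from the original answer): decode bare "uXXXX"
// sequences, written without a backslash, into UTF-8 characters.
function decodeBareUnicode(string $str): string
{
    // Turn each "u" + 4 hex digits into a numeric HTML entity,
    // e.g. u00e1 becomes &#x00e1;
    $entities = preg_replace('/u([\da-fA-F]{4})/', '&#x\1;', $str);

    // Decode the entities; the charset is passed explicitly so the
    // output is UTF-8 regardless of the configured default.
    return html_entity_decode($entities, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decodeBareUnicode('Jalostotitlu00e1n'), "\n"; // Jalostotitlán

// False positive: "u2019" inside an ordinary token is treated as an escape too.
echo decodeBareUnicode('menu2019'), "\n"; // men’
```

Note that html_entity_decode() also decodes any genuine entities already present in the data (for example &amp;), which may or may not be wanted during cleansing.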


Answer by Sergio-MA-Brazil

/* PHP function to convert \u00XX escape sequences to the corresponding accented characters */

function Utf8_ansi($valor = '') {

    // Keys are literal backslash-u sequences (single quotes keep the backslash);
    // values are the UTF-8 characters they stand for.
    $utf8_ansi2 = array(
    '\u00c0' => "À",
    '\u00c1' => "Á",
    '\u00c2' => "Â",
    '\u00c3' => "Ã",
    '\u00c4' => "Ä",
    '\u00c5' => "Å",
    '\u00c6' => "Æ",
    '\u00c7' => "Ç",
    '\u00c8' => "È",
    '\u00c9' => "É",
    '\u00ca' => "Ê",
    '\u00cb' => "Ë",
    '\u00cc' => "Ì",
    '\u00cd' => "Í",
    '\u00ce' => "Î",
    '\u00cf' => "Ï",
    '\u00d1' => "Ñ",
    '\u00d2' => "Ò",
    '\u00d3' => "Ó",
    '\u00d4' => "Ô",
    '\u00d5' => "Õ",
    '\u00d6' => "Ö",
    '\u00d8' => "Ø",
    '\u00d9' => "Ù",
    '\u00da' => "Ú",
    '\u00db' => "Û",
    '\u00dc' => "Ü",
    '\u00dd' => "Ý",
    '\u00df' => "ß",
    '\u00e0' => "à",
    '\u00e1' => "á",
    '\u00e2' => "â",
    '\u00e3' => "ã",
    '\u00e4' => "ä",
    '\u00e5' => "å",
    '\u00e6' => "æ",
    '\u00e7' => "ç",
    '\u00e8' => "è",
    '\u00e9' => "é",
    '\u00ea' => "ê",
    '\u00eb' => "ë",
    '\u00ec' => "ì",
    '\u00ed' => "í",
    '\u00ee' => "î",
    '\u00ef' => "ï",
    '\u00f0' => "ð",
    '\u00f1' => "ñ",
    '\u00f2' => "ò",
    '\u00f3' => "ó",
    '\u00f4' => "ô",
    '\u00f5' => "õ",
    '\u00f6' => "ö",
    '\u00f8' => "ø",
    '\u00f9' => "ù",
    '\u00fa' => "ú",
    '\u00fb' => "û",
    '\u00fc' => "ü",
    '\u00fd' => "ý",
    '\u00ff' => "ÿ");

    // strtr() replaces every key found in $valor with its mapped character
    return strtr($valor, $utf8_ansi2);

}
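The lookup table above only covers code points in the U+00C0 to U+00FF range. As a possible alternative, the character can be computed from the hex digits instead of being listed by hand. The sketch below is not part of the original answer: the name decodeUnicodeEscapes is hypothetical, and it assumes PHP 7.2 or later with the mbstring extension (for mb_chr) and input that uses backslashed \uXXXX escapes.

```php
<?php
// Hypothetical helper: decode backslashed \uXXXX escapes for any
// code point that fits in four hex digits.
// Assumes PHP 7.2+ with the mbstring extension (for mb_chr).
function decodeUnicodeEscapes(string $str): string
{
    return preg_replace_callback(
        // '\\\\' in a single-quoted PHP string is two backslashes,
        // which the regex engine reads as one literal backslash.
        '/\\\\u([0-9a-fA-F]{4})/',
        function (array $m): string {
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            // Leave the escape untouched if the code point cannot be
            // encoded (for example a lone surrogate half).
            return $char === false ? $m[0] : $char;
        },
        $str
    );
}

echo decodeUnicodeEscapes('Jalostotitl\u00e1n'), "\n";   // Jalostotitlán
echo decodeUnicodeEscapes('\u0141\u00f3d\u017a'), "\n";  // Łódź
```

Because the pattern requires the backslash, ordinary words containing "u" followed by hex digits are left alone. Characters outside the Basic Multilingual Plane, which JSON-style data writes as two consecutive escapes, are not combined by this sketch.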

Answer by Theo

My Twitter timeline script returns special characters like é as \u00e9, so I stripped the backslash and used @rabudde's preg_replace.

// Convert \uXXXX character escapes to HTML entities
$text = 'De #Haarstichting is h\u00e9t medium voor alles';
// Strip the backslash so the sequence matches the uXXXX pattern
$str = str_replace('\u', 'u', $text);
// Rewrite each uXXXX as the numeric entity &#xXXXX;
$str_replaced = preg_replace('/u([\da-fA-F]{4})/', '&#x\1;', $str);

echo $str_replaced;

It works for me. It turns "De #Haarstichting is h\u00e9t medium voor alles" into "De #Haarstichting is hét medium voor alles" (once the browser renders the entity).
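If the escapes come from a JSON payload, which seems likely for a Twitter timeline but is an assumption here, another option is to let json_decode() handle them instead of stripping the backslash and matching with a regex. The payload and its "text" field below are invented for illustration.

```php
<?php
// Hypothetical JSON payload; the "text" field name is an assumption.
// Single quotes keep \u00e9 as a literal six-character escape sequence.
$json = '{"text":"De #Haarstichting is h\u00e9t medium voor alles"}';

// json_decode() converts \uXXXX escapes to UTF-8 while parsing.
$data = json_decode($json, true);

echo $data['text'], "\n"; // De #Haarstichting is hét medium voor alles
```

This yields real UTF-8 text rather than HTML entities, so it is also usable outside a browser, but it only applies when the whole input is valid JSON.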
