PHP: Convert latin1 characters on a UTF8 table into UTF8

Disclaimer: this page is a Chinese/English translation of a popular StackOverflow question and is licensed under CC BY-SA 4.0. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/9407834/

Date: 2020-08-26 06:44:48  Source: igfitidea

Tags: php, mysql, utf-8, character-encoding, iso-8859-1

Question by Nuno

Only today I realized that I was missing this in my PHP scripts:

mysql_set_charset('utf8');

All my tables are InnoDB with collation "utf8_unicode_ci", and all my VARCHAR columns are "utf8_unicode_ci" as well. I call mb_internal_encoding('UTF-8'); in my PHP scripts, and all my PHP files are encoded as UTF-8.

So, until now, every time I ran an "INSERT" with diacritics, for example:

mysql_query('INSERT INTO `table` SET `name`="Jáuò Iñe"');

In this case, the stored 'name' value would be: JÃ¡uÃ² IÃ±e.

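Here is a minimal PHP sketch that reproduces this kind of damage without a database. It assumes mbstring is enabled and the file is saved as UTF-8. Each UTF-8 byte is read as a Latin-1 character and re-encoded as UTF-8, which produces the kind of value shown above. Reversing the conversion restores the original text.

```php
<?php
// Sketch: reproduce the double encoding without a database.
// Assumptions: this file is saved as UTF-8 and mbstring is enabled.

$original = "Jáuò Iñe"; // correct UTF-8 text

// Read every UTF-8 byte as if it were a Latin-1 character and re-encode the
// result as UTF-8 (roughly what a latin1 connection to a utf8 column does).
$mangled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');
echo $mangled, "\n"; // JÃ¡uÃ² IÃ±e

// Undo it: map every character back to a single byte; those bytes are the
// original UTF-8 sequence.
$restored = mb_convert_encoding($mangled, 'ISO-8859-1', 'UTF-8');
echo $restored, "\n"; // Jáuò Iñe
```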

Since I fixed the charset between PHP and MySQL, new INSERTs are stored correctly. However, I want to fix all the older rows that are currently "messed up". I have already tried many things, but the string is always cut off at the first "illegal" character. Here is my current code:

$m = mysql_real_escape_string('?<?php echo "?<b>\'PHP &aacute; (á)??ri?? </b>"; ?> ?-?i abcdd;//;??′??????????aξβψδπλξξ?α??? ;');
mysql_set_charset('utf8');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('latin1');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('utf8');

$result = mysql_query('SELECT * FROM `table`');
while ($row = mysql_fetch_assoc($result)) {
    $message = $row['name'];
    // Turn each character back into a single byte (attempt to undo the double encoding)
    $message = mb_convert_encoding($message, 'ISO-8859-15', 'UTF-8');
    //$message = iconv("UTF-8", "ISO-8859-1//IGNORE", $message);
    mysql_query('UPDATE `table` SET `name`="'.mysql_real_escape_string($message).'" WHERE `a1`="'.mysql_real_escape_string($row['a1']).'"');
}

The UPDATE writes the expected characters, except that the string gets truncated at the character "?": that character and everything after it are missing.

Testing with iconv() (commented out in the code) gives the same result, even with //IGNORE and //TRANSLIT.

I also tried several charsets, including ISO-8859-1 and ISO-8859-15.

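A side note on the target charset. ISO-8859-1 maps code points U+0000 to U+00FF one-to-one onto single bytes. ISO-8859-15 reassigns eight of those positions (for example, its byte 0xA4 is € instead of ¤). The small sketch below uses "ä" purely as an example and assumes mbstring is enabled. It shows that only ISO-8859-1 reverses this particular double encoding.

```php
<?php
// Sketch: compare ISO-8859-1 and ISO-8859-15 as the "undo" target.
// "ä" (UTF-8 bytes C3 A4) is only an example; assumes mbstring is enabled.

$mangled = "Ã¤"; // "ä" after one round of Latin-1 -> UTF-8 double encoding

// ISO-8859-1: "Ã" -> byte C3, "¤" -> byte A4, i.e. the original UTF-8 bytes.
$viaLatin1 = mb_convert_encoding($mangled, 'ISO-8859-1', 'UTF-8');

// ISO-8859-15 has no "¤" (its byte A4 is "€"), so that character is
// replaced by a substitute and the original bytes are not recovered.
$viaLatin9 = mb_convert_encoding($mangled, 'ISO-8859-15', 'UTF-8');

var_dump($viaLatin1 === "ä"); // bool(true)
var_dump($viaLatin9 === "ä"); // bool(false)
```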

Answer by ABS

From what you describe, it seems you have UTF-8 data that was originally stored as Latin-1 and then not converted correctly to UTF-8. The data is recoverable; you'll need a MySQL function like

convert(cast(convert(name using  latin1) as binary) using utf8)

You may need to omit the inner conversion, depending on how the data was altered during the encoding conversion.

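The same idea can be applied row by row in PHP. This is a minimal sketch, assuming the damage is a single round of Latin-1 to UTF-8 double encoding; repair_double_encoded is a hypothetical helper name.

The function leaves a value unchanged in two cases:

* The value contains characters outside Latin-1.
* The recovered bytes are not valid UTF-8.

Because of these checks, rows that are already correct are skipped. There are two caveats:

* MySQL's latin1 is actually cp1252. Text mangled through bytes 0x80 to 0x9F (for example € stored as â‚¬) contains characters outside ISO-8859-1, so this sketch leaves it untouched.
* A genuine string such as "Ã©" would also be "repaired", so check the results before running this over a whole table.

```php
<?php
// Minimal sketch: PHP counterpart of
// CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8).
// Assumptions: mbstring is enabled; one round of Latin-1 -> UTF-8 double encoding.

function repair_double_encoded(string $s): string
{
    // Map every character back to a single byte (like CONVERT ... USING latin1).
    $bytes = mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');

    // Characters outside Latin-1 were replaced by a substitute, so the round
    // trip no longer matches: the value was not damaged this way, keep it.
    if (mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1') !== $s) {
        return $s;
    }

    // Read the bytes as UTF-8 (like CAST ... AS BINARY then USING utf8).
    // If they are not valid UTF-8, the value was already correct.
    return mb_check_encoding($bytes, 'UTF-8') ? $bytes : $s;
}

echo repair_double_encoded("JÃ¡uÃ² IÃ±e"), "\n"; // Jáuò Iñe
echo repair_double_encoded("Jáuò Iñe"), "\n";    // Jáuò Iñe (unchanged)
```

In the question's loop, it would take the place of the mb_convert_encoding() call: $message = repair_double_encoded($row['name']);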

Answer by Marcel Grolms

I found this answer after searching for an hour or two. I needed to migrate an old tt_news database from an old TYPO3 installation to a new TYPO3 version. I had already tried converting the charset in the export file and importing it back, but couldn't get it to work.

Then I tried ABS's answer above and ran an UPDATE on the table:

UPDATE tt_news SET 
    title=convert(cast(convert(title using  latin1) as binary) using utf8), 
    short=convert(cast(convert(short using  latin1) as binary) using utf8), 
    bodytext=convert(cast(convert(bodytext using  latin1) as binary) using utf8)
WHERE 1

You can also convert imagecaption, imagealttext, imagetitletext and keywords if needed. I hope this helps anyone migrating tt_news to a new TYPO3 version.

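To include more columns, the same UPDATE can be generated for any column list. Below is a small PHP sketch. build_repair_update is a hypothetical helper. Table and column names are assumed to come from your own code rather than from user input, so they are only backtick-quoted, not validated.

```php
<?php
// Sketch: generate the multi-column UPDATE from the answer above.
// build_repair_update is a hypothetical helper; names are trusted, not validated.
// "WHERE 1" is omitted because it matches every row anyway.

function build_repair_update(string $table, array $columns, string $charset = 'utf8'): string
{
    $sets = [];
    foreach ($columns as $col) {
        // utf8 text -> latin1 bytes -> binary -> reinterpreted as $charset
        $sets[] = sprintf(
            '`%1$s` = CONVERT(CAST(CONVERT(`%1$s` USING latin1) AS BINARY) USING %2$s)',
            $col,
            $charset
        );
    }
    return sprintf('UPDATE `%s` SET %s', $table, implode(', ', $sets));
}

echo build_repair_update('tt_news', [
    'title', 'short', 'bodytext',
    'imagecaption', 'imagealttext', 'imagetitletext', 'keywords',
]), "\n";
```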

Answer by hussien

The better way is to connect to your database normally.

Then make sure your page encoding is set to UTF-8 with a meta tag in the HTML head (don't forget this), and use this code:

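    $result = mysql_query('SELECT * FROM shops');
    while ($row = mysql_fetch_assoc($result)) {
        // Convert the old Windows-1256 (Arabic) bytes to UTF-8
        $name = iconv("windows-1256", "UTF-8", $row['name']);

        mysql_query("SET NAMES 'utf8'");
        mysql_query("UPDATE `shops` SET `name`='" . mysql_real_escape_string($name) . "' WHERE ID='" . mysql_real_escape_string($row['ID']) . "'");
    }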

Answer by burkul

I highly recommend using 'utf8mb4' instead of 'utf8', since MySQL's utf8 cannot store some Chinese characters or emojis.

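MySQL's utf8 (utf8mb3) stores at most three bytes per character. Anything above U+FFFF therefore needs utf8mb4; this includes emoji and the rarer CJK ideographs in the supplementary planes. The minimal sketch below spots such values in PHP before they are written to a utf8 column. needs_utf8mb4 is a hypothetical helper name, and the input is assumed to be valid UTF-8.

```php
<?php
// Sketch: detect values that MySQL's 3-byte "utf8" (utf8mb3) cannot store.
// Assumption: $s is valid UTF-8; needs_utf8mb4 is a hypothetical helper name.

function needs_utf8mb4(string $s): bool
{
    // Code points above U+FFFF take 4 bytes in UTF-8; utf8mb3 allows at most 3.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $s) === 1;
}

var_dump(needs_utf8mb4("Jáuò Iñe")); // bool(false)
var_dump(needs_utf8mb4("中文"));      // bool(false)
var_dump(needs_utf8mb4("😀"));        // bool(true)
```

If you switch, the connection charset has to match as well. For example, use mysqli_set_charset($link, 'utf8mb4'), or charset=utf8mb4 in a PDO MySQL DSN.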