Strange characters in database text: ?, ?, ¢, a?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/7861358/

Time: 2020-09-08 08:22:17  Source: igfitidea

Tags: database, character-encoding, prestashop

Asked by Steve

I'm not certain when this first occurred.

I have a new drop-shipping affiliate website, and receive an exported copy of the product catalog from the wholesaler. I format and import this into Prestashop 1.4.4.

The front end of the website contains combinations of strange characters inside product text: ?, ?, ¢, a? etc. They appear in place of common characters like , - : etc.

These characters are present in about 40% of the database tables, not just product specific tables like ps_product_lang.

Another website thread says this same problem occurs when the database connection string uses an incorrect character encoding type.

In /config/setting.inc, there is no character encoding string mentioned, just the MySQL Engine, which is set to InnoDB, which matches what I see in PHPMyAdmin.

I exported ps_product_lang, replaced all instances of these characters with correct characters, saved the CSV file in UTF-8 format, and reimported them using PHPMyAdmin, specifying UTF-8 as the language.

However, after doing a new search in PHPMyAdmin, I now have about 10 times as many instances of these bad characters in ps_product_lang as I started with.

If the problem is as simple as specifying the correct language attribute in the database connection string, where/how do I set this, and what to?

Incidentally, I tried running this command, mentioned in this thread, in PHPMyAdmin, but the problem remains:

SET NAMES utf8

UPDATE: PHPMyAdmin says:

MySQL charset: UTF-8 Unicode (utf8)

This is the same character set I used in the last import file, which caused more character corruptions. UTF-8 was specified as the charset of the import file during the import process.

UPDATE2

Here is a sample:

people are truly living untethered???'???¢???¢?¢a???????????ˉ?¢a??? ???? buying and renting movies online, downloading software, and sharing and storing files on the web.

UPDATE3

I ran an SQL command in PHPMyAdmin to display the character sets:

  • character_set_client utf8
  • character_set_connection utf8
  • character_set_database latin1
  • character_set_filesystem binary
  • character_set_results utf8
  • character_set_server latin1
  • character_set_system utf8

So, perhaps my database needs to be converted (or deleted and recreated) to UTF-8. Could this pose a problem if the MySQL server is latin1?

Can MySQL handle the translation of serving content as UTF8 but storing it as latin1? I don't think it can, as UTF8 is a superset of latin1. My web hosting support has not replied in 48 hours. Might be too hard for them.
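
As a rough illustration of that last question, the sketch below checks which characters a single-byte Western charset can hold at all. It assumes Python's cp1252 codec as a stand-in for MySQL's latin1 (which MySQL describes as "cp1252 West European"); the sample characters and the helper name fits_in_cp1252 are made up for illustration and are not taken from the product data.

```python
def fits_in_cp1252(ch: str) -> bool:
    """Return True if the character can be stored in a cp1252 byte."""
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample characters (assumption: not from the real catalog).
samples = ["é", "’", "€", "ł", "日"]

for ch in samples:
    utf8_len = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} fits={fits_in_cp1252(ch)} utf8_bytes={utf8_len}")
```

Text limited to characters that the storage charset contains can be converted back and forth; anything outside it cannot be kept in such a column without loss.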

Answered by AlexV

If the charset of the tables is the same as their content, try using mysql_set_charset('UTF8', $link_identifier). Note that MySQL uses UTF8 to specify the UTF-8 encoding, instead of the more common UTF-8.

Check my other answer on a similar question too.

Answered by Aurelio De Rosa

This is surely an encoding problem. Your database and your website use different encodings, and that is the cause of the problem. Also, if you ran that command, you have to change the records that are already in your tables to convert those characters to UTF-8.

Update: Based on your last comment, the core of the problem is that you have a database and a data source (the CSV file) which use different encodings. Hence you can convert your database to UTF-8 or, at least, when you read the data from the CSV, you have to convert it from UTF-8 to latin1.
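
A minimal sketch of the second option (converting the CSV data from UTF-8 before it reaches a latin1 database), assuming the file really is valid UTF-8. The function name utf8_csv_to_latin1 and the sample row are hypothetical, and Python's latin-1 codec (ISO-8859-1) is used as an approximation of the database charset.

```python
def utf8_csv_to_latin1(utf8_bytes: bytes) -> bytes:
    """Re-encode CSV content from UTF-8 to latin1 (ISO-8859-1)."""
    # Raises UnicodeDecodeError if the input is not valid UTF-8.
    text = utf8_bytes.decode("utf-8")
    # Characters with no latin1 equivalent become "?" instead of raising.
    return text.encode("latin-1", errors="replace")


# Hypothetical CSV row, encoded the way a UTF-8 export would store it.
row = "42,Café crème,3.50\n".encode("utf-8")
converted = utf8_csv_to_latin1(row)
print(converted)  # b'42,Caf\xe9 cr\xe8me,3.50\n'
```

Using errors="replace" trades silent data loss for a conversion that never aborts; switching to the default strict mode would instead surface every character the target charset cannot hold.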

You can do the conversion by following these articles:

Answered by Haisum Usman

Apply these two things.

  1. You need to set the character set of your database to be utf8.

  2. You need to call mysql_set_charset('utf8') in the file where you make the connection to the database, right after selecting the database with mysql_select_db. That will allow you to add and retrieve data properly in any language.

Answered by Kristoffer Bohmann

This appears to be a UTF-8 encoding issue that may have been caused by a double-UTF8-encoding of the database file contents.

This situation could happen due to factors such as the character set that was or was not selected (for instance when a database backup file was created) and the file format and encoding the database file was saved with.

I have seen these strange UTF-8 characters in the following scenario (the description may not be entirely accurate as I no longer have access to the database in question):

  • As I recall, the database and tables had a "utf8_general_ci" collation.
  • A backup is made of the database.
  • The backup file is opened on Windows in UNIX file format and with ANSI encoding.
  • The database is restored on a new MySQL server by copy-pasting the contents of the database backup file into phpMyAdmin.
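
The scenario above can be reproduced in a few lines. This is only a sketch: it assumes that the "ANSI" encoding on the Windows machine was cp1252 (the usual Western default, but not confirmed here), and the sample character is made up.

```python
original = "é"

# Step 1: the backup stores the character as UTF-8 (two bytes).
utf8_once = original.encode("utf-8")

# Step 2: an editor opens those bytes as "ANSI" (assumed here: cp1252),
# so each byte is shown as a separate character.
misread = utf8_once.decode("cp1252")

# Step 3: the misread text is pasted and stored as UTF-8 again.
utf8_twice = misread.encode("utf-8")

print(utf8_once)   # b'\xc3\xa9'
print(misread)     # Ã©
print(utf8_twice)  # b'\xc3\x83\xc2\xa9'
```

One character has become two characters and four bytes, which is the kind of growth a double-UTF8-encoding produces.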

Looking into the file contents:

  • Opening the SQL backup file in a text editor shows that it contains strange characters such as "s???¥". As a side note, you may get different results when opening the same file in another editor. I use TextPad here, but SublimeText showed "s?¥" for the same file because SublimeText correctly read the file as UTF-8. Still, this is a bit confusing when you start trying to fix the issue in PHP, because you don't see the right data in SublimeText at first. Anyway, that can be resolved by taking note of which encoding your text editor is using when presenting the file contents.
  • The strange characters are double-encoded UTF-8 characters, so in my case the first "??" part equals "?" and "?¥" = "¥" (this is my first "encoding"). The "?¥" characters equal the UTF-8 character for "?" (this is my second encoding).

So, the issue is that "false" (UTF8-encoded twice) utf-8 needs to be converted back into "correct" utf-8 (only UTF8-encoded once).
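
In principle, that conversion is the double-encoding run in reverse. The sketch below is one way to express it, not the method used later in this answer; it assumes the wrong intermediate codec was cp1252, and the helper name undo_double_utf8 is hypothetical.

```python
def undo_double_utf8(text: str, wrong_codec: str = "cp1252") -> str:
    """Reverse one round of 'UTF-8 bytes misread as wrong_codec'.

    Text that does not look double-encoded is returned unchanged.
    """
    try:
        # Turn the misread characters back into the original bytes,
        # then read those bytes as the UTF-8 they always were.
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(undo_double_utf8("cafÃ©"))  # café
print(undo_double_utf8("café"))   # café (already correct, left alone)
```

A limitation worth noting: Python's cp1252 codec leaves a few byte values undefined, so characters whose UTF-8 form contains one of those bytes will not round-trip this way and come back unchanged.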

Trying to fix this in PHP turns out to be a bit challenging:

utf8_decode() is not able to process the characters.

// Fails silently (as in - nothing is output)
$str = "s???¥";

$str = utf8_decode($str);
printf("\n%s", $str);

$str = utf8_decode($str);
printf("\n%s", $str);

iconv() fails with "Notice: iconv(): Detected an illegal character in input string".

echo iconv("UTF-8", "ISO-8859-1", "s???¥");

Another fine and possible solution fails silently too in this scenario:

$str = "s???¥";
echo html_entity_decode(htmlentities($str, ENT_QUOTES, 'UTF-8'), ENT_QUOTES , 'ISO-8859-15');

mb_convert_encoding() fails silently too:

$str = "s???¥";
echo mb_convert_encoding($str, 'ISO-8859-15', 'UTF-8');
// (No output)

Trying to fix the encoding in MySQL by converting the MySQL database character set and collation to UTF-8 was unsuccessful:

ALTER DATABASE myDatabase CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE myTable CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

I see a couple of ways to resolve this issue.

The first is to make a backup with correct encoding (the encoding needs to match the actual database and table encoding). You can verify the encoding by simply opening the resulting SQL file in a text editor.
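
The same check can be roughed out programmatically instead of by eye. The heuristic below is only a sketch under the assumption that the wrong intermediate codec was cp1252; the function name looks_double_encoded is hypothetical, and a True result is a hint, not proof.

```python
def looks_double_encoded(data: bytes) -> bool:
    """Heuristic: does this UTF-8 data decode cleanly a second time?"""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not valid UTF-8 at all, which is a different problem
    if text.isascii():
        return False  # plain ASCII looks the same in every encoding involved
    try:
        # If the characters map back to bytes that are again valid UTF-8,
        # the data was probably encoded twice.
        text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return True


print(looks_double_encoded("é".encode("utf-8")))    # False
print(looks_double_encoded(b"\xc3\x83\xc2\xa9"))    # True
```

Running this over a few lines of a dump that contain accented characters is usually enough to tell the two cases apart.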

The other is to replace double-UTF8-encoded characters with single-UTF8-encoded characters. This can be done manually in a text editor. To assist in this process, you can manually pick incorrect characters from the UTF-8 Encoding Debugging Chart (it may be a matter of replacing 5-10 errors).

Finally, a script can assist in the process:

    $str = "s???¥";
    // The two arrays can also be generated by double-encoding values in the first array and single-encoding values in the second array.
    $str = str_replace(["??","?¥"], ["?","¥"], $str); 
    $str = utf8_decode($str);
    echo $str;
    // Output: "s?" (correct)
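
The comment in the script notes that the two arrays can be generated rather than typed by hand. A table-driven variant of that idea, sketched in Python rather than PHP, might look like the following. It is not the script above: the helper names are hypothetical, the list of expected characters is an assumption you would adapt to your own data, and cp1252 is assumed as the wrong codec.

```python
def build_mojibake_table(chars: str, wrong_codec: str = "cp1252") -> dict:
    """Map each character's garbled form back to the character itself."""
    table = {}
    for ch in chars:
        try:
            garbled = ch.encode("utf-8").decode(wrong_codec)
        except UnicodeDecodeError:
            continue  # a byte is undefined in wrong_codec, so skip it
        table[garbled] = ch
    return table


def apply_table(text: str, table: dict) -> str:
    """Replace garbled sequences, longest first to avoid partial matches."""
    for garbled in sorted(table, key=len, reverse=True):
        text = text.replace(garbled, table[garbled])
    return text


# Hypothetical set of characters expected in the data.
table = build_mojibake_table("éå")
print(apply_table("cafÃ© smÃ¥", table))  # café små
```

Compared with a blanket re-decoding, a table only touches sequences you have explicitly listed, which is safer when a file mixes correct and double-encoded text.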

Answered by Pielo

I encountered quite a similar problem today: mysqldump dumped my utf-8 database with utf-8 diacritic characters encoded as two latin1 characters, although the file itself is regular utf8.

For example: "é" was encoded as two characters "??". These two characters correspond to the two-byte utf8 encoding of the letter, but they should be interpreted as a single character.

To solve the problem and correctly import the database on another server, I had to convert the file using the ftfy (stands for "Fixes Text For You") Python library (https://github.com/LuminosoInsight/python-ftfy). The library does exactly what I expect: it transforms badly encoded utf-8 into correctly encoded utf-8.

For example: this latin1 combination "??" is turned into an "é".

ftfy comes with a command line script but it transforms the file so it can not be imported back into mysql.

I wrote a python3 script to do the trick:

#!/usr/bin/python3
# coding: utf-8

import ftfy

# Open the badly encoded dump for reading and the repaired dump for writing.
# Both are opened as UTF-8, and the context managers close them when done.
with open('mysql.utf8.bad.dump', 'r', encoding='utf-8') as input_file, \
        open('mysql.utf8.good.dump', 'w', encoding='utf-8') as output_file:

    # Create fixed output stream. Most of ftfy's other clean-ups are switched
    # off so that the SQL itself is left alone. The keyword names are the ones
    # used in the original script; other ftfy releases may name them differently.
    stream = ftfy.fix_file(
        input_file,
        encoding=None,
        fix_entities='auto',
        remove_terminal_escapes=False,
        fix_encoding=True,
        fix_latin_ligatures=False,
        fix_character_width=False,
        uncurl_quotes=False,
        fix_line_breaks=False,
        fix_surrogates=False,
        remove_control_chars=False,
        remove_bom=False,
        normalization='NFC'
    )

    # Save stream to output file, one fixed line at a time
    for line in stream:
        output_file.write(line)

Answered by Achin Kumar

The error usually gets introduced when the CSV is created. Try using Linux to save the CSV as a Text CSV. LibreOffice on Ubuntu can enforce UTF-8 encoding, which worked for me. I wasted a lot of time trying this on Mac OS. Linux is the key. I've tested this on Ubuntu.

Good Luck
