MongoDB PHP UTF-8 problems
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/5920626/
Asked by elslooo
Assume that I need to insert the following document:
{
title: 'Péter'
}
(note the é)
It gives me an error when I use the following PHP-code ... :
$db->collection->insert(array("title" => "Péter"));
... because it needs to be utf-8.
So I should use this line of code:
$db->collection->insert(array("title" => utf8_encode("Péter")));
Now, when I request the document, I still have to decode it ... :
$document = $db->collection->findOne(array("_id" => new MongoId("__someID__")));
$title = utf8_decode($document['title']);
Is there some way to automate this process? Can I change the character encoding of MongoDB? (I'm migrating a MySQL database that uses cp1252 West Europe (latin1).)
I already considered changing the Content-Type-header, problem is that all static strings (hardcoded) aren't utf8...
Thanks in advance! Tim
Answer by Alix Axel
JSON and BSON can only encode / decode valid UTF-8 strings. If your data (including input) is not UTF-8, you need to convert it before passing it to any JSON-dependent system, like this:
$string = iconv('UTF-8', 'UTF-8//IGNORE', $string); // or
$string = iconv('UTF-8', 'UTF-8//TRANSLIT', $string); // or even
$string = iconv('UTF-8', 'UTF-8//TRANSLIT//IGNORE', $string); // not sure how this behaves
Personally I prefer the first option; see the iconv() manual page. Other alternatives include:
mb_convert_encoding()
utf8_encode(utf8_decode($string))
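To make the conversion step concrete, here is a minimal sketch using mb_check_encoding() and mb_convert_encoding(). The helper name to_utf8 is made up for illustration, and it assumes that any input which is not already valid UTF-8 is Windows-1252, which matches the asker's MySQL data but may not match yours. Note that the iconv() calls above declare the input as UTF-8 already, so they can drop invalid bytes but will not turn cp1252 text into UTF-8.

```php
<?php
// Hypothetical helper (not part of any library): return $s as valid UTF-8.
// Assumption: input that is not valid UTF-8 is encoded as $assumedSource.
function to_utf8($s, $assumedSource = 'Windows-1252')
{
    // Already valid UTF-8: leave it alone to avoid double encoding.
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    return mb_convert_encoding($s, 'UTF-8', $assumedSource);
}

// Byte escapes are used so the example does not depend on how
// this source file itself is saved.
$latin = "P\xE9ter";      // "Péter" as cp1252/latin1 bytes
$utf8  = "P\xC3\xA9ter";  // "Péter" as UTF-8 bytes

var_dump(to_utf8($latin) === $utf8); // bool(true)
var_dump(to_utf8($utf8) === $utf8);  // bool(true)
```

The validity check is a heuristic: some cp1252 byte sequences happen to be valid UTF-8 as well, so if you know the source encoding for certain, converting unconditionally is safer.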
You should always make sure your strings are UTF-8 encoded, even the user-submitted ones. However, since you mentioned that you're migrating from MySQL to MongoDB, have you tried exporting your current database to CSV and using the import scripts that come with Mongo? They should handle this...
EDIT: I mentioned that BSON can only handle UTF-8, but I'm not sure if this is exactly true. I have a vague idea that BSON uses UTF-16 or UTF-32 to encode / decode data, but I can't check now.
Answer by Adam Monsen
As @gates said, all string data in BSON is encoded as UTF-8. MongoDB assumes this.
Another key point which neither answer addresses: PHP is not Unicode aware. As of 5.3, anyway. PHP 6 will supposedly be Unicode-aware. What this means is you have to know what encoding is used by your operating system by default and what encoding PHP is using.
Let's get back to your original question: "Is there some way to automate this process?" ... my suggestion is to make sure you are always using UTF-8 throughout your application. Configuration, input, data storage, presentation, everything. Then the "automated" part is that most of your PHP code will be simpler since it always assumes UTF-8. No conversions necessary. Heck, nobody said automation was cheap. :)
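As a rough sketch of what "UTF-8 throughout" can look like on the PHP side, the snippet below sets the defaults that PHP's multibyte functions rely on. It assumes the mbstring extension is loaded. The settings are only one piece of the picture: the source files, the database connection and the HTTP headers need to agree as well.

```php
<?php
// Make UTF-8 the default for PHP's output layer and for mbstring.
ini_set('default_charset', 'UTF-8');
mb_internal_encoding('UTF-8');

$title = "P\xC3\xA9ter"; // "Péter" as UTF-8 bytes

// strlen() counts bytes; mb_strlen() counts characters using the
// internal encoding set above.
echo strlen($title), "\n";    // 6
echo mb_strlen($title), "\n"; // 5
```

If the data is read from MySQL, the connection character set is another place where the conversion can happen, for example via mysqli's set_charset() method, so that strings arrive in PHP as UTF-8 already.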
Here's kind of an aside. If you created a little PHP script to test that insert() code, figure out what encoding your file is, then convert to UTF-8 before inserting. For example, if you know the file is ISO-8859-1, try this:
$title = mb_convert_encoding("Péter", "UTF-8", "ISO-8859-1");
$db->collection->insert(array("title" => $title));
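For a migration, you would probably want to convert whole rows rather than single strings. The following is a sketch of that idea; utf8_document is a hypothetical helper name, the sample row is invented, and the source encoding is an assumption you would need to confirm for your own data. Only values are converted, not array keys.

```php
<?php
// Hypothetical helper: convert every string value in a (possibly nested)
// array from an assumed source encoding to UTF-8.
function utf8_document(array $doc, $from = 'ISO-8859-1')
{
    array_walk_recursive($doc, function (&$value) use ($from) {
        // Leave integers, floats, booleans and null untouched.
        if (is_string($value)) {
            $value = mb_convert_encoding($value, 'UTF-8', $from);
        }
    });
    return $doc;
}

// Invented sample row, written with byte escapes for latin1 "é".
$row = array('title' => "P\xE9ter", 'tags' => array("caf\xE9"), 'views' => 3);
$doc = utf8_document($row);

// $doc can now be passed to insert() as in the snippet above
// (that step needs a live MongoDB connection, so it is not run here).
var_dump($doc['title'] === "P\xC3\xA9ter"); // bool(true)
```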
Answer by Gates VP
Can I change the character-encoding of MongoDB...
No, data is stored in BSON. According to the BSON spec, all strings are UTF-8.
Now, when I request the document, I still have to decode it ... : Is there some way to automate this process?
It sounds like you are trying to output the data to a web page. Needing to "decode" text that was already encoded seems incorrect.
Could this output problem be a configuration issue with Apache+PHP? UTF-8 with PHP is not automatic; a quick online search brought up several tutorials on this topic.
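One way the output side might look, as a sketch: declare UTF-8 in the Content-Type header and pass the string through unchanged, with no utf8_decode() call. The $title value here stands in for a field read back from MongoDB and is invented for the example.

```php
<?php
// Assumption: $title came back from MongoDB, so it is already UTF-8.
$title = "P\xC3\xA9ter <admin>";

// Tell the browser the page is UTF-8 (skipped if output already started).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Escape for HTML, telling htmlspecialchars() the input is UTF-8.
$safe = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
echo '<p>', $safe, '</p>', "\n";
```

The escaping only touches the HTML special characters; the UTF-8 bytes for é pass through as they are. Hardcoded strings in the PHP source would still need the file itself to be saved as UTF-8 for the page to be consistent.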