How to correctly insert UTF-8 characters into a MySQL table using Python
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors: Stack Overflow
Original question: http://stackoverflow.com/questions/14811303/
Asked by user1464409
I am confused about how to store strings that contain characters which look unusual to someone used to dealing with a UK English character set.
Here is my example.
I have this name: Bientôt l'été
This is how I created my table:
CREATE TABLE MyTable(
   `my_id` INT(10) unsigned NOT NULL,
   `my_name` TEXT CHARACTER SET utf8 NOT NULL,
   PRIMARY KEY(`my_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Using this simplified Python script I am trying to insert the string into a MySQL database table:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import MySQLdb

mystring = "Bientôt l'été"
myinsert = [ { "name" : mystring.encode("utf-8").strip()[:65535], "id" : 1 } ]

con = None
con = MySQLdb.connect('localhost', 'abc', 'def', 'ghi')
cur = con.cursor()
sql = "INSERT INTO `MyTable` ( `my_id`, `my_name` ) VALUES ( %(id)s, %(name)s ) ; "
cur.executemany( sql, myinsert )
con.commit()
if con: con.close()
If I then try to read the name in the database it is stored as: BientÃ´t l'Ã©tÃ©
I want it to read: Bientôt l'été
How do I get the Python script/MySQL database to do this? I think this has something to do with the character set and how it is configured, but I can't find a simple web page that explains this without technical jargon. I've been struggling with this for hours!
I have looked at the output below and I see that character_set_server is set to latin1, but I don't know whether this is the problem or how to change it:
mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
Accepted answer by Adem Öztaş
Did you try running this query first: set names utf8;
#!/usr/bin/python
# -*- coding: utf-8 -*-
import MySQLdb

mystring = "Bientôt l'été"
myinsert = [{ "name": mystring.encode("utf-8").strip()[:65535], "id": 1 }]

con = MySQLdb.connect('localhost', 'abc', 'def', 'ghi')
cur = con.cursor()
cur.execute("set names utf8;")     # <--- add this line
sql = "INSERT INTO `MyTable` ( `my_id`, `my_name` ) VALUES ( %(id)s, %(name)s ) ; "
cur.executemany( sql, myinsert )
con.commit()
if con: con.close()
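As an alternative to issuing the query by hand, the connection character set can be requested when connecting. The following is a minimal sketch, not the answerer's code: it relies on the `charset` and `use_unicode` keyword arguments of `MySQLdb.connect`, the helper names `utf8_connect_kwargs` and `insert_name` are made up for illustration, and the credentials are the placeholders from the question.

```python
def utf8_connect_kwargs(host, user, passwd, db):
    """Build keyword arguments for MySQLdb.connect with a UTF-8 connection."""
    return {
        "host": host,
        "user": user,
        "passwd": passwd,
        "db": db,
        "charset": "utf8",     # ask the driver to set the connection character set
        "use_unicode": True,   # text columns are returned as unicode strings
    }


def insert_name(row_id, name):
    """Insert one row into MyTable (hypothetical helper; needs a reachable server)."""
    import MySQLdb  # imported here so the helper above works without the driver

    con = MySQLdb.connect(**utf8_connect_kwargs("localhost", "abc", "def", "ghi"))
    try:
        cur = con.cursor()
        cur.execute(
            "INSERT INTO `MyTable` (`my_id`, `my_name`) VALUES (%s, %s)",
            (row_id, name),
        )
        con.commit()
    finally:
        con.close()
```

With this approach the string is passed to `execute` as text, and the driver is left to encode it for the connection.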
Answer by Martijn Pieters
Your problem is with how you display the data when you read it from the database. You are looking at UTF-8 data misinterpreted as Latin 1.
>>> "Bient\xf4t l'\xe9t\xe9"
"Bientôt l'été"
>>> "Bient\xf4t l'\xe9t\xe9".encode('utf8').decode('latin1')
"BientÃ´t l'Ã©tÃ©"
The above encodes a unicode string to UTF-8, then misinterprets the result as Latin 1 (ISO 8859-1). The ô and é code points, which were encoded to two UTF-8 bytes each, are re-interpreted as two Latin-1 code points each.
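The same round trip can be checked and undone in code. This is a small sketch written for Python 3 (the question uses Python 2); the variable names are illustrative only, and the repair step only works when no bytes were lost along the way.

```python
original = "Bient\xf4t l'\xe9t\xe9"       # the name from the question

utf8_bytes = original.encode("utf-8")     # the three accented letters take two bytes each
garbled = utf8_bytes.decode("latin-1")    # every byte is read as one character

# Reversing the two steps recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")

print(len(original), len(utf8_bytes), len(garbled))
print(repaired == original)
```

If the length of the garbled text matches the UTF-8 byte count rather than the character count, that is a strong hint that the data itself is intact and only the display encoding is wrong.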
Since you are running Python 2, you shouldn't need to .encode() already encoded data. It would be better to insert unicode objects, so you want to decode instead:
myinsert = [ { "name" : mystring.decode("utf-8").strip()[:65535], "id" : 1 } ]
By calling .encode() on the encoded data, you are asking Python to first decode the data (using the default encoding) so that it can then encode it for you. If the default on your Python has been changed to latin1 you would see the same effect: UTF-8 data interpreted as Latin 1 before being re-encoded to UTF-8.
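The implicit step can be imitated explicitly. The sketch below is written for Python 3, where bytes have no `.encode()`, so the hypothetical helper `py2_style_encode` spells out what Python 2 does behind the scenes; it is an illustration of the behaviour described above, not code from the interpreter.

```python
def py2_style_encode(data, target="utf-8", default="ascii"):
    """Mimic Python 2's str.encode(): decode with the default codec, then encode."""
    return data.decode(default).encode(target)


# The bytes a Python 2 str literal holds in a UTF-8 source file.
stored = "Bient\xf4t l'\xe9t\xe9".encode("utf-8")

try:
    py2_style_encode(stored)              # stock default encoding is ASCII
    ascii_result = "ok"
except UnicodeDecodeError:
    ascii_result = "UnicodeDecodeError"

# With the default switched to Latin-1 the call succeeds but double-encodes.
double_encoded = py2_style_encode(stored, default="latin-1")

print(ascii_result)
print(len(stored), len(double_encoded))
```

Decoding with the codec the bytes were actually written in, as in the `mystring.decode("utf-8")` line above, avoids both outcomes.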
You may want to read up on Python and Unicode:
Pragmatic Unicode by Ned Batchelder
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
Answer by Naeem
<?php
// At the beginning of the PHP code, send the page itself as UTF-8
header("Content-Type: text/html; charset=UTF-8");
// Create the connection first (legacy mysql_* extension, removed in PHP 7)
$CNN=mysql_connect("localhost","usr_urdu","123") or die('Unable to Connect');
$DB=mysql_select_db('db_urdu',$CNN) or die('Unable to select DB');
// Then switch the connection to utf8
mysql_query("SET NAMES 'utf8'", $CNN);
mysql_query('SET CHARACTER SET utf8', $CNN);
Answer by Iman Marashi
Set the default client character set:
<?php
$con=mysqli_connect("localhost","my_user","my_password","my_db");
// Check connection
if (mysqli_connect_errno())
{
echo "Failed to connect to MySQL: " . mysqli_connect_error();
}
// Change character set to utf8
mysqli_set_charset($con,"utf8");
mysqli_close($con);
?>

