Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/947077/

Date: 2020-11-03 21:09:02  Source: igfitidea

using pyodbc on linux to insert unicode or utf-8 chars in a nvarchar mssql field

python sql-server unicode utf-8 pyodbc

Asked by nosklo

I am using Ubuntu 9.04

I have installed the following package versions:

unixodbc and unixodbc-dev: 2.2.11-16build3
tdsodbc: 0.82-4
libsybdb5: 0.82-4
freetds-common and freetds-dev: 0.82-4

I have configured /etc/unixodbc.ini like this:

[FreeTDS]
Description             = TDS driver (Sybase/MS SQL)
Driver          = /usr/lib/odbc/libtdsodbc.so
Setup           = /usr/lib/odbc/libtdsS.so
CPTimeout               = 
CPReuse         = 
UsageCount              = 2

I have configured /etc/freetds/freetds.conf like this:

[global]
    tds version = 8.0
    client charset = UTF-8

I have grabbed pyodbc revision 31e2fae4adbf1b2af1726e5668a3414cf46b454f from http://github.com/mkleehammer/pyodbc and installed it using "python setup.py install".

I have a Windows machine with Microsoft SQL Server 2000 installed on my local network, up and listening on the local IP address 10.32.42.69. I have created an empty database named "Common". I have the user "sa" with password "secret" and full privileges.

I am using the following python code to setup the connection:

import pyodbc
odbcstring = "SERVER=10.32.42.69;UID=sa;PWD=secret;DATABASE=Common;DRIVER=FreeTDS"
con = pyodbc.connect(odbcstring)
cur = con.cursor()
cur.execute('''
CREATE TABLE testing (
    id INTEGER NOT NULL IDENTITY(1,1), 
    name NVARCHAR(200) NULL, 
    PRIMARY KEY (id)
)
    ''')
con.commit()

Everything WORKS up to this point. I have used SQL Server's Enterprise Manager on the server and the new table is there. Now I want to insert some data into the table.

cur = con.cursor()
cur.execute('INSERT INTO testing (name) VALUES (?)', (u'something',))

That fails!! Here's the error I get:

pyodbc.Error: ('HY004', '[HY004] [FreeTDS][SQL Server]Invalid data type (0) (SQLBindParameter)')

Since my client is configured to use UTF-8, I thought I could solve it by encoding the data to UTF-8. That works, but then I get strange data back:

cur = con.cursor()
cur.execute('DELETE FROM testing')
cur.execute('INSERT INTO testing (name) VALUES (?)', (u'somé string'.encode('utf-8'),))
con.commit()
# fetching data back
cur = con.cursor()
cur.execute('SELECT name FROM testing')
data = cur.fetchone()
print type(data[0]), data[0]

That gives no error, but the data returned is not the same data sent! I get:

<type 'unicode'> som?? string

That is, pyodbc won't accept a unicode object directly, but it returns unicode objects back to me! And the encoding is being mixed up!

Now for the question:

I want code to insert unicode data in a NVARCHAR and/or NTEXT field. When I query back, I want the same data I inserted back.

That can be done by configuring the system differently, or by using a wrapper function that converts the data correctly to and from unicode when inserting or retrieving.

That's not asking much, is it?

Accepted answer by Nicolas Dumazet

I can remember having this kind of stupid problem with ODBC drivers, even though that time it was a Java + Oracle combination.

The core issue is that the ODBC driver apparently encodes the query string when sending it to the DB. Even if the field is Unicode and you provide Unicode, in some cases that does not seem to matter.

You need to ensure that what is sent by the driver has the same encoding as your database (not only the server, but also the database). Otherwise, of course, you get funky characters, because either the client or the server is mixing things up when encoding or decoding. Do you have any idea which charset (code page, as MS likes to say) your server uses by default for decoding data?

Collation has nothing to do with this problem :)

See that MS page for example. For Unicode fields, collation is used only to define the sort order in the column, not to specify how the data is stored.

If you store your data as Unicode, there is a unique way to represent it. That is the purpose of Unicode: there is no need to define a charset that is compatible with all the languages you are going to use :)

The question here is "what happens when I give data to the server that is not Unicode?". For example:

  • When I send a UTF-8 string to the server, how does it understand it?
  • When I send a UTF-16 string to the server, how does it understand it?
  • When I send a Latin1 string to the server, how does it understand it?
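
To make the three cases concrete, here is a minimal sketch (written for Python 3, which is an assumption; the rest of this thread uses Python 2). It encodes one short string three ways and then misreads one of the results. The variable names are only illustrative.

```python
# Illustrative only: the same text under three encodings gives three
# different byte streams, and the bytes carry no label saying which is which.
text = u'Andr\xe9'  # u'André'

as_utf8 = text.encode('utf-8')
as_utf16 = text.encode('utf-16-le')
as_latin1 = text.encode('latin-1')

for name, raw in [('utf-8', as_utf8),
                  ('utf-16-le', as_utf16),
                  ('latin-1', as_latin1)]:
    print(name, repr(raw))

# Decoding with the wrong assumption does not raise an error here;
# it silently produces different text.
misread = as_utf8.decode('latin-1')
print(repr(misread))
```

The receiver only ever sees the right-hand side of each line, so it has to be told, or has to assume, which encoding was used.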

From the server's perspective, all three of these strings are only a stream of bytes. The server cannot guess the encoding in which you encoded them. This means that you will get into trouble if your ODBC client ends up sending byte strings (encoded strings) to the server instead of sending Unicode data. If it does, the server will use a predefined encoding (that was my question: what encoding will the server use? Since it is not guessing, it must be a parameter value), and if the string was encoded using a different encoding, dzing, the data will get corrupted.

It is exactly like doing this in Python:

uni = u'Hey my name is André'
in_utf8 = uni.encode('utf-8')
# send the utf-8 data to server
# send(in_utf8)

# on server side
# server receives it. But server is Japanese.
# So the server treats the data with the National charset, shift-jis:
some_string = in_utf8 # some_string = receive()    
decoded = some_string.decode('sjis')

Just try it. It's fun. The decoded string is supposed to be "Hey my name is André", but is "Hey my name is Andrﾃｩ". The é gets replaced by the Japanese ﾃｩ.
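
For readers on Python 3, where `str.decode` no longer exists, the same experiment can be sketched as follows. This is an adaptation, not the original snippet; it assumes Python 3 and uses the standard `shift_jis` codec.

```python
# Python 3 adaptation of the experiment above (an assumption; the
# original snippet targets Python 2).
uni = 'Hey my name is Andr\xe9'          # 'Hey my name is André'
in_utf8 = uni.encode('utf-8')            # what the client sends

# The "server" wrongly assumes the bytes are Shift-JIS.
decoded = in_utf8.decode('shift_jis')

print(decoded)
print(decoded == uni)   # False: the two UTF-8 bytes of 'é' were each
                        # read as a separate half-width katakana character
```

No exception is raised, which is what makes this kind of bug hard to spot: the data is simply wrong.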

Hence my suggestion: you need to ensure that pyodbc is able to send the data directly as Unicode. If pyodbc fails to do this, you will get unexpected results.

I have described the problem in the client-to-server direction, but the same sort of issue can arise when communicating back from the server to the client. If the client cannot understand Unicode data, you will likely get into trouble.

FreeTDS handles Unicode for you.

Actually, FreeTDS takes care of things for you and translates all the data to UCS-2 Unicode. (Source).

  • Server <--> FreeTDS : UCS2 data
  • FreeTDS <--> pyodbc : encoded strings, encoded in UTF-8 (from /etc/freetds/freetds.conf)
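
The two hops can be imitated in plain Python to see what the conversion amounts to. This is only a sketch of the idea, not FreeTDS code: `client_to_wire` and `wire_to_client` are hypothetical names, and Python's `utf-16-le` codec stands in for UCS-2 (the two agree for characters in the Basic Multilingual Plane).

```python
# Hypothetical stand-ins for the conversion the driver performs.
def client_to_wire(utf8_bytes):
    """UTF-8 bytes from the client -> UCS-2-style bytes for the server."""
    return utf8_bytes.decode('utf-8').encode('utf-16-le')


def wire_to_client(ucs2_bytes):
    """UCS-2-style bytes from the server -> UTF-8 bytes for the client."""
    return ucs2_bytes.decode('utf-16-le').encode('utf-8')


sent = u'som\xe9 string'.encode('utf-8')   # u'somé string' as UTF-8
on_wire = client_to_wire(sent)
received = wire_to_client(on_wire)

print(repr(sent))
print(repr(on_wire))
print(received == sent)
```

If both hops use the encodings they claim to use, the round trip is lossless; corruption appears only when one side's assumption is wrong.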

So I would expect your application to work correctly if you pass UTF-8 data to pyodbc. In fact, as this django-pyodbc ticket states, django-pyodbc communicates in UTF-8 with pyodbc, so you should be fine.
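
One way to apply this on the insert side is a small helper that encodes every unicode parameter before it reaches `cursor.execute`. The helper name `encode_params` is hypothetical, and the sketch assumes the Python 2 era setup discussed here. On Python 3 with a current pyodbc, byte strings are, as far as I know, bound as binary data, so this approach would not be appropriate there.

```python
def encode_params(params, encoding='utf-8'):
    """Return a tuple in which every unicode value is encoded to bytes.

    Non-text values (numbers, None, already-encoded bytes) pass through
    unchanged. Hypothetical helper, not part of pyodbc.
    """
    text_type = type(u'')
    return tuple(
        p.encode(encoding) if isinstance(p, text_type) else p
        for p in params
    )


params = encode_params((u'som\xe9 string', 42, None))
print(repr(params))

# Intended use with the cursor from the question (not executed here):
# cur.execute('INSERT INTO testing (name) VALUES (?)', params)
```

The encoding passed to the helper should match the `client charset` value in freetds.conf, otherwise the driver will misread the bytes.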

FreeTDS 0.82

However, cramm0 says that FreeTDS 0.82 is not completely bug-free, and that there are significant differences between 0.82 and the official patched 0.82 version that can be found here. You should probably try using the patched FreeTDS.



Edited: removed old data, which had nothing to do with FreeTDS and was only relevant to the Easysoft commercial ODBC driver. Sorry.

Answer by Paul Harrington

I use UCS-2 to interact with SQL Server, not UTF-8.

Correction: I changed the .freetds.conf entry so that the client uses UTF-8:

    tds version = 8.0
    client charset = UTF-8
    text size = 32768

Now, bind values work fine for UTF-8 encoded strings. The driver converts transparently between the UCS-2 used for storage on the dataserver side and the UTF-8 encoded strings given to/taken from the client.

This is with pyodbc 2.0 on Solaris 10, running Python 2.5, FreeTDS freetds-0.82.1.dev.20081111 and SQL Server 2008.

import pyodbc
test_string = u"""Comment ça va ? Très bien ?"""

print type(test_string),repr(test_string)
utf8 = 'utf8:' + test_string.encode('UTF-8')
print type(utf8), repr(utf8)

c = pyodbc.connect('DSN=SA_SQL_SERVER_TEST;UID=XXX;PWD=XXX')

cur = c.cursor()
# This does not work as test_string is not UTF-encoded
try: 
    cur.execute('INSERT unicode_test(t) VALUES(?)', test_string)
    c.commit()
except pyodbc.Error,e:
    print e


# This one does:
try:
    cur.execute('INSERT unicode_test(t) VALUES(?)', utf8)
    c.commit()
except pyodbc.Error,e:
    print e    


Here is the output from the test table (I had manually put in a bunch of test data via Management Studio):

In [41]: for i in cur.execute('SELECT t FROM unicode_test'):
   ....:     print i
   ....:
   ....:
('this is not a banana', )
('\xc3\x85kergatan 24', )
('\xc3\x85kergatan 24', )
('\xe6\xb0\xb4 this is code-point 63CF', )
('Mich\xc3\xa9l', )
('Comment a va ? Trs bien ?', )
('utf8:Comment \xc3\xa7a va ? Tr\xc3\xa8s bien ?', )
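
The escaped values in this output are UTF-8 byte strings. As a small sketch (Python 3 syntax, with a few rows copied from the output above into a hypothetical `rows` list), decoding them recovers the readable text:

```python
# Rows copied from the output above, written as Python 3 bytes literals.
rows = [
    (b'this is not a banana',),
    (b'\xc3\x85kergatan 24',),
    (b'Mich\xc3\xa9l',),
    (b'utf8:Comment \xc3\xa7a va ? Tr\xc3\xa8s bien ?',),
]

# Decode each fetched value from UTF-8 back to text.
decoded = [row[0].decode('utf-8') for row in rows]

for value in decoded:
    print(value)
```

The row without the `utf8:` prefix, 'Comment a va ? Trs bien ?', cannot be repaired this way: the accented characters were already lost when the un-encoded unicode object was inserted.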

I was able to put some Unicode code points directly into the table from Management Studio by opening the 'Edit Top 200 rows' dialog, entering the hex digits for the Unicode code point, and then pressing Alt-X.

Answer by Roman Bataev

I had the same problem when trying to bind a unicode parameter: '[HY004] [FreeTDS][SQL Server]Invalid data type (0) (SQLBindParameter)'

I solved it by upgrading freetds to version 0.91.

I use pyodbc 2.1.11. I had to apply this patch to make it work with unicode, otherwise I was getting memory corruption errors occasionally.

Answer by Eugene Yokota

Are you sure it's the INSERT that's causing the problem, and not the reading? There's a bug open on pyodbc: "Problem fetching NTEXT and NVARCHAR data".