Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/43613806/

Time: 2020-09-14 03:28:34  Source: igfitidea

How to create a SQLAlchemy connection for pandas read_sql with sqlalchemy+pyodbc and multiple databases in MS SQL Server?

Tags: python, sql-server, pandas, sqlalchemy, odbc

Asked by Sergey Zakharov

I am trying to use 'pandas.read_sql_query' to copy data from MS SQL Server into a pandas DataFrame. My SQL query needs multiple joins, and the joined tables are on the same server but in different databases. The query I am passing to pandas works fine inside MS SQL Server Management Studio. In a Jupyter Notebook I tried to query the data like so (for readability the query is simplified to a single join between two tables, and generic names are used):

import pandas as pd
import sqlalchemy as sql
import pyodbc

server = '100.10.10.10'
driver = 'SQL+Server+Native+Client+11.0'
myQuery = '''SELECT first.Field1, second.Field2
           FROM db1.schema.Table1 AS first
           JOIN db2.schema.Table2 AS second
           ON first.Id = second.FirstId
           '''
engine = sql.create_engine('mssql+pyodbc://{}?driver={}'.format(server, driver))
df = pd.read_sql_query(myQuery, engine)

This does not work and returns an error:

DBAPIError: (pyodbc.Error) ('IM010', '[IM010] [Microsoft][????????? ????????? ODBC] ??????? ??????? ??? ????????? ?????? (0) (SQLDriverConnect)')

The problem seems to be that the engine URL does not include a database, because everything works fine with the following code, where I include the database in the engine:

myQuery = 'select Field1 from schema.Table1'
db = 'db1'
engine = sql.create_engine('mssql+pyodbc://{}/{}?driver={}'.format(server, db, driver))
df = pd.read_sql_query(myQuery, engine)

However, it breaks in the same way as the join code above if I leave the database out of the engine and add it to the query instead, like so:

myQuery = 'select Field1 from db1.schema.Table1'
engine = sql.create_engine('mssql+pyodbc://{}?driver={}'.format(server, driver))
df = pd.read_sql_query(myQuery, engine)

So how should I specify the 'sql' and 'con' parameters of pandas.read_sql_query when I need to join tables from different databases on the same server?

P.S. I only have read access to the server I am connecting to. I cannot create new tables, views, or anything like that.

Update: The MS SQL Server version is 2008 R2.

Update 2: I am using Python 3.6 and Windows 10.

Accepted answer by Sergey Zakharov

I have found a workaround: use pymssql instead of pyodbc (both in the import statement and in the engine). It lets you write joins with database-qualified table names without specifying a database in the engine. There is also no need to specify a driver in this case.

There might be a problem if you are using Python 3.6, which is not officially supported by pymssql yet, but you can find unofficial wheels for Python 3.6 (linked in the original answer). It works as expected with my queries.

Here is the original code with joins, rebuilt to work with pymssql:

import pandas as pd
import sqlalchemy as sql
import pymssql

server = '100.10.10.10'
myQuery = '''SELECT first.Field1, second.Field2
           FROM db1.schema.Table1 AS first
           JOIN db2.schema.Table2 AS second
           ON first.Id = second.FirstId'''
engine = sql.create_engine('mssql+pymssql://{}'.format(server))
df = pd.read_sql_query(myQuery, engine)

As for the unofficial wheels, you need to download the file for Python 3.6 from the link mentioned above, then cd to the download folder and run pip install wheels, where 'wheels' is the name of the wheel file.

UPDATE:

Actually, it is possible to use pyodbc too. I am not sure whether this works for every SQL Server setup, but everything worked for me after I set 'master' as the database in the engine. The resulting code looks like this:

import pandas as pd
import sqlalchemy as sql
import pyodbc

server = '100.10.10.10'
driver = 'SQL+Server'
db = 'master'
myQuery = '''SELECT first.Field1, second.Field2
           FROM db1.schema.Table1 AS first
           JOIN db2.schema.Table2 AS second
           ON first.Id = second.FirstId'''
engine = sql.create_engine('mssql+pyodbc://{}/{}?driver={}'.format(server, db, driver))
df = pd.read_sql_query(myQuery, engine)
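
A possible explanation for why naming any database helps, offered as an assumption rather than something confirmed in this thread: the SQLAlchemy documentation describes two URL forms for mssql+pyodbc, a DSN form (user:password@some_dsn) and a hostname form (host/database?driver=...), so a URL with a host but no database may be read as a DSN name. SQLSTATE IM010 in the error above is the ODBC Driver Manager's "data source name too long" code, which would fit that reading. SQLAlchemy also documents an 'odbc_connect' query parameter that passes a raw ODBC connection string through, which sidesteps the question. The sketch below only builds such a URL; the helper name build_odbc_url and the Trusted_Connection=yes setting (Windows authentication, since the original URL carries no credentials) are assumptions for illustration.

```python
from urllib.parse import quote_plus, unquote_plus


def build_odbc_url(server, driver, database=None, trusted=True):
    """Build a SQLAlchemy URL that passes a raw ODBC connection string.

    Hypothetical helper, not part of SQLAlchemy or pandas.
    """
    # ODBC expects the plain driver name in braces, e.g. {SQL Server}
    parts = ["DRIVER={{{}}}".format(driver), "SERVER={}".format(server)]
    if database is not None:
        # Optional: with a raw connection string the database may be omitted
        parts.append("DATABASE={}".format(database))
    if trusted:
        # Assumption: Windows authentication
        parts.append("Trusted_Connection=yes")
    odbc_string = ";".join(parts)
    # The connection string must be URL-encoded before it goes into the URL
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc_string)


url = build_odbc_url("100.10.10.10", "SQL Server")
print(url)
# Decode the parameter again to check what pyodbc would receive
print(unquote_plus(url.split("odbc_connect=", 1)[1]))
```

The resulting string can be passed to sql.create_engine(url) in place of the formatted URL above, and the query with db1 and db2 qualified names stays unchanged. That step is left out here because it needs a reachable server.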