Original question: http://stackoverflow.com/questions/25661754/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Get data from pandas into a SQL server with PYODBC
Asked by andy redmayne
I am trying to understand how Python could pull data from an FTP server into pandas and then move it into SQL Server. My code here is very rudimentary to say the least, and I am looking for any advice or help at all. I have tried loading the data from the FTP server first, which works fine. If I then remove this code and change it to a select from MS SQL Server, it also works, so the connection string is fine; it is the insertion into SQL Server that seems to be causing problems.
import pyodbc
import pandas
from ftplib import FTP
from StringIO import StringIO
import csv

ftp = FTP('ftp.xyz.com', 'user', 'pass')
ftp.set_pasv(True)
r = StringIO()
# retrbinary expects the full FTP command, not just the file name
ftp.retrbinary('RETR filename.csv', r.write)
r.seek(0)
# read_table takes a file-like object (or a path), not the raw string contents
df = pandas.read_table(r, delimiter=',')

connStr = ('DRIVER={SQL Server Native Client 10.0};'
           'SERVER=localhost;DATABASE=TESTFEED;UID=sa;PWD=pass')
conn = pyodbc.connect(connStr)
cursor = conn.cursor()
# placeholder insert -- the column list and values are stand-ins
cursor.execute("INSERT INTO dbo.tblImport (Startdt, Enddt, x, y, z) "
               "VALUES (x, x, x, x, x)")
cursor.close()
conn.commit()
conn.close()
print "Script has successfully run!"
When I remove the ftp code this runs perfectly, but I do not understand how to make the next jump to get this into Microsoft SQL Server, or whether it is even possible without saving to a file first.
Accepted answer by joris
For the 'write to sql server' part, you can use the convenient to_sql method of pandas (so there is no need to iterate over the rows and do the inserts manually). See the docs on interacting with SQL databases with pandas: http://pandas.pydata.org/pandas-docs/stable/io.html#io-sql
You will need at least pandas 0.14 for this to work, and you also need sqlalchemy installed. An example, assuming df is the DataFrame you got from read_table:
import sqlalchemy
import pyodbc

# <username>, <password> and <dsnname> are placeholders for your ODBC data source
engine = sqlalchemy.create_engine("mssql+pyodbc://<username>:<password>@<dsnname>")

# write the DataFrame to a table in the sql database
df.to_sql("table_name", engine)
See also the documentation page of to_sql. More information on how to create the connection engine with sqlalchemy for SQL Server with pyodbc can be found here: http://docs.sqlalchemy.org/en/rel_1_1/dialects/mssql.html#dialect-mssql-pyodbc-connect
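As an illustration, a hostname-style connection string (no DSN required) can also be passed to create_engine; the server, database, credentials and driver name below are placeholders, not values from the original answer:

import sqlalchemy

# the driver is selected via a query parameter instead of an ODBC DSN
engine = sqlalchemy.create_engine(
    "mssql+pyodbc://user:password@myserver/mydatabase"
    "?driver=SQL+Server+Native+Client+11.0"
)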
But if your goal is just to get the csv data into the SQL database, you could also consider doing this directly from SQL. See e.g. Import CSV file into SQL Server.
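As a sketch of that SQL-side route, a BULK INSERT statement can be issued through pyodbc; the table and file path below are hypothetical, and the csv file must be readable by the SQL Server instance itself, not just by the Python process:

import pyodbc

conn = pyodbc.connect(connStr)  # connStr as defined in the question
cursor = conn.cursor()
# BULK INSERT reads the file on the server's own file system
cursor.execute("""
    BULK INSERT dbo.tblImport
    FROM 'C:\\data\\filename.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', FIRSTROW = 2)
""")
conn.commit()
conn.close()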
Answered by Babu Arunachalam
I found that the bcp utility (https://docs.microsoft.com/en-us/sql/tools/bcp-utility) works best when you have a large dataset. I have 2.7 million rows that insert at 80K rows/sec. You can store your data frame as a csv file (use tabs as the separator if your data doesn't contain tabs, and UTF-8 encoding). With bcp, I've used the "-c" format option and it has worked without issues so far.
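As a sketch of that workflow, assuming bcp is installed and on the PATH (the server, database, table and file names are placeholders): export the frame as tab-delimited text, then shell out to bcp:

import subprocess
import pandas as pd

df = pd.read_csv("./data.csv")  # hypothetical source
# tab-separated, no index, no header: bcp in character mode expects plain text
df.to_csv("export.tsv", sep="\t", index=False, header=False, encoding="utf-8")

# -c: character mode, -t: field terminator, -T: trusted (Windows) authentication
subprocess.check_call([
    "bcp", "TESTFEED.dbo.tblImport", "in", "export.tsv",
    "-S", "localhost", "-c", "-t\t", "-T",
])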
Answered by Random
Python3 version using a LocalDB SQL instance:
from sqlalchemy import create_engine
import urllib
import pyodbc
import pandas as pd

df = pd.read_csv("./data.csv")

# quote the raw ODBC connection string so it can be embedded in the SQLAlchemy URL
quoted = urllib.parse.quote_plus(
    r"DRIVER={SQL Server Native Client 11.0};SERVER=(localDb)\ProjectsV14;DATABASE=database")
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))

df.to_sql('TargetTable', schema='dbo', con=engine)

# verify that the rows arrived (engine.execute is SQLAlchemy 1.x style)
result = engine.execute('SELECT COUNT(*) FROM [dbo].[TargetTable]')
result.fetchall()
Answered by Serhii Kushchenko
Yes, the bcp utility seems to be the best solution for most cases.
If you want to stay within Python, the following code should work.
from sqlalchemy import create_engine
import urllib
import pyodbc

# quote the raw ODBC connection string for use in the SQLAlchemy URL
quoted = urllib.parse.quote_plus(
    r"DRIVER={SQL Server};SERVER=YOUR\ServerName;DATABASE=YOur_Database")
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))

# df is the DataFrame you want to write
df.to_sql('Table_Name', schema='dbo', con=engine,
          chunksize=200, method='multi', index=False, if_exists='replace')
Don't omit method='multi', because it significantly reduces the task execution time.
Sometimes you may encounter the following error.
ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The incoming request has too many parameters. The server supports a maximum of 2100 parameters. Reduce the number of parameters and resend the request. (8003) (SQLExecDirectW)')
In such a case, determine the number of columns in your dataframe with df.shape[1]. Divide the maximum supported number of parameters (2100) by this value and use the floor of the result as the chunk size.
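A minimal sketch of that calculation, reusing the df and engine from the snippet above:

# SQL Server allows at most 2100 parameters per statement; with
# method='multi' each inserted row consumes one parameter per column
max_params = 2100
chunk = max_params // df.shape[1]

df.to_sql('Table_Name', schema='dbo', con=engine,
          chunksize=chunk, method='multi', index=False, if_exists='replace')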

