Original question: http://stackoverflow.com/questions/33804410/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow, cite the original address, and distribute it under the same license.
Using IBM_DB with Pandas
Asked by Shaurya Chaudhuri
I am trying to use the data analysis library Pandas in Python to read data from an IBM database using the ibm_db package. According to the Pandas documentation, we need to provide at least two arguments: the SQL to be executed and the connection object of the database. But when I do that, I get an error saying that the connection object does not have a cursor() method. I figured this might not be how this particular DB package works. I tried to find a workaround, but without success.
Code:
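import ibm_db as db  # alias assumed from the db.connect / db.exec_immediate calls
import pandas as pd

print("hello PyDev")
con = db.connect("DATABASE=db;HOSTNAME=localhost;PORT=50000;PROTOCOL=TCPIP;UID=admin;PWD=admin;", "", "")
sql = "select * from Maximo.PLUSPCUSTOMER"
stmt = db.exec_immediate(con, sql)
pd.read_sql(sql, con)  # fails here: the raw ibm_db handle has no cursor()
print("done here")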
Error:
hello PyDev
Traceback (most recent call last):
  File "C:\Users\ray\workspace\Firstproject\pack\test.py", line 15, in <module>
    pd.read_sql(sql, con)
  File "D:\etl\lib\site-packages\pandas\io\sql.py", line 478, in read_sql
    chunksize=chunksize)
  File "D:\etl\lib\site-packages\pandas\io\sql.py", line 1504, in read_query
    cursor = self.execute(*args)
  File "D:\etl\lib\site-packages\pandas\io\sql.py", line 1467, in execute
    cur = self.con.cursor()
AttributeError: 'ibm_db.IBM_DBConnection' object has no attribute 'cursor'
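The traceback shows pandas calling self.con.cursor(), which suggests that read_sql expects a connection following the Python DB-API 2.0 convention (PEP 249) of exposing a cursor() method. A minimal sketch of that check follows. It uses the standard-library sqlite3 module as a stand-in for a DB-API connection and a hypothetical RawHandle class as a stand-in for the raw ibm_db handle; neither appears in the original question.

```python
import sqlite3


class RawHandle:
    """Hypothetical stand-in for a low-level handle such as ibm_db.IBM_DBConnection."""


def looks_like_dbapi_connection(obj):
    """Return True if obj exposes a callable cursor(), as DB-API 2.0 connections do."""
    return callable(getattr(obj, "cursor", None))


dbapi_conn = sqlite3.connect(":memory:")
print(looks_like_dbapi_connection(dbapi_conn))   # True
print(looks_like_dbapi_connection(RawHandle()))  # False
```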
I can fetch the data if I read it directly from the database with ibm_db, but I need to read it into a DataFrame and write it back to the database after processing.
Code for fetching from the DB:
stmt = db.exec_immediate(con, sql)
tpl = db.fetch_tuple(stmt)
while tpl:
    print(tpl)
    tpl = db.fetch_tuple(stmt)
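The fetch loop above can be wrapped in a generator so that rows are collected instead of printed. This is only a sketch: iter_rows and fake_fetch are hypothetical names, and fake_fetch merely imitates the behaviour the loop relies on, namely that the fetch function returns a falsy value once no rows are left.

```python
def iter_rows(fetch_one, stmt):
    """Yield rows from fetch_one(stmt) until it returns a falsy value."""
    row = fetch_one(stmt)
    while row:
        yield row
        row = fetch_one(stmt)


# Stand-ins for a statement handle and a fetch function, for illustration only.
def fake_fetch(stmt):
    return next(stmt, False)


fake_stmt = iter([(1, "a"), (2, "b")])
rows = list(iter_rows(fake_fetch, fake_stmt))
print(rows)  # [(1, 'a'), (2, 'b')]
```

With the real package this would presumably be called as iter_rows(db.fetch_tuple, stmt), and the resulting list of tuples could then be passed to pd.DataFrame together with a list of column names.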
Answered by Shaurya Chaudhuri
After studying the package further, I found that I need to wrap the ibm_db connection object in an ibm_db_dbi connection object, which is part of the same package.
So:
conn = ibm_db_dbi.Connection(con)
df = pd.read_sql(sql, conn)
The above code works, and pandas successfully fetches the data into a DataFrame.
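Put together, the approach in this answer might look like the sketch below. The function name read_ibm_frame and its dsn parameter are hypothetical. The imports sit inside the function so that the definition loads even where ibm_db is not installed, and the function needs a reachable database to actually run.

```python
def read_ibm_frame(dsn, sql):
    """Read the result of sql into a pandas DataFrame via ibm_db_dbi."""
    # Imported lazily: these third-party packages are only needed at call time.
    import ibm_db
    import ibm_db_dbi
    import pandas as pd

    raw = ibm_db.connect(dsn, "", "")      # low-level handle, has no cursor()
    try:
        conn = ibm_db_dbi.Connection(raw)  # DB-API wrapper that pandas can use
        return pd.read_sql(sql, conn)
    finally:
        ibm_db.close(raw)                  # release the connection in all cases
```

It could be called with the connection string and the query from the question as its two arguments.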
Answered by Torsten Steinbach
You can also check out https://pypi.python.org/pypi/ibmdbpy
It provides a Pandas-style API without pulling all the data into Python memory.
Documentation is here: http://pythonhosted.org/ibmdbpy/index.html
Here is a quick demo of how to use it in Bluemix Notebooks: https://www.youtube.com/watch?v=tk9T1yPkn4c