Python: How to write a Pandas DataFrame to SQLite with its index
Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/14431646/
How to write Pandas dataframe to sqlite with Index
Asked by jmatthewhouse
I have a list of stock market data pulled from Yahoo in a pandas DataFrame (see format below). The date is serving as the index in the DataFrame. I want to write the data (including the index) out to a SQLite database.
             AAPL     GE
Date
2009-01-02  89.95  14.76
2009-01-05  93.75  14.38
2009-01-06  92.20  14.58
2009-01-07  90.21  13.93
2009-01-08  91.88  13.95
Based on my reading of the write_frame code for Pandas, it does not currently support writing the index. I attempted to use to_records instead, but ran into the issue with NumPy 1.6.2 and datetimes. Now I'm trying to write tuples using .itertuples, but SQLite throws an error saying that the data type isn't supported (see code and result below). I'm relatively new to Python, Pandas and NumPy, so it is entirely possible I'm missing something obvious. I think the problem is in writing a datetime to SQLite, but I may be overcomplicating this.
I think I may be able to fix the issue by upgrading to NumPy 1.7 or to the development version of Pandas, which has a fix posted on GitHub. I'd prefer to develop using release versions of software: I'm new to this and I don't want stability issues confusing matters further.
Is there a way to accomplish this using Python 2.7.2, Pandas 0.10.0, and NumPy 1.6.2? Perhaps by cleaning the datetimes somehow? I'm in a bit over my head, so any help would be appreciated.
Code:
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
import sqlite3 as db
# download data from yahoo
all_data = {}
for ticker in ['AAPL', 'GE']:
    all_data[ticker] = pd.io.data.get_data_yahoo(ticker, '1/1/2009', '12/31/2012')
# create a data frame
price = DataFrame({tic: data['Adj Close'] for tic, data in all_data.iteritems()})
# get output ready for database export
output = price.itertuples()
data = tuple(output)
# connect to a test DB with one three-column table titled "Demo"
con = db.connect('c:/Python27/test.db')
wildcards = ','.join(['?'] * 3)
insert_sql = 'INSERT INTO Demo VALUES (%s)' % wildcards
con.executemany(insert_sql, data)
Result:
---------------------------------------------------------------------------
InterfaceError Traceback (most recent call last)
<ipython-input-15-680cc9889c56> in <module>()
----> 1 con.executemany(insert_sql, data)
InterfaceError: Error binding parameter 0 - probably unsupported type.
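The error above points at parameter 0, which is the index value at the start of each tuple. One possible workaround, shown here only as a minimal sketch, is to turn each index value into a plain date string before handing the rows to sqlite3. The table name Demo comes from the question; the two-row sample frame, the in-memory database, and the column definitions are assumptions made for illustration.

```python
import sqlite3

import pandas as pd

# Small stand-in for the Yahoo data in the question (values copied from the sample table).
price = pd.DataFrame(
    {"AAPL": [89.95, 93.75], "GE": [14.76, 14.38]},
    index=pd.to_datetime(["2009-01-02", "2009-01-05"]),
)
price.index.name = "Date"

con = sqlite3.connect(":memory:")
# Assumed schema: the question only says that "Demo" has three columns.
con.execute("CREATE TABLE Demo (Date TEXT, AAPL REAL, GE REAL)")

# Convert each index value to a string and each price to a Python float,
# so sqlite3 only receives types it can bind natively.
rows = [
    (idx.strftime("%Y-%m-%d"), float(aapl), float(ge))
    for idx, aapl, ge in price.itertuples()
]
con.executemany("INSERT INTO Demo VALUES (?, ?, ?)", rows)
con.commit()

stored = con.execute("SELECT * FROM Demo ORDER BY Date").fetchall()
```

Storing the dates as ISO-formatted text keeps them sortable as strings, which is why the ORDER BY on the Date column returns them in chronological order.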
Accepted answer by Andy Hayden
In recent versions of pandas the index will be saved in the database (you used to have to call reset_index first).
Following the docs (setting up a SQLite connection in memory):
import sqlite3
# Create your connection.
cnx = sqlite3.connect(':memory:')
Note: You can also pass a SQLAlchemy engine here (see end of answer).
We can save price2 to cnx:
price2.to_sql(name='price2', con=cnx)
We can retrieve via read_sql:
p2 = pd.read_sql('select * from price2', cnx)
However, once stored (and retrieved), the dates are unicode rather than Timestamp. To convert back to what we started with, we can use pd.to_datetime:
p2.Date = pd.to_datetime(p2.Date)
p = p2.set_index('Date')
We get back the same DataFrame as price2:
In [11]: p
Out[11]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1006 entries, 2009-01-02 00:00:00 to 2012-12-31 00:00:00
Data columns:
AAPL 1006 non-null values
GE 1006 non-null values
dtypes: float64(2)
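The round trip described above can also be collapsed into the read step. The following is a minimal sketch, assuming a small hand-built frame in place of price2 and relying on the index_label, index_col and parse_dates parameters of to_sql and read_sql; the sample values are copied from the question.

```python
import sqlite3

import pandas as pd

# Stand-in for price2, with a named DatetimeIndex like the one in the question.
price2 = pd.DataFrame(
    {"AAPL": [89.95, 93.75, 92.20], "GE": [14.76, 14.38, 14.58]},
    index=pd.DatetimeIndex(["2009-01-02", "2009-01-05", "2009-01-06"], name="Date"),
)

cnx = sqlite3.connect(":memory:")
# index=True is the default; index_label names the column that holds the index.
price2.to_sql(name="price2", con=cnx, index=True, index_label="Date")

# parse_dates converts the stored text back to datetimes,
# and index_col turns that column back into the index.
restored = pd.read_sql(
    "SELECT * FROM price2", cnx, index_col="Date", parse_dates=["Date"]
)
```

Here restored should again have a DatetimeIndex named Date and the two price columns, without a separate pd.to_datetime and set_index step.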
You can also use a SQLAlchemy engine:
from sqlalchemy import create_engine
e = create_engine('sqlite://') # pass your db url
price2.to_sql(name='price2', con=e)  # write through the engine created above
This allows you to use read_sql_table (which can only be used with SQLAlchemy):
pd.read_sql_table(table_name='price2', con=e)
# Date AAPL GE
# 0 2009-01-02 89.95 14.76
# 1 2009-01-05 93.75 14.38
# 2 2009-01-06 92.20 14.58
# 3 2009-01-07 90.21 13.93
# 4 2009-01-08 91.88 13.95
Answer by Wes
Unfortunately, regarding the currently accepted answer, pandas.io.write_frame no longer exists in more recent versions of Pandas. For example, I'm using pandas 0.19.2. You can do something like:
from sqlalchemy import create_engine
disk_engine = create_engine('sqlite:///my_lite_store.db')
price.to_sql('stock_price', disk_engine, if_exists='append')
And then in turn preview your table with the following:
df = pd.read_sql_query('SELECT * FROM stock_price LIMIT 3', disk_engine)
df.head()
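Because the call above uses if_exists='append', running it more than once adds the same rows again. A minimal sketch of that behaviour follows, assuming an in-memory sqlite3 connection in place of the on-disk SQLAlchemy engine and a two-row sample frame; the row_count helper is hypothetical and exists only for this illustration.

```python
import sqlite3

import pandas as pd

# Two-row stand-in for the price DataFrame from the question.
price = pd.DataFrame(
    {"AAPL": [89.95, 93.75], "GE": [14.76, 14.38]},
    index=pd.DatetimeIndex(["2009-01-02", "2009-01-05"], name="Date"),
)

# Assumption: a plain in-memory sqlite3 connection instead of disk_engine.
con = sqlite3.connect(":memory:")


def row_count(connection):
    """Hypothetical helper: number of rows currently in stock_price."""
    return connection.execute("SELECT COUNT(*) FROM stock_price").fetchone()[0]


price.to_sql("stock_price", con, if_exists="append")  # creates the table on first use
price.to_sql("stock_price", con, if_exists="append")  # inserts the same rows again
count_after_append = row_count(con)

price.to_sql("stock_price", con, if_exists="replace")  # drops and recreates the table
count_after_replace = row_count(con)
```

If duplicate dates are not wanted, if_exists='replace' or a de-duplication step before writing may be the safer choice.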
Answer by Keerthesh Kumar
Below is the code that worked for me. I was able to write the DataFrame to a SQLite DB.
import pandas as pd
import sqlite3 as sq
data = pd.DataFrame({'AAPL': [89.95], 'GE': [14.76]})  # placeholder; use your own pandas DataFrame
sql_data = r'D:\SA.sqlite'  # path of the SQLite database file (created if it does not exist)
conn = sq.connect(sql_data)
cur = conn.cursor()
cur.execute('''DROP TABLE IF EXISTS SA''')
# Write the DataFrame to the SQLite DB; index=False omits the index, so use index=True (the default) to store it
data.to_sql('SA', conn, if_exists='replace', index=False)
pd.read_sql('select * from SA', conn)  # read back from the table that was just written
conn.commit()
conn.close()

