Python Pandas - Using to_sql to write large data frames in chunks
Declaration: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/24007762/
Asked by Krishan Gupta
I'm using Pandas' to_sql function to write to MySQL, which is timing out due to the large frame size (1M rows, 20 columns).
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_sql.html
Is there a more official way to chunk through the data and write rows in blocks? I've written my own code, which seems to work. I'd prefer an official solution though. Thanks!
import pandas as pd
import sqlalchemy

def write_to_db(engine, frame, table_name, chunk_size):
    start_index = 0
    end_index = chunk_size if chunk_size < len(frame) else len(frame)
    # Convert NaN to None so the driver writes SQL NULL.
    frame = frame.where(pd.notnull(frame), None)
    if_exists_param = 'replace'
    while start_index != end_index:
        print("Writing rows %s through %s" % (start_index, end_index))
        frame.iloc[start_index:end_index, :].to_sql(con=engine, name=table_name, if_exists=if_exists_param)
        # Only the first chunk replaces the table; the rest append to it.
        if_exists_param = 'append'
        start_index = min(start_index + chunk_size, len(frame))
        end_index = min(end_index + chunk_size, len(frame))

engine = sqlalchemy.create_engine('mysql://...')  # database details omitted
write_to_db(engine, frame, 'retail_pendingcustomers', 20000)
Answered by joris
Update: this functionality has been merged into pandas master and will be released in 0.15 (probably end of September), thanks to @artemyk! See https://github.com/pydata/pandas/pull/8062
So starting from 0.15, you can specify the chunksize argument and, e.g., simply do:
df.to_sql('table', engine, chunksize=20000)
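For context, a minimal end-to-end sketch of the 0.15+ approach (the connection string and the example frame below are placeholders, not from the original post):

import pandas as pd
import sqlalchemy

# Placeholder connection string; substitute real credentials.
engine = sqlalchemy.create_engine('mysql://user:password@host/dbname')

# Stand-in for the 1M-row frame from the question.
df = pd.DataFrame({'a': list(range(100000)), 'b': 1.5})

# With chunksize set, to_sql writes the rows in batches of 20000
# instead of all at once, avoiding the timeout.
df.to_sql('retail_pendingcustomers', engine, if_exists='replace', chunksize=20000)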
Answered by nes
There is a beautiful, idiomatic chunks function provided in the answer to this question.
In your case you can use this function like this:
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l.iloc[i:i+n]

def write_to_db(engine, frame, table_name, chunk_size):
    for idx, chunk in enumerate(chunks(frame, chunk_size)):
        # Replace the table on the first chunk, then append the rest.
        if idx == 0:
            if_exists_param = 'replace'
        else:
            if_exists_param = 'append'
        chunk.to_sql(con=engine, name=table_name, if_exists=if_exists_param)
The only drawback is that it doesn't support slicing the second axis inside the iloc call; a variant that also accepts a column selection is sketched below.
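If you do need the second axis, a small variant works (an editor's sketch, not from the original answer; the cols parameter is hypothetical and takes integer column positions):

def chunks(frame, n, cols=None):
    """Yield successive n-row chunks of frame, optionally restricted
    to the columns at integer positions `cols`."""
    col_indexer = slice(None) if cols is None else cols
    for i in range(0, len(frame), n):
        yield frame.iloc[i:i+n, col_indexer]

# e.g. write only the first three columns, 20000 rows at a time:
# for chunk in chunks(frame, 20000, cols=[0, 1, 2]): ...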

