Progress bar for pandas.DataFrame.to_sql

Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must attribute it to the original authors (not this site). Original question: http://stackoverflow.com/questions/39494056/

Tags: python, sqlite, pandas, dataframe, tqdm

Asked by Andrei

I want to migrate data from a large CSV file to a sqlite3 database.

My code on Python 3.5 using pandas:

import sqlite3
import pandas as pd

con = sqlite3.connect(DB_FILENAME)
df = pd.read_csv(MLS_FULLPATH)
df.to_sql(con=con, name="MLS", if_exists="replace", index=False)

Is it possible to print the current status (a progress bar) of the execution of the to_sql method?

I looked at the article about tqdm, but didn't find how to do this.

Answered by miraculixx

Unfortunately DataFrame.to_sql does not provide a chunk-by-chunk callback, which tqdm needs to update its status. However, you can process the dataframe chunk by chunk:

import sqlite3
import pandas as pd
from tqdm import tqdm

DB_FILENAME='/tmp/test.sqlite'

def chunker(seq, size):
    # from http://stackoverflow.com/a/434328
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def insert_with_progress(df, dbfile):
    con = sqlite3.connect(dbfile)
    chunksize = int(len(df) / 10) # 10%
    with tqdm(total=len(df)) as pbar:
        for i, cdf in enumerate(chunker(df, chunksize)):
            # replace the table on the first chunk, append the rest
            replace = "replace" if i == 0 else "append"
            cdf.to_sql(con=con, name="MLS", if_exists=replace, index=False)
            pbar.update(len(cdf))  # len(cdf) keeps the count exact on a short final chunk

df = pd.DataFrame({'a': range(0,100000)})
insert_with_progress(df, DB_FILENAME)

Note that I'm generating the DataFrame inline here for the sake of having a complete, workable example without external dependencies.

The result is quite stunning:

[Screenshot: tqdm progress bar filling to 100% during the insert]
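A related note on the asker's original scenario: if the CSV is too large to load in a single read_csv call, the chunksize parameter of read_csv can drive the same pattern. This is a sketch under that assumption; migrate_csv_with_progress and CSV_PATH are hypothetical names, and since the total row count isn't known up front, tqdm shows a running row counter instead of a percentage bar:

import sqlite3
import pandas as pd
from tqdm import tqdm

CSV_PATH = '/tmp/test.csv'  # hypothetical path, substitute your own
DB_FILENAME = '/tmp/test.sqlite'

def migrate_csv_with_progress(csv_path, dbfile, chunksize=10000):
    # stream the CSV in fixed-size chunks so the whole file never sits in memory
    con = sqlite3.connect(dbfile)
    with tqdm(unit=' rows') as pbar:
        for i, cdf in enumerate(pd.read_csv(csv_path, chunksize=chunksize)):
            replace = "replace" if i == 0 else "append"
            cdf.to_sql(con=con, name="MLS", if_exists=replace, index=False)
            pbar.update(len(cdf))

migrate_csv_with_progress(CSV_PATH, DB_FILENAME)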

Answered by Chris V

I wanted to share a variant of the solution posted by miraculixx, which I had to alter for SQLAlchemy:

# these need to be customized: myDataFrame, myDBEngine, myDBTable
from tqdm import tqdm

df = myDataFrame

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def insert_with_progress(df):
    con = myDBEngine.connect()
    chunksize = int(len(df) / 10)
    with tqdm(total=len(df)) as pbar:
        for i, cdf in enumerate(chunker(df, chunksize)):
            # replace the table on the first chunk, append the rest
            replace = "replace" if i == 0 else "append"
            cdf.to_sql(name="myDBTable", con=con, if_exists=replace, index=False)
            pbar.update(chunksize)
            tqdm._instances.clear()  # workaround to avoid stale/duplicate bars

insert_with_progress(df)
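For completeness, the placeholders would need to be defined before the snippet above runs. A minimal sketch of one way to do that, not part of the original answer; the SQLAlchemy URL and the sample frame are assumptions:

import pandas as pd
from sqlalchemy import create_engine

# hypothetical stand-ins for the values the answer asks you to customize
myDBEngine = create_engine("sqlite:////tmp/test.sqlite")  # example URL only
myDataFrame = pd.DataFrame({'a': range(0, 100000)})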

Answered by tok

User miraculixx has a nice example above, thank you for that. But if you want to use it with files of all sizes, you should add something like this:

chunksize = int(len(df) / 10)
if chunksize == 0:
    # fewer than 10 rows would make chunksize 0, so write in a single call
    df.to_sql(con=con, name="MLS", if_exists="replace", index=False)
else:
    with tqdm(total=len(df)) as pbar:
        ...
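Putting that guard together with miraculixx's helper gives a version that copes with frames of any size. A minimal sketch, assuming sqlite3 as in the first answer; insert_with_progress_safe is a hypothetical name:

import sqlite3
import pandas as pd
from tqdm import tqdm

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def insert_with_progress_safe(df, dbfile):
    con = sqlite3.connect(dbfile)
    chunksize = int(len(df) / 10)  # aim for roughly 10 chunks
    if chunksize == 0:
        # tiny frame: one write, no progress bar needed
        df.to_sql(con=con, name="MLS", if_exists="replace", index=False)
        return
    with tqdm(total=len(df)) as pbar:
        for i, cdf in enumerate(chunker(df, chunksize)):
            replace = "replace" if i == 0 else "append"
            cdf.to_sql(con=con, name="MLS", if_exists=replace, index=False)
            pbar.update(len(cdf))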