pandas read_sql is unusually slow

Notice: this page is a translation of a popular StackOverflow Q&A, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/40045093/



Tags: python, mysql, pandas

Asked by ale19

I'm trying to read several columns from three different MySQL tables into three different dataframes.


It doesn't take long to read from the database, but actually putting them into a dataframe is fairly slow.


import time
import pandas as pd
from sqlalchemy import create_engine

start_time = time.time()
print('Reading data from database...')

q_crash = 'SELECT <query string> FROM table1'
q_vehicle = 'SELECT <query string> FROM table2'
q_person = 'SELECT <query string> FROM table3'
engine = create_engine('mysql+pymysql://user:password@host:port/dbasename')

# No query has run yet at this point; this only measures engine creation.
print('Database time: {:.1f}'.format(time.time() - start_time))

# Each timing below is measured from the same start_time, so the values are cumulative.
crash = pd.read_sql_query(q_crash, engine)
print('Read_sql time for table 1: {:.1f}'.format(time.time() - start_time))
vehicle = pd.read_sql_query(q_vehicle, engine)
print('Read_sql time for table 2: {:.1f}'.format(time.time() - start_time))
person = pd.read_sql_query(q_person, engine)
print('Read_sql time for table 3: {:.1f}'.format(time.time() - start_time))

Output:


Reading data from database...
Database time: 0.0
Read_sql time for table 1: 13.4
Read_sql time for table 2: 30.9
Read_sql time for table 3: 49.4

Is this normal? The tables are quite large-- table 3 is over 601,000 rows. But pandas has handled larger datasets without a hitch whenever I use read_csv.

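One way to check whether the time goes into the database fetch itself or into building the DataFrame is to time the two steps separately. This is a rough sketch, not from the original post, reusing the engine and q_crash defined above:

import time
import pandas as pd
from sqlalchemy import text

start = time.time()
with engine.connect() as conn:
    result = conn.execute(text(q_crash))  # run the query on the MySQL server
    rows = result.fetchall()              # transfer all rows to the client
    cols = list(result.keys())
fetch_time = time.time() - start

start = time.time()
crash = pd.DataFrame(rows, columns=cols)  # build the DataFrame from the fetched rows
build_time = time.time() - start

print('Fetch: {:.1f}s  DataFrame construction: {:.1f}s'.format(fetch_time, build_time))

If the fetch step dominates, the bottleneck is the query or the network rather than pandas itself.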

Answered by MaxU

IMO it doesn't make much sense to read complete tables into Pandas DFs if you have them in a MySQL DB - why don't you use SQL for filtering and joining your data? Do you really need all rows from those three tables as Pandas DFs?


If you want to join them, you could do it first on the MySQL side and load the result set into a single DF...


something similar to:


qry = 'SELECT p.*, v.*, c.* FROM vehicle v JOIN person p ON v.id = p.vehicle_id JOIN crash c ON c.id = p.crash_id WHERE <additional where clause>'
df = pd.read_sql(qry, engine)
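
Along the same lines, filtering can be pushed to MySQL by binding parameters through read_sql, so that only the matching rows are transferred. The column names and the date filter below are made up for illustration; they are not from the original question:

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine('mysql+pymysql://user:password@host:port/dbasename')

# MySQL does the filtering; only the selected columns of the matching rows cross the wire.
qry = text('SELECT c.id, p.age FROM crash c JOIN person p ON p.crash_id = c.id '
           'WHERE c.crash_date >= :start')
df = pd.read_sql(qry, engine, params={'start': '2016-01-01'})

This keeps both the transfer and the DataFrame construction proportional to the rows you actually need.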