pandas read_sql is unusually slow

Notice: this page is a translation of a popular StackOverflow Q&A, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/40045093/



Tags: python, mysql, pandas

Asked by ale19

I'm trying to read several columns from three different MySQL tables into three different dataframes.


It doesn't take long to read from the database, but actually putting them into a dataframe is fairly slow.


import time
import pandas as pd
from sqlalchemy import create_engine

start_time = time.time()
print('Reading data from database...')

q_crash = 'SELECT <query string> FROM table1'
q_vehicle = 'SELECT <query string> FROM table2'
q_person = 'SELECT <query string> FROM table3'
engine = create_engine('mysql+pymysql://user:password@host:port/dbasename')

# No query has run yet at this point; this only measures engine creation.
print('Database time: {:.1f}'.format(time.time() - start_time))

# Each timing below is measured from the same start_time, so the values are cumulative.
crash = pd.read_sql_query(q_crash, engine)
print('Read_sql time for table 1: {:.1f}'.format(time.time() - start_time))
vehicle = pd.read_sql_query(q_vehicle, engine)
print('Read_sql time for table 2: {:.1f}'.format(time.time() - start_time))
person = pd.read_sql_query(q_person, engine)
print('Read_sql time for table 3: {:.1f}'.format(time.time() - start_time))

Output:


Reading data from database...
Database time: 0.0
Read_sql time for table 1: 13.4
Read_sql time for table 2: 30.9
Read_sql time for table 3: 49.4

Is this normal? The tables are quite large-- table 3 is over 601,000 rows. But pandas has handled larger datasets without a hitch whenever I use read_csv.

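One way to check whether the time goes into the database fetch itself or into building the DataFrame is to time the two steps separately. This is a rough sketch, not from the original post, reusing the engine and q_crash defined above:

import time
import pandas as pd
from sqlalchemy import text

start = time.time()
with engine.connect() as conn:
    result = conn.execute(text(q_crash))  # run the query on the MySQL server
    rows = result.fetchall()              # transfer all rows to the client
    cols = list(result.keys())
fetch_time = time.time() - start

start = time.time()
crash = pd.DataFrame(rows, columns=cols)  # build the DataFrame from the fetched rows
build_time = time.time() - start

print('Fetch: {:.1f}s  DataFrame construction: {:.1f}s'.format(fetch_time, build_time))

If the fetch step dominates, the bottleneck is the query or the network rather than pandas itself.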

Answered by MaxU

IMO it doesn't make much sense to read complete tables into Pandas DFs if you have them in a MySQL DB - why don't you use SQL for filtering and joining your data? Do you really need all rows from those three tables as Pandas DFs?


If you want to join them, you could do it first on the MySQL side and load the result set into a single DF...


something similar to:


qry = 'SELECT p.*, v.*, c.* FROM vehicle v JOIN person p ON v.id = p.vehicle_id JOIN crash c ON c.id = p.crash_id WHERE <additional where clause>'
df = pd.read_sql(qry, engine)
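
Along the same lines, filtering can be pushed to MySQL by binding parameters through read_sql, so that only the matching rows are transferred. The column names and the date filter below are made up for illustration; they are not from the original question:

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine('mysql+pymysql://user:password@host:port/dbasename')

# MySQL does the filtering; only the selected columns of the matching rows cross the wire.
qry = text('SELECT c.id, p.age FROM crash c JOIN person p ON p.crash_id = c.id '
           'WHERE c.crash_date >= :start')
df = pd.read_sql(qry, engine, params={'start': '2016-01-01'})

This keeps both the transfer and the DataFrame construction proportional to the rows you actually need.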