
Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/29582736/

Date: 2020-08-19  Source: igfitidea

Python3 - Is there a way to iterate row by row over a very large SQlite table without loading the entire table into local memory?

Tags: python, sqlite

Asked by Marlon Dyck

I have a very large table with 250,000+ rows, many of which contain a large text block in one of the columns. Right now it's 2.7GB and expected to grow at least tenfold. I need to perform Python-specific operations on every row of the table, but I only need to access one row at a time.


Right now my code looks something like this:


c.execute('SELECT * FROM big_table')
table = c.fetchall()  # loads every row into memory at once
for row in table:
    do_stuff_with_row(row)

This worked fine when the table was smaller, but the table is now larger than my available RAM and Python hangs when I try to run it. Is there a better (more RAM-efficient) way to iterate row by row over the entire table?


Accepted answer by Martijn Pieters

cursor.fetchall() fetches all results into a list first.


Instead, you can iterate over the cursor itself:


c.execute('SELECT * FROM big_table')
for row in c:
    do_stuff_with_row(row)

This produces rows as needed, rather than loading them all first.

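A minimal runnable sketch of the accepted answer, using a small in-memory database to stand in for the real table (the table contents and the `do_stuff_with_row` placeholder are hypothetical, only for illustration):

```python
import sqlite3

# Build a small in-memory table to stand in for big_table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO big_table (id, body) VALUES (?, ?)",
    [(i, f"text block {i}") for i in range(5)],
)

c = conn.cursor()
c.execute("SELECT * FROM big_table")

# Iterating over the cursor fetches rows lazily instead of
# materializing the whole result set, so memory use stays flat
# regardless of the table size.
processed = []
for row in c:
    processed.append(row[1].upper())  # stands in for do_stuff_with_row(row)

conn.close()
```

If you want more control over how many rows are pulled from the database at a time, `cursor.fetchmany(size)` in a loop is an equivalent batched alternative; iterating the cursor directly is usually the simplest option.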