Python: the best way to iterate over all rows of a database table

Question

Asked by OemerA

I often write little Python scripts to iterate through all rows of a DB table, for example to send an email to all subscribers.

I do it like this

import MySQLdb

conn = MySQLdb.connect(host=hst, user=usr, passwd=pw, db=db)
cursor = conn.cursor()
# execute() returns a row count, not the rows, so iterate over the cursor itself
cursor.execute("SELECT * FROM tbl_subscriber;")

for subscriber in cursor:
    ...

conn.close()

I wonder if there is a better way to do this, because my code might load thousands of rows into memory.

I thought it could be done better with LIMIT, maybe something like this:

"SELECT * FROM tbl_subscriber LIMIT %d,%d;" % (actualLimit,steps)

What's the best way to do it? How would you do it?

Answer 1

Accepted answer by aaronasterling

Unless you have BLOBs in there, thousands of rows shouldn't be a problem. Do you know that it is?

Also, why bring shame on yourself and your entire family by doing something like

"SELECT * FROM tbl_subscriber LIMIT %d,%d;" % (actualLimit,steps)

when the cursor will make the substitution for you in a manner that avoids SQL injection?

# MySQLdb uses %s as the placeholder for every parameter, including numbers
cursor.execute("SELECT * FROM tbl_subscriber LIMIT %s,%s;", (actualLimit, steps))
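
For illustration, here is a minimal sketch of paging through the table with bound parameters instead of string formatting. It uses the standard-library sqlite3 module as a stand-in so it runs without a MySQL server; sqlite3 writes placeholders as "?", where MySQLdb would use "%s". The helper name iter_pages and the id and email columns are assumptions, not part of the original question.

```python
import sqlite3


def iter_pages(conn, page_size=2):
    """Yield rows from tbl_subscriber one LIMIT/OFFSET page at a time."""
    offset = 0
    while True:
        cursor = conn.cursor()
        # The driver binds the values; no string formatting is involved.
        cursor.execute(
            "SELECT id, email FROM tbl_subscriber ORDER BY id LIMIT ? OFFSET ?",
            (page_size, offset),
        )
        rows = cursor.fetchall()
        if not rows:
            break
        for row in rows:
            yield row
        offset += len(rows)


# In-memory database standing in for the real subscriber table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_subscriber (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO tbl_subscriber (email) VALUES (?)",
    [("user%d@example.com" % i,) for i in range(5)],
)

emails = [email for _id, email in iter_pages(conn)]
```

The ORDER BY matters here: without a stable ordering, consecutive pages are not guaranteed to be disjoint.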

Answer 2

Answer by Katalonis

First of all, maybe you don't need SELECT * FROM ...

Maybe it's enough for you to fetch only what you need, like "SELECT email FROM ..."

That would reduce memory usage anyway :)

Answer 3

Answer by Björn Pollex

Do you have actual memory problems? When iterating over a cursor, results are fetched one at a time (your DB-API implementation might decide to prefetch results, but then it might offer a function to set the number of prefetched results).
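
As a rough sketch of what this looks like in practice, the DB-API defines a cursor.arraysize attribute that sets the default batch size for fetchmany(), and many drivers also let you iterate over the cursor directly. Whether a driver prefetches behind the scenes is implementation specific. The example uses the standard-library sqlite3 module as a stand-in, and the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_subscriber (email TEXT)")
conn.executemany(
    "INSERT INTO tbl_subscriber VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("c@example.com",)],
)

cursor = conn.cursor()
cursor.execute("SELECT email FROM tbl_subscriber ORDER BY email")

# arraysize is the default number of rows returned by fetchmany().
cursor.arraysize = 2
first_batch = cursor.fetchmany()

# Iterating over the cursor hands back the remaining rows one at a time.
remaining = [row[0] for row in cursor]
```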

Answer 4

Answer by Andrew

Most MySQL connectors based on libmysqlclient buffer all results in client memory by default for performance reasons (on the assumption that you won't be reading large result sets).

When you do need to read a large result in MySQLdb, you can use an SSCursor to avoid buffering the entire result set.

http://mysql-python.sourceforge.net/MySQLdb.html#using-and-extending

SSCursor - A "server-side" cursor. Like Cursor but uses CursorUseResultMixIn. Use only if you are dealing with potentially large result sets.

This does introduce complications that you must be careful about. If you don't read all the results from the cursor, a second query will raise a ProgrammingError:

>>> import MySQLdb
>>> import MySQLdb.cursors
>>> conn = MySQLdb.connect(read_default_file='~/.my.cnf')
>>> curs = conn.cursor(MySQLdb.cursors.SSCursor)
>>> curs.execute('SELECT * FROM big_table')
18446744073709551615L
>>> curs.fetchone()
(1L, '2c57b425f0de896fcf5b2e2f28c93f66')
>>> curs.execute('SELECT NOW()')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/site-packages/MySQLdb/cursors.py", line 173, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
_mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now")

This means you always have to read everything from the cursor (and potentially multiple result sets) before issuing another query; MySQLdb won't do this for you.
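
One way to handle this is a small helper that discards whatever is left on the cursor before it is reused. The sketch below is an assumption-laden illustration, not part of MySQLdb: the name drain is hypothetical, it relies only on the DB-API fetchmany() method and the optional nextset() method, and the demo runs against the standard-library sqlite3 module purely so that it works without a MySQL server.

```python
import sqlite3


def drain(cursor, batch_size=1000):
    """Read and discard all remaining rows (and result sets) on a DB-API cursor.

    Returns the number of rows that were thrown away.
    """
    discarded = 0
    while True:
        rows = cursor.fetchmany(batch_size)
        while rows:
            discarded += len(rows)
            rows = cursor.fetchmany(batch_size)
        # nextset() is optional in the DB-API; skip it if the driver lacks it.
        nextset = getattr(cursor, "nextset", None)
        if nextset is None or not nextset():
            return discarded


# Stand-in demo: read one row, drain the rest, then reuse the cursor.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO big_table (id) VALUES (?)", [(i,) for i in range(1, 6)])

curs = conn.cursor()
curs.execute("SELECT id FROM big_table ORDER BY id")
first = curs.fetchone()
leftover = drain(curs)
curs.execute("SELECT COUNT(*) FROM big_table")
total = curs.fetchone()[0]
```

With an SSCursor, calling such a helper before the second execute() should avoid the "Commands out of sync" error, at the cost of transferring the rows you no longer want.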

Answer 5

Answer by dugres

You don't have to modify the query; you can use the fetchmany method of cursors. Here is how I do it:

def fetchsome(cursor, some=1000):
    fetch = cursor.fetchmany
    while True:
        rows = fetch(some)
        if not rows: break
        for row in rows:
            yield row

This way you can "SELECT * FROM tbl_subscriber;" but you will only fetch some rows at a time (1000 by default).
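
A usage sketch for the generator, repeated here so the example is self-contained. The standard-library sqlite3 module stands in for MySQLdb, and the table contents and the sent list are made up for the demo. Note that, as the SSCursor answer points out, a default MySQLdb cursor may already have buffered the whole result on the client, so batching like this probably saves the most memory when combined with a server-side cursor.

```python
import sqlite3


def fetchsome(cursor, some=1000):
    """Yield rows one by one, fetching them from the cursor `some` at a time."""
    while True:
        rows = cursor.fetchmany(some)
        if not rows:
            break
        for row in rows:
            yield row


# In-memory database standing in for the real subscriber table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_subscriber (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO tbl_subscriber (email) VALUES (?)",
    [("user%d@example.com" % i,) for i in range(7)],
)

cursor = conn.cursor()
cursor.execute("SELECT email FROM tbl_subscriber ORDER BY id")

sent = []
for (email,) in fetchsome(cursor, some=3):
    sent.append(email)  # a real script would send the email here

conn.close()
```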
