Python SQLAlchemy:flush() 和 commit() 有什么区别?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/4201455/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-18 14:44:27  来源:igfitidea点击:

SQLAlchemy: What's the difference between flush() and commit()?

pythonsqlalchemy

提问by AP257

What the difference is between flush()and commit()in SQLAlchemy?

SQLAlchemyflush()commit()SQLAlchemy 中的区别是什么?

I've read the docs, but am none the wiser - they seem to assume a pre-understanding that I don't have.

我已经阅读了文档,但并不明智 - 他们似乎假设了我没有的预先理​​解。

I'm particularly interested in their impact on memory usage. I'm loading some data into a database from a series of files (around 5 million rows in total) and my session is occasionally falling over - it's a large database and a machine with not much memory.

我对它们对内存使用的影响特别感兴趣。我正在将一些数据从一系列文件(总共大约 500 万行)加载到数据库中,我的会话偶尔会失败 - 这是一个大型数据库和一台内存不多的机器。

I'm wondering if I'm using too many commit()and not enough flush()calls - but without really understanding what the difference is, it's hard to tell!

我想知道我是否使用了太多commit()而不是足够的flush()电话 - 但没有真正理解有什么区别,很难说!

采纳答案by snapshoe

A Session object is basically an ongoing transaction of changes to a database (update, insert, delete). These operations aren't persisted to the database until they are committed (if your program aborts for some reason in mid-session transaction, any uncommitted changes within are lost).

Session 对象基本上是对数据库进行更改(更新、插入、删除)的持续事务。这些操作在提交之前不会持久保存到数据库中(如果您的程序在会话中事务中出于某种原因中止,则其中的任何未提交更改都将丢失)。

The session object registers transaction operations with session.add(), but doesn't yet communicate them to the database until session.flush()is called.

会话对象使用 注册事务操作session.add(),但在session.flush()被调用之前不会将它们传送到数据库。

session.flush()communicates a series of operations to the database (insert, update, delete). The database maintains them as pending operations in a transaction. The changes aren't persisted permanently to disk, or visible to other transactions until the database receives a COMMIT for the current transaction (which is what session.commit()does).

session.flush()向数据库传达一系列操作(插入、更新、删除)。数据库将它们作为事务中的挂起操作进行维护。在数据库收到当前事务的 COMMIT 之前,更改不会永久保存到磁盘,也不会对其他事务可见(这就是这样session.commit()做的)。

session.commit()commits (persists) those changes to the database.

session.commit()提交(保留)对数据库的这些更改。

flush()is alwayscalled as part of a call to commit()(1).

flush()始终称为一个呼叫的一部分commit()1)。

When you use a Session object to query the database, the query will return results both from the database and from the flushed parts of the uncommitted transaction it holds. By default, Session objects autoflushtheir operations, but this can be disabled.

当您使用 Session 对象查询数据库时,查询将返回来自数据库和它持有的未提交事务的刷新部分的结果。默认情况下,会话对象autoflush他们的操作,但这可以被禁用。

Hopefully this example will make this clearer:

希望这个例子能让这个更清楚:

#---
s = Session()

s.add(Foo('A')) # The Foo('A') object has been added to the session.
                # It has not been committed to the database yet,
                #   but is returned as part of a query.
print 1, s.query(Foo).all()
s.commit()

#---
s2 = Session()
s2.autoflush = False

s2.add(Foo('B'))
print 2, s2.query(Foo).all() # The Foo('B') object is *not* returned
                             #   as part of this query because it hasn't
                             #   been flushed yet.
s2.flush()                   # Now, Foo('B') is in the same state as
                             #   Foo('A') was above.
print 3, s2.query(Foo).all() 
s2.rollback()                # Foo('B') has not been committed, and rolling
                             #   back the session's transaction removes it
                             #   from the session.
print 4, s2.query(Foo).all()

#---
Output:
1 [<Foo('A')>]
2 [<Foo('A')>]
3 [<Foo('A')>, <Foo('B')>]
4 [<Foo('A')>]

回答by Jacob

As @snapshoe says

正如@snapshoe 所说

flush()sends your SQL statements to the database

commit()commits the transaction.

flush()将您的 SQL 语句发送到数据库

commit()提交事务。

When session.autocommit == False:

什么时候session.autocommit == False

commit()will call flush()if you set autoflush == True.

commit()flush()如果您设置,将调用autoflush == True

When session.autocommit == True:

什么时候session.autocommit == True

You can't call commit()if you haven't started a transaction (which you probably haven't since you would probably only use this mode to avoid manually managing transactions).

commit()如果您尚未开始交易(您可能没有开始,因为您可能只会使用此模式来避免手动管理交易),则无法调用。

In this mode, you must call flush()to save your ORM changes. The flush effectively also commits your data.

在这种模式下,您必须调用flush()以保存您的 ORM 更改。刷新也有效地提交了您的数据。

回答by Jimbo

Why flush if you can commit?

如果可以提交,为什么要刷新?

As someone new to working with databases and sqlalchemy, the previous answers - that flush()sends SQL statements to the DB and commit()persists them - were not clear to me. The definitions make sense but it isn't immediately clear from the definitions why you would use a flush instead of just committing.

作为使用数据库和 sqlalchemy 的新手,之前的答案 -flush()将 SQL 语句发送到数据库并commit()保留它们 - 对我来说并不清楚。这些定义是有道理的,但从定义中并不能立即清楚为什么要使用刷新而不是仅仅提交。

Since a commit always flushes (https://docs.sqlalchemy.org/en/13/orm/session_basics.html#committing) these sound really similar. I think the big issue to highlight is that a flush is not permanent and can be undone, whereas a commit is permanent, in the sense that you can't ask the database to undo the last commit (I think)

由于提交总是刷新(https://docs.sqlalchemy.org/en/13/orm/session_basics.html#committing),这些听起来非常相似。我认为要强调的大问题是刷新不是永久性的并且可以撤消,而提交是永久性的,从某种意义上说,您不能要求数据库撤消最后一次提交(我认为)

@snapshoe highlights that if you want to query the database and get results that include newly added objects, you need to have flushed first (or committed, which will flush for you). Perhaps this is useful for some people although I'm not sure why you would want to flush rather than commit (other than the trivial answer that it can be undone).

@snapshoe 强调,如果您想查询数据库并获得包含新添加对象的结果,您需要先刷新(或提交,它将为您刷新)。也许这对某些人有用,尽管我不确定您为什么要刷新而不是提交(除了可以撤消的简单答案)。

In another example I was syncing documents between a local DB and a remote server, and if the user decided to cancel, all adds/updates/deletes should be undone (i.e. no partial sync, only a full sync). When updating a single document I've decided to simply delete the old row and add the updated version from the remote server. It turns out that due to the way sqlalchemy is written, order of operations when committing is not guaranteed. This resulted in adding a duplicate version (before attempting to delete the old one), which resulted in the DB failing a unique constraint. To get around this I used flush()so that order was maintained, but I could still undo if later the sync process failed.

在另一个例子中,我在本地数据库和远程服务器之间同步文档,如果用户决定取消,所有添加/更新/删除都应该被撤消(即没有部分同步,只有完全同步)。更新单个文档时,我决定简单地删除旧行并从远程服务器添加更新版本。事实证明,由于 sqlalchemy 的编写方式,无法保证提交时的操作顺序。这导致添加了重复版本(在尝试删除旧版本之前),从而导致数据库无法通过唯一约束。为了解决这个问题,我使用了flush()这个顺序,但如果稍后同步过程失败,我仍然可以撤消。

See my post on this at: Is there any order for add versus delete when committing in sqlalchemy

请参阅我的帖子:在 sqlalchemy 中提交时,是否有添加与删除的顺序

Similarly, someone wanted to know whether add order is maintained when committing, i.e. if I add object1then add object2, does object1get added to the database before object2Does SQLAlchemy save order when adding objects to session?

同样,有人想知道提交时是否保持添加顺序,即如果我添加object1然后添加object2,是否object1在将对象添加到会话时 SQLAlchemy 保存顺序之前添加到数据库中object2

Again, here presumably the use of a flush() would ensure the desired behavior. So in summary, one use for flush is to provide order guarantees (I think), again while still allowing yourself an "undo" option that commit does not provide.

同样,这里大概使用 flush() 将确保所需的行为。因此,总而言之,flush 的一个用途是提供顺序保证(我认为),同时仍然允许自己使用 commit 不提供的“撤消”选项。

Autoflush and Autocommit

自动刷新和自动提交

Note, autoflush can be used to ensure queries act on an updated database as sqlalchemy will flush before executing the query. https://docs.sqlalchemy.org/en/13/orm/session_api.html#sqlalchemy.orm.session.Session.params.autoflush

请注意,自动刷新可用于确保查询对更新的数据库起作用,因为 sqlalchemy 将在执行查询之前刷新。https://docs.sqlalchemy.org/en/13/orm/session_api.html#sqlalchemy.orm.session.Session.params.autoflush

Autocommit is something else that I don't completely understand but it sounds like its use is discouraged: https://docs.sqlalchemy.org/en/13/orm/session_api.html#sqlalchemy.orm.session.Session.params.autocommit

自动提交是我不完全理解的其他东西,但听起来不鼓励使用它:https: //docs.sqlalchemy.org/en/13/orm/session_api.html#sqlalchemy.orm.session.Session.params。自动提交

Memory Usage

内存使用情况

Now the original question actually wanted to know about the impact of flush vs. commit for memory purposes. As the ability to persist or not is something the database offers (I think), simply flushing should be sufficient to offload to the database - although committing shouldn't hurt (actually probably helps - see below) if you don't care about undoing.

现在最初的问题实际上想知道出于内存目的的刷新与提交的影响。由于持久性或不持久性是数据库提供的东西(我认为),简单的刷新应该足以卸载到数据库 - 尽管提交不应该受到伤害(实际上可能有帮助 - 见下文),如果你不关心撤消.

sqlalchemy uses weak referencing for objects that have been flushed: https://docs.sqlalchemy.org/en/13/orm/session_state_management.html#session-referencing-behavior

sqlalchemy 对已刷新的对象使用弱引用:https: //docs.sqlalchemy.org/en/13/orm/session_state_management.html#session-referencing-behavior

This means if you don't have an object explicitly held onto somewhere, like in a list or dict, sqlalchemy won't keep it in memory.

这意味着如果您没有将对象显式保存在某处,例如在列表或字典中,sqlalchemy 不会将其保存在内存中。

However, then you have the database side of things to worry about. Presumably flushing without committing comes with some memory penalty to maintain the transaction. Again, I'm new to this but here's a link that seems to suggest exactly this: https://stackoverflow.com/a/15305650/764365

但是,那么您需要担心数据库方面的事情。据推测,在不提交的情况下刷新会带来一些内存损失来维护事务。再次,我是新手,但这里有一个链接似乎暗示了这一点:https: //stackoverflow.com/a/15305650/764365

In other words, commits should reduce memory usage, although presumably there is a trade-off between memory and performance here. In other words, you probably don't want to commit every single database change, one at a time (for performance reasons), but waiting too long will increase memory usage.

换句话说,提交应该减少内存使用,尽管这里可能需要在内存和性能之间进行权衡。换句话说,您可能不想一次提交每个数据库更改(出于性能原因),但等待时间过长会增加内存使用量。

回答by Romain Vincent

This does not strictly answer the original question but some people have mentioned that with session.autoflush = Trueyou don't have to use session.flush()... And this is not always true.

这并没有严格回答最初的问题,但有些人提到session.autoflush = True你不必使用session.flush()......这并不总是正确的。

If you want to use the id of a newly created object in the middle of a transaction, you must call session.flush().

如果要在事务中间使用新创建对象的 id,则必须调用session.flush().

# Given a model with at least this id
class AModel(Base):
   id = Column(Integer, primary_key=True)  # autoincrement by default on integer primary key

session.autoflush = True

a = AModel()
session.add(a)
a.id  # None
session.flush()
a.id  # autoincremented integer

This is because autoflushdoes NOTauto fill the id (although a query of the object will, which sometimes can cause confusion as in "why this works here but not there?" But snapshoealready covered this part).

这是因为autoflush自动填写ID(尽管对象的查询将,这有时会导致混乱,如“为什么这个作品在这里,但不是吗?”但是,snapshoe已涵盖这一部分)。



One related aspect that seems pretty important to me and wasn't really mentioned:

一个对我来说似乎很重要但并未真正提及的相关方面:

Why would you not commit all the time?- The answer is atomicity.

你为什么不一直承诺?- 答案是原子性

A fancy word to say: an ensemble of operations have to allbe executed successfully OR none of them will take effect.

一个花哨的说法:一组操作必须全部成功执行,否则它们都不会生效。

For example, if you want to create/update/delete some object (A) and then create/update/delete another (B), but if (B) fails you want to revert (A). This means those 2 operations are atomic.

例如,如果您想创建/更新/删除某个对象 (A),然后创建/更新/删除另一个 (B),但如果 (B) 失败,您想恢复 (A)。这意味着这两个操作是原子的

Therefore, if (B) needs a result of (A), you want to call flushafter (A) and commitafter (B).

因此,如果 (B) 需要 (A) 的结果,您需要flush在 (A)commit之后和(B) 之后调用。

Also, if session.autoflush is True, except for the case that I mentioned above or others in Jimbo's answer, you will not need to call flushmanually.

此外,如果session.autoflush is True,除了我上面提到的情况或金回答中的其他情况外,您将不需要flush手动拨打电话。