Python: How do I force Django to ignore any caches and reload data?

Question

Asked by scippy

I'm using the Django database models from a process that's not called from an HTTP request. The process is supposed to poll for new data every few seconds and do some processing on it. I have a loop that sleeps for a few seconds and then gets all unhandled data from the database.

What I'm seeing is that after the first fetch, the process never sees any new data. I ran a few tests and it looks like Django is caching results, even though I'm building new QuerySets every time. To verify this, I did this from a Python shell:

>>> MyModel.objects.count()
885
# (Here I added some more data from another process.)
>>> MyModel.objects.count()
885
>>> MyModel.objects.update()
0
>>> MyModel.objects.count()
1025

As you can see, adding new data doesn't change the result count. However, calling the manager's update() method seems to fix the problem.

I can't find any documentation on that update() method and have no idea what other bad things it might do.

My question is, why am I seeing this caching behavior, which contradicts what the Django docs say? And how do I prevent it from happening?

Answer 1

Answered by adamJLev

It seems that count() uses the cache after the first call. This is the Django source for QuerySet.count:

def count(self):
    """
    Performs a SELECT COUNT() and returns the number of records as an
    integer.

    If the QuerySet is already fully cached this simply returns the length
    of the cached results set to avoid multiple SELECT COUNT(*) calls.
    """
    if self._result_cache is not None and not self._iter:
        return len(self._result_cache)

    return self.query.get_count(using=self.db)
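
To make the docstring's caching rule concrete, here is a minimal sketch in plain Python. ToyQuerySet and the shared list are hypothetical stand-ins, not Django code, and the sketch ignores the `_iter` check in the real source. It only illustrates the rule that a fully evaluated instance answers count() from its own cache, while a fresh instance asks the "database" again.

```python
class ToyQuerySet:
    """Hypothetical stand-in for a QuerySet; this is not Django code."""

    def __init__(self, table):
        self._table = table        # shared list playing the role of a database table
        self._result_cache = None  # filled the first time the object is iterated

    def __iter__(self):
        if self._result_cache is None:
            self._result_cache = list(self._table)
        return iter(self._result_cache)

    def count(self):
        # Same rule as the docstring above: reuse the cache when it exists.
        if self._result_cache is not None:
            return len(self._result_cache)
        return len(self._table)    # stands in for SELECT COUNT(*)


table = ["a", "b"]
qs = ToyQuerySet(table)
list(qs)                           # full evaluation populates the cache
table.append("c")                  # "another process" adds a row
stale_count = qs.count()           # answered from the cache
fresh_count = ToyQuerySet(table).count()  # new instance, no cache
```

Note that the cache lives on the instance, so it only matters when the same evaluated object is reused.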

update does seem to be doing quite a bit of extra work beyond what you need.
But I can't think of any better way to do this, short of writing your own SQL for the count.
If performance is not super important, I would just do what you're doing: call update before count.

QuerySet.update:

def update(self, **kwargs):
    """
    Updates all elements in the current QuerySet, setting all the given
    fields to the appropriate values.
    """
    assert self.query.can_filter(), \
            "Cannot update a query once a slice has been taken."
    self._for_write = True
    query = self.query.clone(sql.UpdateQuery)
    query.add_update_values(kwargs)
    if not transaction.is_managed(using=self.db):
        transaction.enter_transaction_management(using=self.db)
        forced_managed = True
    else:
        forced_managed = False
    try:
        rows = query.get_compiler(self.db).execute_sql(None)
        if forced_managed:
            transaction.commit(using=self.db)
        else:
            transaction.commit_unless_managed(using=self.db)
    finally:
        if forced_managed:
            transaction.leave_transaction_management(using=self.db)
    self._result_cache = None
    return rows
update.alters_data = True

Answer 2

Answered by Travis Swicegood

You can also use MyModel.objects._clone().count(). All of the methods in the QuerySet call _clone() prior to doing any work, which ensures that any internal caches are invalidated.

The root cause is that MyModel.objects is the same instance each time. By cloning it you're creating a new instance without the cached value. Of course, you can always reach in and invalidate the cache if you'd prefer to use the same instance.

Answer 3

Answered by hwjp

We've struggled a fair bit with forcing Django to refresh the "cache" - which, it turns out, wasn't really a cache at all but an artifact of transactions. This might not apply to your example, but certainly in Django views, by default, there's an implicit call to a transaction, which MySQL then isolates from any changes made by other processes after you start.

We used the @transaction.commit_manually decorator and called transaction.commit() just before every point where we needed up-to-date info.

As I say, this definitely applies to views; I'm not sure whether it would apply to Django code not being run inside a view.

Detailed info here:

http://devblog.resolversystems.com/?p=439

Answer 4

Answered by Nick Craig-Wood

Having had this problem and found two definitive solutions for it, I thought it worth posting another answer.

This is a problem with MySQL's default transaction mode. Django opens a transaction at the start, which means that by default you won't see changes made in the database.

You can demonstrate it like this.

Run a Django shell in terminal 1:

>>> MyModel.objects.get(id=1).my_field
u'old'

And another in terminal 2:

>>> MyModel.objects.get(id=1).my_field
u'old'
>>> a = MyModel.objects.get(id=1)
>>> a.my_field = "NEW"
>>> a.save()
>>> MyModel.objects.get(id=1).my_field
u'NEW'
>>>

Back to terminal 1 to demonstrate the problem - we still read the old value from the database.

>>> MyModel.objects.get(id=1).my_field
u'old'

Now, in terminal 1, demonstrate the solution:

>>> from django.db import transaction
>>> 
>>> @transaction.commit_manually
... def flush_transaction():
...     transaction.commit()
... 
>>> MyModel.objects.get(id=1).my_field
u'old'
>>> flush_transaction()
>>> MyModel.objects.get(id=1).my_field
u'NEW'
>>>

The new data is now read.

Here is that code in an easy-to-paste block with a docstring:

from django.db import transaction

@transaction.commit_manually
def flush_transaction():
    """
    Flush the current transaction so we don't read stale data

    Use in long running processes to make sure fresh data is read from
    the database.  This is a problem with MySQL and the default
    transaction mode.  You can fix it by setting
    "transaction-isolation = READ-COMMITTED" in my.cnf or by calling
    this function at the appropriate moment
    """
    transaction.commit()

The alternative solution is to change the default transaction mode in MySQL's my.cnf:

transaction-isolation = READ-COMMITTED

Note that this is a relatively new feature for MySQL and has some consequences for binary logging / slaving. You could also put this in the Django connection preamble if you wanted.
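
As a rough sketch of the "connection preamble" idea, one way to do it is the MySQL backend's init_command option, which runs an SQL statement each time a connection is opened. The database name and credentials below are placeholders I made up, and whether this suits your setup depends on your Django and MySQL versions, so treat it as an assumption to check rather than a recipe.

```python
# settings.py fragment. NAME, USER and PASSWORD are hypothetical placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",
        "USER": "myuser",
        "PASSWORD": "secret",
        "OPTIONS": {
            # Executed once on every new connection, so each session
            # starts in READ COMMITTED instead of the server default.
            "init_command": "SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED",
        },
    }
}
```

Unlike the my.cnf change, this affects only connections opened by this Django project, not other clients of the same server.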

Update 3 years later

Now that Django 1.6 has turned on autocommit in MySQL, this is no longer a problem. The example above now works fine without the flush_transaction() code, whether your MySQL is in REPEATABLE-READ (the default) or READ-COMMITTED transaction isolation mode.

What was happening in previous versions of Django, which ran in non-autocommit mode, was that the first select statement opened a transaction. Since MySQL's default mode is REPEATABLE-READ, this means that no updates to the database will be read by subsequent select statements - hence the need for the flush_transaction() code above, which ends the transaction and starts a new one.
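
The same effect can be reproduced without MySQL or Django. The sketch below is an analogy only: it uses Python's built-in sqlite3 module with SQLite in WAL mode, where a reader inside an open transaction keeps seeing one snapshot, much as REPEATABLE-READ does. The table and column names are made up to match the shell session above, and the function name is hypothetical.

```python
import os
import sqlite3
import tempfile


def demo_snapshot_reads():
    """Return what a reader sees at the start, while stale, and after committing."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")
        # isolation_level=None means the sqlite3 module never opens a
        # transaction implicitly; transactions start only at our BEGIN.
        writer = sqlite3.connect(path, isolation_level=None)
        reader = sqlite3.connect(path, isolation_level=None)
        try:
            writer.execute("PRAGMA journal_mode=WAL")
            writer.execute(
                "CREATE TABLE my_model (id INTEGER PRIMARY KEY, my_field TEXT)"
            )
            writer.execute("INSERT INTO my_model VALUES (1, 'old')")

            query = "SELECT my_field FROM my_model WHERE id = 1"
            reader.execute("BEGIN")  # long-lived transaction, like terminal 1
            first = reader.execute(query).fetchall()[0][0]

            # "Terminal 2" commits a change while the reader's transaction is open.
            writer.execute("UPDATE my_model SET my_field = 'NEW' WHERE id = 1")

            stale = reader.execute(query).fetchall()[0][0]  # still the snapshot
            reader.execute("COMMIT")  # plays the role of flush_transaction()
            fresh = reader.execute(query).fetchall()[0][0]
            return first, stale, fresh
        finally:
            reader.close()
            writer.close()
```

The second read stays on the old snapshot, and only ending the reader's transaction lets the third read pick up the committed change.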

There are still reasons why you might want to use READ-COMMITTED transaction isolation, though. If you were to put terminal 1 in a transaction and wanted to see the writes from terminal 2, you would need READ-COMMITTED.

The flush_transaction() code now produces a deprecation warning in Django 1.6, so I recommend you remove it.

Answer 5

Answered by Chris Clark

I'm not sure I'd recommend it... but you can just kill the cache yourself:

>>> qs = MyModel.objects.all()
>>> qs.count()
1
>>> MyModel().save()
>>> qs.count()  # cached!
1
>>> qs._result_cache = None
>>> qs.count()
2

And here's a better technique that doesn't rely on fiddling with the innards of the QuerySet. Remember that the caching happens within a QuerySet, but refreshing the data simply requires the underlying Query to be re-executed. The QuerySet is really just a high-level API wrapping a Query object, plus a container (with caching!) for Query results. Thus, given a queryset, here is a general-purpose way of forcing a refresh:

>>> MyModel().save()
>>> qs = MyModel.objects.all()
>>> qs.count()
1
>>> MyModel().save()
>>> qs.count()  # cached!
1
>>> from django.db.models import QuerySet
>>> qs = QuerySet(model=MyModel, query=qs.query)
>>> qs.count()  # refreshed!
2
>>> party_time()

Pretty easy! You can of course implement this as a helper function and use as needed.
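
One possible shape for such a helper is sketched below. Instead of rebuilding the QuerySet by hand, it leans on QuerySet.all(), which the Django documentation describes as returning a copy of the current QuerySet. The helper name refresh is my own, and FakeQuerySet is a hypothetical stand-in included only so the sketch runs without a database.

```python
def refresh(queryset):
    """Return an unevaluated copy of queryset so its next use re-runs the query."""
    return queryset.all()


class FakeQuerySet:
    """Hypothetical stand-in exposing all() and count(); this is not Django code."""

    def __init__(self, rows):
        self._rows = rows          # shared list playing the role of the table
        self._result_cache = None

    def __iter__(self):
        if self._result_cache is None:
            self._result_cache = list(self._rows)
        return iter(self._result_cache)

    def all(self):
        return FakeQuerySet(self._rows)  # a copy that has no cache yet

    def count(self):
        if self._result_cache is not None:
            return len(self._result_cache)
        return len(self._rows)


rows = [1]
qs = FakeQuerySet(rows)
list(qs)                           # evaluate once, which fills the cache
rows.append(2)                     # a new row arrives
cached = qs.count()                # old instance still reports the cached size
refreshed = refresh(qs).count()    # the copy sees the new row
```

With a real model you would call it as refresh(qs) wherever the transcript above rebuilds the QuerySet by hand.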

Answer 6

Answered by Sarah Messer

If you append .all() to a queryset, it'll force a reread from the DB. Try MyModel.objects.all().count() instead of MyModel.objects.count().
