Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/213455/

Time: 2020-11-03 19:39:08  Source: igfitidea

python threadsafe object cache

Tags: python, multithreading, caching

Asked by NeoAnderson

I have implemented a Python web server. Each HTTP request spawns a new thread. I need to cache objects in memory, and since it's a web server, I want the cache to be thread safe. Is there a standard implementation of a thread-safe object cache in Python? I found the following:

http://freshmeat.net/projects/lrucache/

This does not look to be thread safe. Can anybody point me to a good implementation of thread safe cache in python?

Thanks!

Accepted answer by John Montgomery

Well a lot of operations in Python are thread-safe by default, so a standard dictionary should be ok (at least in certain respects). This is mostly due to the GIL, which will help avoid some of the more serious threading issues.

There's a list here that might be useful: http://coreygoldberg.blogspot.com/2008/09/python-thread-synchronization-and.html

The atomic nature of those operations, though, only means that you won't end up with an entirely inconsistent state if two threads access a dictionary at the same time. So you wouldn't get a corrupted value. However, you would (as with most multi-threaded programming) not be able to rely on the specific order of those atomic operations.

So to cut a long story short...

If you have fairly simple requirements and aren't too bothered about the order in which things get written into the cache, then you can use a dictionary and know that you'll always get a consistent, uncorrupted value (it just might be out of date).

If you want to ensure that things are a bit more consistent with regard to reading and writing then you might want to look at Django's local memory cache:

http://code.djangoproject.com/browser/django/trunk/django/core/cache/backends/locmem.py

It uses a read/write lock for locking.

Answer by Sam Corder

A thread per request is often a bad idea. If your server experiences huge spikes in load, it will bring the box to its knees. Consider using a thread pool that can grow to a limited size during peak usage and shrink to a smaller size when load is light.
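
As a rough sketch of the bounded-pool idea, here is the standard library's concurrent.futures.ThreadPoolExecutor. The handler name handle_request and the cap of 8 workers are assumptions for illustration only. The executor creates threads on demand up to the cap; this sketch does not cover the shrink-when-idle part of the advice.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8  # assumed cap; tune for your hardware and workload


def handle_request(request_id):
    """Hypothetical request handler; returns the name of the worker thread used."""
    return threading.current_thread().name


# The pool never runs more than MAX_WORKERS threads, however many requests arrive.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    worker_names = list(pool.map(handle_request, range(100)))

print(len(set(worker_names)), "threads served", len(worker_names), "requests")
```

With a cap like this, a load spike queues work instead of spawning an unbounded number of threads.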

Answer by Parand

You probably want to use memcached instead. It's very fast, very stable, very popular, has good python libraries, and will allow you to grow to a distributed cache should you need to:

http://www.danga.com/memcached/

Answer by Enrique Pérez Arnaud

For a thread safe object you want threading.local:

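from threading import local

safe = local()
# Each thread sees its own attributes on a local() object,
# so every thread must create its own safe.cache.
safe.cache = {}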

You can then put and retrieve objects in safe.cache with thread safety, because each thread gets its own separate safe.cache; nothing is shared between threads.
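
To make the behaviour of threading.local concrete, here is a small sketch (the worker function and the dictionary contents are made up for illustration). It shows that an attribute set in one thread is simply absent in another, so this gives per-thread caches rather than one shared cache.

```python
import threading

safe = threading.local()
safe.cache = {"who": "main"}  # set in the main thread only

seen_in_worker = []


def worker():
    # The attribute set by the main thread is not visible in this thread.
    seen_in_worker.append(hasattr(safe, "cache"))
    safe.cache = {"who": "worker"}  # this thread's own, separate cache


t = threading.Thread(target=worker)
t.start()
t.join()

print(seen_in_worker)  # [False]
print(safe.cache)      # {'who': 'main'}
```

Whether that is what you want depends on whether the cached objects need to be shared across requests.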

Answer by user7610

Point 1. The GIL does not help you here. An example of a (non-thread-safe) cache for something called "stubs" would be:

stubs = {}

def maybe_new_stub(host):
    """ returns stub from cache and populates the stubs cache if new is created """
    if host not in stubs:
        stub = create_new_stub_for_host(host)
        stubs[host] = stub
    return stubs[host]

What can happen is that Thread 1 calls maybe_new_stub('localhost') and discovers that we do not have that key in the cache yet. Now we switch to Thread 2, which calls the same maybe_new_stub('localhost') and also learns that the key is not present. Consequently, both threads call create_new_stub_for_host and put the result into the cache.

The map itself is protected by the GIL, so we cannot break it by concurrent access. The logic of the cache, however, is not protected, and so we may end up creating two or more stubs, and dropping all except one on the floor.
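
The interleaving described above can be forced deterministically in a small sketch. The barrier is purely a demonstration device, and create_new_stub_for_host here is a hypothetical stand-in that returns a plain object rather than a real stub.

```python
import threading

stubs = {}
created = []                       # every stub ever created, for counting
both_inside = threading.Barrier(2)


def create_new_stub_for_host(host):
    """Hypothetical stand-in for an expensive stub constructor."""
    # Wait until both threads are in here, i.e. both saw the key as missing.
    both_inside.wait(timeout=5)
    stub = object()
    created.append(stub)
    return stub


def maybe_new_stub(host):
    # Unprotected check-then-insert, as in the example above.
    if host not in stubs:
        stubs[host] = create_new_stub_for_host(host)
    return stubs[host]


threads = [threading.Thread(target=maybe_new_stub, args=("localhost",)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(created), "stubs created,", len(stubs), "kept")  # 2 stubs created, 1 kept
```

The dictionary is never corrupted, yet one of the two stubs is created for nothing.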

Point 2. Depending on the nature of the program, you may not want a global cache. Such a shared cache forces synchronization between all your threads. For performance reasons, it is good to make the threads as independent as possible. I believe I do need one; you may not.

Point 3. You may use a simple lock. I took inspiration from https://codereview.stackexchange.com/questions/160277/implementing-a-thread-safe-lrucache and came up with the following, which I believe is safe to use for my purposes:

import threading

import grpc
import cli_pb2_grpc  # the author's generated gRPC module

stubs = {}
lock = threading.Lock()

def maybe_new_stub(host):
    """ returns stub from cache and populates the stubs cache if new is created """
    with lock:
        if host not in stubs:
            channel = grpc.insecure_channel('%s:6666' % host)
            stub = cli_pb2_grpc.BrkStub(channel)
            stubs[host] = stub
        return stubs[host]
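
The same locking pattern can be sketched without gRPC so that it runs anywhere. LockedCache, get_or_create and make_stub are hypothetical names introduced for this illustration, not part of any library.

```python
import threading


class LockedCache:
    """Hypothetical helper: a dict guarded by a single lock."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def get_or_create(self, key, factory):
        # The check and the insert happen under one lock, so they cannot interleave.
        with self._lock:
            if key not in self._items:
                self._items[key] = factory(key)
            return self._items[key]


calls = []


def make_stub(host):
    """Stand-in for the real stub constructor; records each call."""
    calls.append(host)
    return object()


cache = LockedCache()
results = []
threads = [
    threading.Thread(target=lambda: results.append(cache.get_or_create("localhost", make_stub)))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls))                             # 1
print(all(r is results[0] for r in results))  # True
```

One trade-off of this design: the factory runs while the lock is held, so a slow creation for one key blocks lookups for every other key.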

Point 4. It would be best to use an existing library. I haven't found any I am prepared to vouch for yet.
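
One standard-library option that may be worth evaluating is functools.lru_cache. According to the functools documentation, its internal data structure stays coherent under concurrent use, but the wrapped function may still be called more than once for the same argument if a second call arrives before the first has finished. The get_stub function below is a hypothetical factory used only for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def get_stub(host):
    """Hypothetical factory; the returned object stands in for a real stub."""
    return object()


first = get_stub("localhost")
second = get_stub("localhost")

print(first is second)  # True
print(get_stub.cache_info())
```

If duplicate creation must never happen, a lock around the factory is still needed.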
