Alternative to memcached that can persist to disk (Java)
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/1316852/
Asked by Mike W
I am currently using memcached with my Java app, and overall it's working great.
The features of memcached that are most important to me are:
- it's fast, since reads and writes are in-memory and don't touch the disk
- it's just a key/value store (since that's all my app needs)
- it's distributed
- it uses memory efficiently by having each object live on exactly one server
- it doesn't assume that the objects are from a database (since my objects are not database objects)
However, there is one thing that I'd like to do that memcached can't do. I want to periodically (perhaps once per day) save the cache contents to disk. And I want to be able to restore the cache from the saved disk image.
The disk save does not need to be very complex. If a new key/value is added while the save is taking place, I don't care if it's included in the save or not. And if an existing key/value is modified while the save is taking place, the saved value should be either the old value or the new value, but I don't care which one.
Can anyone recommend another caching solution (either free or commercial) that has all (or a significant percentage) of the memcached features that are important to me, and that can also save the entire cache to disk and restore it from there?
Accepted answer by realMarkusSchmidt
Maybe your problem is like mine: I have only a few machines for memcached, but each has lots of memory. If even one of them fails or needs to be rebooted, it seriously affects the performance of the system. According to the original memcached philosophy I should add a lot more machines with less memory each, but that's not cost-efficient and not exactly "green IT" ;)
For our solution, we built an interface layer for the cache system so that the providers for the underlying cache systems can be nested, like you can do with streams. We wrote a cache provider for memcached as well as our own very simple key-value-to-disk storage provider. Then we define a weight for each cache item that represents how costly it is to rebuild the item if it cannot be retrieved from the cache. The nested disk cache is only used for items with a weight above a certain threshold, maybe around 10% of all items.
When storing an object in the cache, we don't lose time, because saving to one or both caches is queued for asynchronous execution anyway. So writing to the disk cache doesn't need to be fast. The same goes for reads: first we try memcached, and only if the object is not there and it is a "costly" object do we check the disk cache (which is orders of magnitude slower than memcached, but still so much better than recalculating 30 GB of data after a single machine went down).
This way we get the best of both worlds, without replacing memcached with anything new.
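The nesting described in this answer could be sketched roughly as follows. This is only an illustration under stated assumptions: the `CacheProvider` interface, the `MapProvider` stand-in (a plain in-memory map used in place of both memcached and the disk store), and the threshold value are all hypothetical names and values, not the answerer's actual code. Writes are synchronous here for brevity, whereas the answer queues them asynchronously, and repopulating the fast tier after a slow-tier hit is an added assumption.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class NestedCacheDemo {
    public static void main(String[] args) {
        MapProvider memcachedStandIn = new MapProvider();
        MapProvider diskStandIn = new MapProvider();
        CacheProvider cache = new NestedProvider(memcachedStandIn, diskStandIn, 50);

        cache.put("cheap", "a", 10);   // weight below the threshold: fast tier only
        cache.put("costly", "b", 90);  // weight above the threshold: both tiers
        memcachedStandIn.clear();      // simulate a memcached node restart

        System.out.println(cache.get("cheap", 10).isPresent());   // false
        System.out.println(cache.get("costly", 90).orElse(null)); // b
    }

    /** Hypothetical provider interface; the answer does not show its real API. */
    interface CacheProvider {
        void put(String key, String value, int weight);
        Optional<String> get(String key, int weight);
    }

    /** In-memory stand-in used here for both memcached and the disk store. */
    static class MapProvider implements CacheProvider {
        private final Map<String, String> store = new ConcurrentHashMap<>();

        public void put(String key, String value, int weight) {
            store.put(key, value);
        }

        public Optional<String> get(String key, int weight) {
            return Optional.ofNullable(store.get(key));
        }

        void clear() {
            store.clear();
        }
    }

    /** Wraps a fast provider and consults the slow one only for costly items. */
    static class NestedProvider implements CacheProvider {
        private final CacheProvider fast;
        private final CacheProvider slow;
        private final int weightThreshold;

        NestedProvider(CacheProvider fast, CacheProvider slow, int weightThreshold) {
            this.fast = fast;
            this.slow = slow;
            this.weightThreshold = weightThreshold;
        }

        public void put(String key, String value, int weight) {
            fast.put(key, value, weight);
            if (weight > weightThreshold) {
                slow.put(key, value, weight);
            }
        }

        public Optional<String> get(String key, int weight) {
            Optional<String> hit = fast.get(key, weight);
            if (hit.isPresent() || weight <= weightThreshold) {
                return hit;
            }
            Optional<String> fromSlow = slow.get(key, weight);
            // Assumption: copy the value back so the next read hits the fast tier.
            fromSlow.ifPresent(v -> fast.put(key, v, weight));
            return fromSlow;
        }
    }
}
```

Because `NestedProvider` is itself a `CacheProvider`, further tiers could be stacked the same way, which is presumably what the comparison with streams refers to.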
Answer by Pascal MARTIN
I have never tried it, but what about redis?
Its homepage says (quoting) :
Redis is a key-value database. It is similar to memcached but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists and sets with atomic operations to push/pop elements.
In order to be very fast but at the same time persistent the whole dataset is taken in memory and from time to time and/or when a number of changes to the dataset are performed it is written asynchronously on disk. You may lost the last few queries that is acceptable in many applications but it is as fast as an in memory DB (Redis supports non-blocking master-slave replication in order to solve this problem by redundancy).
It seems to address some of the points you raised, so it might be helpful in your case?
If you try it, I'm pretty interested in what you find out, btw ;-)
As a side note: if you need to write all this to disk, maybe a cache system is not really what you need... after all, if you are using memcached as a cache, you should be able to re-populate it on demand, whenever necessary. Still, I admit, there might be some performance problems if your whole memcached cluster goes down at once...
So, maybe some software that is "more" oriented towards key/value storage could help? Something like CouchDB, for instance?
It will probably not be as fast as memcached, though, as data is not stored in RAM but on disk...
Answer by Bill Karwin
Have you looked at BerkeleyDB?
- Fast, embedded, in-process data management.
- Key/value store, non-relational.
- Persistent storage.
- Free, open-source.
However, it fails to meet one of your criteria:
- BDB supports distributed replication, but the data is not partitioned. Each node stores the full data set.
Answer by Mads Hansen
Take a look at the Apache Java Caching System (JCS)
JCS is a distributed caching system written in java. It is intended to speed up applications by providing a means to manage cached data of various dynamic natures. Like any caching system, JCS is most useful for high read, low put applications. Latency times drop sharply and bottlenecks move away from the database in an effectively cached system. Learn how to start using JCS.
The JCS goes beyond simply caching objects in memory. It provides numerous additional features:
- Memory management
- Disk overflow (and defragmentation)
- Thread pool controls
- Element grouping
- Minimal dependencies
- Quick nested categorical removal
- Data expiration (idle time and max life)
- Extensible framework
- Fully configurable runtime parameters
- Region data separation and configuration
- Fine grained element configuration options
- Remote synchronization
- Remote store recovery
- Non-blocking "zombie" (balking facade) pattern
- Lateral distribution of elements via HTTP, TCP, or UDP
- UDP Discovery of other caches
- Element event handling
- Remote server chaining (or clustering) and failover
- Custom event logging hooks
- Custom event queue injection
- Custom object serializer injection
- Key pattern matching retrieval
- Network efficient multi-key retrieval
Answer by serg
We are using OSCache. I think it meets almost all your needs except periodically saving the cache to disk, but you should be able to create two cache managers (one memory-based and one disk-based) and periodically run a Java cron job that goes through all in-memory key/value pairs and puts them into the disk cache. What's nice about OSCache is that it is very easy to use.
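The periodic copy suggested here could look roughly like the sketch below. It does not use OSCache's API: a plain `ConcurrentHashMap` stands in for the in-memory cache and a single serialized file stands in for the disk cache, and the names `SnapshotDemo`, `save`, and `restore` are hypothetical. Iteration over a `ConcurrentHashMap` is weakly consistent, so an entry changed during the save ends up with either its old or its new value, which matches what the question asks for.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SnapshotDemo {
    public static void main(String[] args) throws Exception {
        ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
        cache.put("user:1", "alice");
        Path file = Files.createTempFile("cache", ".snapshot");

        // Save once immediately, then once per day (the interval from the question).
        CountDownLatch firstSave = new CountDownLatch(1);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                save(cache, file);
                firstSave.countDown();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }, 0, 1, TimeUnit.DAYS);

        firstSave.await(10, TimeUnit.SECONDS); // wait for the first snapshot
        scheduler.shutdownNow();
        System.out.println(restore(file)); // {user:1=alice}
    }

    /** Copies the live map and writes the copy to disk, replacing the old snapshot. */
    static void save(ConcurrentHashMap<String, String> cache, Path file) throws IOException {
        // The copy may or may not include entries added while it is being made.
        Map<String, String> copy = new HashMap<>(cache);
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
            out.writeObject(copy);
        }
        // Write to a temporary file first so a crash mid-save keeps the previous snapshot.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Rebuilds a cache from a snapshot written by save(). */
    @SuppressWarnings("unchecked")
    static ConcurrentHashMap<String, String> restore(Path file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return new ConcurrentHashMap<>((Map<String, String>) in.readObject());
        }
    }
}
```

A real deployment would need a serialization format suited to the cached objects and one snapshot per cache node; this sketch only shows the save and restore cycle on a single map.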
Answer by Artyom Sokolov
What about Terracotta?
Answer by gnirpaz
You can use GigaSpaces XAP, which is a mature commercial product that meets your requirements and more. It is the fastest distributed in-memory data grid (cache++), it is fully distributed, and it supports multiple styles of persistence.
Guy Nirpaz, GigaSpaces
Answer by skaffman
EhCache has a "disk persistent" mode which dumps the cache contents to disk on shutdown and reinstates the data when it is started up again. As for your other requirements, when running in distributed mode it replicates the data across all nodes rather than storing each object on just one. Other than that, it should fit your needs nicely. It's also still under active development, which many other Java caching frameworks are not.
Answer by Tit Petric
In my experience, it is best to write an intermediate layer between the application and the backend storage. This way you can pair up memcached instances with, for example, sharedanced (basically the same kind of key-value store, but disk-based). The most basic way to do this is to always read from memcached and fall back to sharedanced, and to always write to both sharedanced and memcached.
You can scale writes by sharding between multiple sharedance instances. You can scale reads N-fold by using a solution like repcached (replicated memcached).
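Sharding writes by key can be sketched as below. This is a simplified illustration, not sharedance's or memcached's client code: in-memory maps stand in for the separate server instances, and the `ShardedStoreDemo` class and its methods are hypothetical. It picks a shard with a simple hash modulo, which remaps most keys whenever the number of shards changes; memcached clients often use consistent hashing to limit that.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ShardedStoreDemo {
    // Each map stands in for one storage server instance.
    private final List<Map<String, String>> shards = new ArrayList<>();

    ShardedStoreDemo(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ConcurrentHashMap<>());
        }
    }

    /** Maps a key to a shard index; floorMod keeps the result non-negative. */
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    void put(String key, String value) {
        shards.get(shardFor(key)).put(key, value);
    }

    String get(String key) {
        return shards.get(shardFor(key)).get(key);
    }

    public static void main(String[] args) {
        ShardedStoreDemo store = new ShardedStoreDemo(4);
        store.put("user:42", "alice");
        // The same key always resolves to the same shard, so the read finds the write.
        System.out.println(store.get("user:42")); // alice
    }
}
```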
If building such a layer is not trivial for you, you can still use sharedanced as a basic replacement for memcached. It is fast, since most of the filesystem calls are eventually cached. Using memcached in combination with sharedance only avoids reading from sharedanced until some data expires in memcached. A restart of the memcached servers would cause all clients to read from the sharedance instance at least once. That is not really a problem, unless you have extremely high concurrency and clients contend for the same keys.
There are certain issues if you are dealing with a very high-traffic environment. One is the choice of filesystem (reiserfs performs 5-10x better than ext3 because of some internal caching of the fs tree). Another is that sharedance does not have UDP support (TCP keepalive is quite an overhead if you use sharedance only; memcached has UDP thanks to the Facebook team). Finally, scaling is usually done in your application (by sharding data across multiple instances of sharedance servers).
If you can work with these factors, then this might be a good solution for you. In our current setup, a single sharedanced/memcached server can scale up to about 10 million pageviews a day, but this is application-dependent. We don't use caching for everything (like Facebook does), so results may vary when it comes to your application.
And now, a good 2 years later, Membase is a great product for this. Or Redis, if you need additional functionality like Hashes, Lists, etc.