Java: Is there a well-known solution for a fast persistent cache?

Question

Asked by tomasb

I need a really fast, persistent cache for my web crawler. It doesn't need to be as fast as ConcurrentSkipListSet in Java, but it definitely cannot be MySQL with a hash-index-based table, which I tried. After 1M+ records it takes about 80% of processor time.

Does anyone know of, or has anyone heard of, something useful for this case?
Thanks for any hint.



Answer 1

Accepted answer by skaffman

Try EhCache. It's primarily an in-memory cache, with options for overflow to disk and persistence to a disk backing store. It has been around for years, is still actively developed, and is very mature.

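The idea of an in-memory cache backed by a disk store can be sketched with the JDK alone. The following is a minimal illustration for the crawler case, not Ehcache's API or implementation: the class name `PersistentUrlSet`, its methods, and the one-URL-per-line log format are all assumptions made for this example.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: an in-memory set of seen URLs backed by an append-only log file. */
final class PersistentUrlSet implements AutoCloseable {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    private final BufferedWriter log;

    PersistentUrlSet(Path logFile) {
        try {
            // Replay the log so URLs recorded by earlier runs are known again.
            if (Files.exists(logFile)) {
                seen.addAll(Files.readAllLines(logFile, StandardCharsets.UTF_8));
            }
            log = Files.newBufferedWriter(logFile, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if the URL was not seen before; new URLs are appended to the log. */
    boolean add(String url) {
        if (!seen.add(url)) {
            return false; // duplicates are detected in memory, without touching the disk
        }
        try {
            synchronized (log) {
                log.write(url); // assumes the URL contains no line breaks
                log.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    boolean contains(String url) {
        return seen.contains(url);
    }

    int size() {
        return seen.size();
    }

    @Override
    public void close() {
        try {
            synchronized (log) {
                log.close(); // flushes buffered writes to the file
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch writes are buffered until `close()`, so a crash can lose the most recent entries. Flushing after every write would trade speed for durability. A mature library handles these trade-offs, as well as eviction and memory limits, which this example ignores.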

Answer 2

Answered by Ehcache User

I'm an employee at Terracotta (not an engineer), but I figure that adding some clarity, regardless of my skill set, will benefit those who consult this post for answers.


Yes, Ehcache is a widely used caching option, with over 500,000 deployments internationally, and it is commonly used in small clusters with a distributed cache. If your application is Java-based, Terracotta will arguably offer the largest performance increase with "BigData", because it gives applications in-memory speeds with off-heap advantages.


Yes, BigMemory Go is free. It's a 32 GB freemium offering, not to be confused with open source. It cannot be used in a distributed cache; that option comes with BigMemory Max, and the GB limit is much less.
BigMemory is persistent to disk. The Terracotta Server Array (L2) communicates with disk to ensure data isn't lost even in a catastrophic power failure. Terracotta has ACID-like properties, with 99.999% data durability. The concept of the Terracotta Server Array usually causes a lot of confusion; refer to http://terracotta.org/documentation/terracotta-server-array/server-arrays for more information.
BigMemory is an off-heap data store, entirely free from garbage collection. This is done via byte buffers, and the data store is actively managed by Automatic Resource Control. Depending on the requirements you decide on (i.e. how many objects you want in cache, whether you want immediate or eventual throughput, the time to live of objects, etc.), Automatic Resource Control will do this work for you. This means no GC, heap sizes limited by your server's available memory, and most importantly, no tuning required.
Knowing how large a cache you need is a matter of guess and check. Each application is unique, so we cannot confidently estimate how much data you need to place in memory. I'd be suspicious of any vendor who tells you that you need to place "n" GB into cache to reach SLAs of xyz...

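To make the off-heap idea above concrete, here is a minimal sketch using the JDK's direct byte buffers. It is an illustration only and says nothing about how BigMemory is actually implemented: the class `OffHeapStore`, its fixed capacity, and its append-only layout are assumptions for this example. Values are stored outside the Java heap, so the garbage collector only has to track the small on-heap index.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: values live in a direct (off-heap) buffer, only a small index stays on heap. */
final class OffHeapStore {
    private final ByteBuffer data; // memory allocated outside the Java heap
    private final Map<String, int[]> index = new HashMap<>(); // key -> {offset, length}

    OffHeapStore(int capacityBytes) {
        data = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Appends the value to the buffer; returns false if there is no room left. */
    synchronized boolean put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > data.remaining()) {
            return false; // this sketch has no eviction, it simply refuses
        }
        int offset = data.position();
        data.put(bytes);
        index.put(key, new int[] {offset, bytes.length});
        return true;
    }

    /** Copies the value back onto the heap, or returns null for an unknown key. */
    synchronized String get(String key) {
        int[] location = index.get(key);
        if (location == null) {
            return null;
        }
        byte[] bytes = new byte[location[1]];
        ByteBuffer view = data.duplicate(); // shares content, has its own position
        view.position(location[0]);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Overwriting a key in this sketch leaves the old bytes unreclaimed, and nothing is ever evicted. Space reuse, eviction, and expiry are the kind of work a managed off-heap store takes over.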

My apologies in advance if I broke a code of ethics by posting here, or if there was any implied bias. Hopefully this info adds some clarity and sheds some light on common questions about Terracotta.


Answer 3

Answered by cruftex

I am working on cache2k and researching recent cache eviction policies to make it the fastest Java cache around; see the cache2k benchmarks.

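The post does not say which eviction policies cache2k uses. As a baseline for what an eviction policy does, here is a minimal least-recently-used (LRU) sketch built on the JDK's `LinkedHashMap`. The class name `LruCache` is an assumption for this example, it is not cache2k's policy or API, and it is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of least-recently-used eviction; not cache2k's policy. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the end
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; drops the least recently used entry once over capacity.
        return size() > maxEntries;
    }
}
```

With a capacity of 2, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b`, because `a` was touched more recently.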

Persistence is being added right now and will be available for preview and testing in two weeks. I expect it to be very stable in five weeks. The cache2k implementation is, of course, not as mature as EHCache; however, everything released is used within our own applications and proves itself in production environments.


Update: The "two weeks" was very optimistic, since the whole locking concept finally needed a rewrite and careful inspection... You can track the persistence support currently emerging on GitHub.

