Java: Best way to synchronize cache data between two servers

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/16585798/

Time: 2020-08-16 12:17:38  Source: igfitidea

java, caching, synchronize

Asked by

I want to synchronize cached data between two servers. Both servers share the same database, but for better performance I cache the data in a HashMap at startup. I therefore want to synchronize the cached data without restarting the servers. (Both servers start at the same time.)

Please suggest the best and most efficient way to do this.

Accepted answer by cmbaxter

Instead of trying to synchronize the cached data between two server instances, why not centralize the caching using something like memcached/couchbase or redis? Distributed caching with something like ehcache is, IMO, far more complicated and error-prone than centralizing the cached data on a caching server like those mentioned.

As an addendum to my original answer, when deciding what caching approach to use (in memory, centralized), one thing to take into account is the volatility of the data that is being cached.

If the data is stored in the DB, but does not change after the servers load it, then you don't even need synchronization between the servers. Just let them each load this static data into memory from the source and then go about their merry ways doing whatever it is they do. The data won't be changing, so no need to introduce a complicated pattern for keeping the data in sync between the servers.
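
As a rough illustration of the static case, each server can build its own read-only copy at startup and never touch it again. This is only a sketch: the class name `StaticReferenceData` and the country-code rows are hypothetical, and the map passed to the constructor stands in for whatever DB query actually loads the data.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for data that never changes after the server loads it.
final class StaticReferenceData {
    private final Map<String, String> namesByCode;

    // rowsFromDb stands in for the result of the real startup query (an assumption).
    StaticReferenceData(Map<String, String> rowsFromDb) {
        // Defensive copy wrapped as unmodifiable: nothing can change it after startup,
        // so there is nothing to keep in sync between servers.
        this.namesByCode = Collections.unmodifiableMap(new HashMap<>(rowsFromDb));
    }

    String nameFor(String code) {
        return namesByCode.get(code);
    }
}

class StaticReferenceDataDemo {
    public static void main(String[] args) {
        Map<String, String> rows = new HashMap<>();
        rows.put("DE", "Germany");
        rows.put("FR", "France");
        StaticReferenceData data = new StaticReferenceData(rows);
        System.out.println(data.nameFor("DE"));
    }
}
```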

If there is indeed a level of volatility in the data (say you are caching entity data looked up from the DB in order to save hits to the DB), then I still think centralized caching is a better approach than in-memory distributed and synchronized caching. You just need to make sure that you use an appropriate expiration on the cached data to allow natural refresh of the data from time to time. Also, in the update path for a particular entity, you might want to just drop the cached data from the centralized store and let it be reloaded from the DB into the cache on the next request for that data. This is IMO better than trying to do a true write-through cache, where you write to the underlying store as well as the cache. The DB itself might make tweaks to the data (by defaulting unsupplied values, for example), and your cached data in that case might not match what's in the DB.
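
The expire-and-drop approach described above might look roughly like the following. This is a minimal sketch under stated assumptions, not a production implementation: `CacheAsideRepository` is a hypothetical name, the `store` map stands in for a remote cache such as Redis or memcached, the `db` map stands in for the real database, and the clock is injected only so that expiry can be demonstrated. It is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical cache-aside repository: reads go through the cache, updates go to the
// DB and simply drop the cached copy.
final class CacheAsideRepository {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;
        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> store = new HashMap<>(); // stand-in for the centralized cache
    private final Map<String, String> db;                     // stand-in for the database
    private final long ttlMillis;
    private final LongSupplier clock;
    int dbReads = 0; // counts DB hits so the effect of the cache is visible

    CacheAsideRepository(Map<String, String> db, long ttlMillis, LongSupplier clock) {
        this.db = db;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    String read(String key) {
        Entry cached = store.get(key);
        if (cached != null && clock.getAsLong() < cached.expiresAtMillis) {
            return cached.value; // fresh hit, no DB access
        }
        dbReads++;
        String value = db.get(key); // miss or expired: reload from the source of truth
        if (value != null) {
            store.put(key, new Entry(value, clock.getAsLong() + ttlMillis));
        }
        return value;
    }

    void update(String key, String newValue) {
        db.put(key, newValue); // write to the DB only
        store.remove(key);     // drop the cached copy; the next read reloads it
    }
}

class CacheAsideDemo {
    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("user:1", "Ann");
        CacheAsideRepository repo = new CacheAsideRepository(db, 60_000L, System::currentTimeMillis);
        repo.read("user:1");
        repo.read("user:1");
        repo.update("user:1", "Bob");
        System.out.println(repo.read("user:1") + " after " + repo.dbReads + " DB reads");
    }
}
```

Because the cache is only ever filled from a DB read, any defaults the DB applies on write are picked up on the next reload.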

EDIT:

A question was asked in the comments about the advantages of a centralized cache (I'm guessing as compared with something like an in-memory distributed cache). I'll provide my opinion on that, but first a standard disclaimer. Centralized caching is not a cure-all. It aims to solve specific issues related to in-jvm-memory caching. Before evaluating whether or not to switch to it, you should first understand what your problems are and see if they fit with the benefits of centralized caching. Centralized caching is an architectural change and it can come with issues/caveats of its own. Don't switch to it simply because someone says it's better than what you are doing. Make sure the reason fits the problem.

Okay, now on to my opinion of what kinds of problems centralized caching can solve vs in-jvm-memory (and possibly distributed) caching. I'm going to list two things, although I'm sure there are a few more. My two big ones are: Overall Memory Footprint and Data Synchronization Issues.

Let's start with Overall Memory Footprint. Say you are doing standard entity caching to protect your relational DB from undue stress. Let's also say that you have a lot of data to cache in order to really protect your DB, say in the range of many GBs. If you are doing in-jvm-memory caching and you had, say, 10 app server boxes, you would need to buy that additional memory ($$$) 10 times over, once for each box doing the caching in JVM memory. In addition, you would have to allocate a larger heap to each JVM to accommodate the cached data. I'm of the opinion that the JVM heap should be small and streamlined in order to ease the garbage collection burden. If you have large chunks of Old Gen that can't be collected, then you're going to stress your garbage collector when it goes into a full GC and tries to reap something back from that bloated Old Gen space. You want to avoid long full GC pause times, and bloating your Old Gen is not going to help with that. Plus, if your memory requirement is above a certain threshold and you happen to be running 32-bit machines for your app layer, you'll have to upgrade to 64-bit machines, and that can be another prohibitive cost.

Now if you decided to centralize the cached data instead (using something like Redis or Memcached), you could significantly reduce the overall memory footprint of the cached data because you could have it on a couple of boxes instead of on all of the app server boxes in the app layer. You probably want to use a clustered approach (both technologies support it) and at least two servers to give you high availability and avoid a single point of failure in your caching layer (more on that in a sec). By only having a couple of machines to support the memory requirement for caching, you can save some considerable $$. Also, you can now tune the app boxes and the cache boxes differently, as they are serving distinct purposes. The app boxes can be tuned for high throughput and low heap, and the cache boxes can be tuned for large memory. And having smaller heaps will definitely help with the overall throughput of the app layer boxes.

Now one quick point about centralized caching in general. You should set up your application in such a way that it can survive without the cache in case it goes completely down for a period of time. In traditional entity caching, this means that when the cache becomes completely unavailable, you are just hitting your DB directly for every request. Not awesome, but also not the end of the world.
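
One way to get that behaviour is to treat any cache failure as a miss. The sketch below assumes a hypothetical `RemoteCache` interface; real Redis or memcached clients have their own APIs and exception types, so the `RuntimeException` handling here is only a placeholder for whatever your client throws.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical minimal view of a remote cache client (an assumption, not a real client API).
interface RemoteCache {
    Optional<String> get(String key);
    void put(String key, String value);
}

final class ResilientReader {
    private final RemoteCache cache;
    private final Function<String, String> dbLookup; // stand-in for the real DB query

    ResilientReader(RemoteCache cache, Function<String, String> dbLookup) {
        this.cache = cache;
        this.dbLookup = dbLookup;
    }

    String read(String key) {
        try {
            Optional<String> cached = cache.get(key);
            if (cached.isPresent()) {
                return cached.get();
            }
        } catch (RuntimeException cacheDown) {
            // Cache unreachable: behave as if it were a miss and serve from the DB.
            return dbLookup.apply(key);
        }
        String value = dbLookup.apply(key);
        try {
            if (value != null) {
                cache.put(key, value);
            }
        } catch (RuntimeException cacheDown) {
            // Best effort only: failing to populate the cache must not fail the request.
        }
        return value;
    }
}

class ResilientReaderDemo {
    public static void main(String[] args) {
        RemoteCache down = new RemoteCache() {
            public Optional<String> get(String key) { throw new IllegalStateException("cache down"); }
            public void put(String key, String value) { throw new IllegalStateException("cache down"); }
        };
        ResilientReader reader = new ResilientReader(down, key -> "row-for-" + key);
        System.out.println(reader.read("user:1"));
    }
}
```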

Okay, now for Data Synchronization Issues. With distributed in-jvm-memory caching, you need to keep the cache in sync. A change to cached data in one node needs to replicate to the other nodes and be sync'd into their cached data. This approach is a little scary in that if for some reason (network failure, for example) one of the nodes falls out of sync, then when a request goes to that node, the data the user sees will not match what's currently in the DB. Even worse, if they make another request and that hits a different node, they will see different data, and that will be confusing to the user. By centralizing the data, you eliminate this issue. Now, one could argue that the centralized cache needs concurrency control around updates to the same cached data key. If two concurrent updates come in for the same key, how do you make sure the two updates don't stomp on each other? My thought here is to not even worry about this; when an update happens, drop the item from the cache (and write through directly to the DB) and let it be reloaded on the next read. It's safer and easier this way. If you really do want to update both the cache and the DB on updates, then you can use CAS (Check-And-Set) functionality for optimistic concurrency control instead.
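
For readers unfamiliar with check-and-set, the idea is that a write succeeds only if the entry has not changed since it was read. The sketch below imitates that with a version number in a local `ConcurrentHashMap`; `CasCache` and its method names are hypothetical stand-ins for the CAS feature of a real cache server (memcached, for instance, exposes it through its `gets` and `cas` commands), not a client for one.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical in-process stand-in for a cache server that supports check-and-set.
final class CasCache {
    // A cached value together with the version token a reader must present to overwrite it.
    static final class Versioned {
        final String value;
        final long version;
        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final ConcurrentHashMap<String, Versioned> store = new ConcurrentHashMap<>();

    Versioned gets(String key) {
        return store.get(key);
    }

    // Writes newValue only if the stored version still equals expectedVersion
    // (a missing key counts as version 0). Returns false when another writer got there first.
    boolean cas(String key, long expectedVersion, String newValue) {
        boolean[] swapped = {false};
        store.compute(key, (k, current) -> {
            long currentVersion = (current == null) ? 0L : current.version;
            if (currentVersion != expectedVersion) {
                return current; // stale token: leave the entry untouched
            }
            swapped[0] = true;
            return new Versioned(newValue, currentVersion + 1);
        });
        return swapped[0];
    }

    // Optimistic update loop: re-read and retry until the write is accepted.
    String update(String key, UnaryOperator<String> change) {
        while (true) {
            Versioned seen = gets(key);
            long version = (seen == null) ? 0L : seen.version;
            String next = change.apply(seen == null ? null : seen.value);
            if (cas(key, version, next)) {
                return next;
            }
        }
    }
}

class CasCacheDemo {
    public static void main(String[] args) {
        CasCache cache = new CasCache();
        cache.cas("user:1", 0L, "Ann");
        boolean staleWrite = cache.cas("user:1", 0L, "Bob"); // token 0 is now out of date
        System.out.println("stale write accepted: " + staleWrite);
    }
}
```

A rejected write tells the caller that its view was stale, which is what keeps two concurrent updates from silently overwriting each other.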

So to summarize, you can save money and better tune your app layer machines if you centralize the data they cache. You can also get better accuracy of that data, as you have fewer data synchronization issues to deal with. I hope this helps.

Answer by AlexR

First, try to forget about premature optimization. Do you really need the cache? 99% of the time you do not. In that case your solution is to remove the redundant code.

If, however, you do need it, try to stop reinventing the wheel. There are perfectly good ready-to-use libraries, for example ehCache, which has a distributed mode.

Answer by Jan Vitásek

Use HazelCast. It allows data synchronization between servers using a multicast protocol. It's easy to use, and it supports locking and other features.
