Java: choosing a distributed shared memory solution

Question

Asked by mindas

I have a task to build a prototype for a massively scalable distributed shared memory (DSM) app. The prototype would only serve as a proof-of-concept, but I want to spend my time most effectively by picking the components which would be used in the real solution later on.

The aim of this solution is to take data input from an external source, churn it and make the result available for a number of frontends. Those "frontends" would just take the data from the cache and serve it without extra processing. The amount of frontend hits on this data can literally be millions per second.

The data itself is very volatile; it can (and does) change quite rapidly. However the frontends should see "old" data until the newest has been processed and cached. The processing and writing is done by a single (redundant) node while other nodes only read the data. In other words: no read-through behaviour.
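
The "old data until the newest has been processed" rule can be pictured inside a single JVM as an atomic snapshot swap. The following is only a minimal sketch of that idea, not a distributed implementation: the class name `SnapshotHolder` and its methods are hypothetical, and a real data grid would additionally replicate the published snapshot to the reader nodes.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical single-JVM sketch: readers see the previous snapshot until a new one is published. */
final class SnapshotHolder<K, V> {
    // The currently published, immutable snapshot (empty until the first publish).
    private final AtomicReference<Map<K, V>> current = new AtomicReference<>(Map.of());

    /** Called by readers (the "frontends"): never blocks and never sees a half-built result. */
    V get(K key) {
        return current.get().get(key);
    }

    /** Called by the single writer once a batch has been fully processed. */
    void publish(Map<K, V> processed) {
        // Copy first so later changes to the writer's map cannot leak to readers.
        current.set(Map.copyOf(processed));
    }

    public static void main(String[] args) {
        SnapshotHolder<String, Integer> holder = new SnapshotHolder<>();
        holder.publish(Map.of("price", 100));
        System.out.println(holder.get("price")); // the old value stays visible...
        holder.publish(Map.of("price", 105));    // ...until the new snapshot is swapped in
        System.out.println(holder.get("price"));
    }
}
```

Because the swap is a single reference write, a reader gets either the complete old snapshot or the complete new one, which matches the "no read-through" requirement.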

I was looking into solutions like memcached; however, this particular one doesn't fulfil all our requirements, which are listed below:

1. The solution must at least have a Java client API that is reasonably well maintained, as the rest of the app is written in Java and we are seasoned Java developers;
2. The solution must be totally elastic: it should be possible to add new nodes without restarting other nodes in the cluster;
3. The solution must be able to handle failover. Yes, I realize this means some overhead, but the overall served data size isn't big (1G max), so this shouldn't be a problem. By "failover" I mean seamless execution when a node goes down, without hardcoding or changing server IP addresses as in memcached clients;
4. Ideally it should be possible to specify the degree of data overlap (e.g. how many copies of the same data should be stored in the DSM cluster);
5. There is no need to permanently store all the data, but some of the data might need post-processing (e.g. serialization to the DB);
6. Price. Obviously we prefer free/open source, but we're happy to pay a reasonable amount if a solution is worth it. Either way, a paid 24-hour support contract is a must;
7. The whole thing has to be hosted in our data centers, so SaaS offerings like Amazon SimpleDB are out of scope. We would only consider them if no other options were available;
8. Ideally the solution would be strictly consistent (as in CAP); however, eventual consistency can be considered as an option.
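
To make the failover and "number of copies" requirements concrete, here is a rough sketch of a read that tries each replica in turn. Everything in it is an assumption for illustration: `FailoverReader` is a hypothetical name, each replica is modelled as a plain `Function`, and the replica list is passed in by hand, whereas the products discussed in the answers discover cluster members themselves.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch: read a key from the first replica that responds. */
final class FailoverReader {

    /**
     * Tries each replica in order and returns the first answer.
     * A replica that throws a RuntimeException is treated as "node down".
     */
    static <K, V> V read(K key, List<Function<K, V>> replicas) {
        for (Function<K, V> replica : replicas) {
            try {
                return replica.apply(key);
            } catch (RuntimeException nodeDown) {
                // This copy is unreachable; fall through to the next one.
            }
        }
        throw new IllegalStateException("all " + replicas.size() + " replicas failed");
    }

    public static void main(String[] args) {
        Function<String, String> down = k -> { throw new IllegalStateException("node down"); };
        Function<String, String> up = k -> "value-for-" + k;
        // Two copies of the data: the first node is down, the second one answers.
        System.out.println(read("a", List.of(down, up)));
    }
}
```

With two copies configured, the read survives the loss of one node without the caller changing any address, which is the behaviour the failover requirement asks for.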

Thanks in advance for any ideas.

Answer 1

Answered by Fuad Malikov

Have a look at Hazelcast. It is a pure Java, open source (Apache license), highly scalable in-memory data grid product. It offers 24x7 support, and it addresses all of your requirements. I have tried to explain each of them below:

1. It has a native Java client.
2. It is 100% dynamic: nodes can be added and removed dynamically, with no need to change anything.
3. Again, everything is dynamic.
4. You can configure the number of backup nodes.
5. Hazelcast supports persistence.
6. Everything that Hazelcast offers is free (open source), and enterprise-level support is available.
7. Hazelcast is a single jar file and super easy to use: just add the jar to your classpath. Have a look at the screencast on the main page.
8. Hazelcast is strictly consistent. You can never read stale data.

Answer 2

Answered by Nikita Koksharov

I suggest you use Redisson, a Redis-based in-memory data grid for Java. It implements BitSet, BloomFilter, Set, SortedSet, Map, ConcurrentMap, List, Queue, Deque, BlockingQueue, BlockingDeque, ReadWriteLock, Semaphore, Lock, AtomicLong, CountDownLatch, Publish / Subscribe, RemoteService, ExecutorService, LiveObjectService and SchedulerService on top of a Redis server. It supports master/slave, sentinel and cluster server modes. Automatic discovery of the cluster/sentinel server topology is also supported. The library is free and open source.

It works perfectly in the cloud thanks to AWS ElastiCache support.

Answer 3

Answered by Kynao

It depends on what you prefer: if you lean towards AP in the CAP theorem, I would follow the others and suggest Hazelcast, but if you need CP, I would choose Redis.

Answer 4

Answered by Herber

I am doing a similar project, but targeting the .NET platform instead. Apart from the solutions already mentioned, I think you should take a look at ScaleOut StateServer and Alachisoft NCache. I am afraid neither of these alternatives is cheap, but in my judgement they are a safer bet than open source for commercial solutions.

1. Both provide Java client APIs, even though I have only played around with the .NET APIs.
2. StateServer features self-discovery of new cache nodes, and NCache has a management console where new cache nodes can be added.
3. Both should be able to handle failovers seamlessly.
4. StateServer can have 1 or 2 passive copies of the data. NCache offers more caching topologies to choose between.
5. If you mean write-through/write-behind to a database, that is available in both.
6. I have no idea how many cache servers you plan to use, but here are the full price specs: ScaleOut StateServer, Alachisoft NCache.
7. Both are installed and configured locally on your server, and both have GUI management.
8. I am not sure exactly what strict consistency involves, so I'll leave that for you to investigate.

Overall, StateServer is the best option if you want to skip configuring every little detail in the cache cluster, while NCache offers a great many features and caching topologies to choose from.

Depending on how the clients access the data (for example, if the same client reads the same data many times), it might be a good idea to combine local caching on the clients with the distributed caching in the cluster (available for both NCache and StateServer). Just a thought.
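
A client-side local cache in front of the cluster might look roughly like the sketch below. It is not the NCache or StateServer API: `NearCache` is a hypothetical class, the cluster lookup is modelled as a plain `Function`, and the time-to-live is an assumed knob that bounds how stale a locally served value can be.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical sketch: a small local cache in front of a slower distributed lookup. */
final class NearCache<K, V> {

    /** A locally cached value plus the time it was loaded. */
    private static final class Entry<T> {
        final T value;
        final long loadedAtNanos;

        Entry(T value, long loadedAtNanos) {
            this.value = value;
            this.loadedAtNanos = loadedAtNanos;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> local = new ConcurrentHashMap<>();
    private final Function<K, V> remote; // stands in for a call to the cache cluster
    private final long ttlNanos;

    NearCache(Function<K, V> remote, long ttlMillis) {
        this.remote = remote;
        this.ttlNanos = TimeUnit.MILLISECONDS.toNanos(ttlMillis);
    }

    /** Serves from the local copy while it is fresh; otherwise reloads from the cluster. */
    V get(K key) {
        long now = System.nanoTime();
        Entry<V> entry = local.get(key);
        if (entry == null || now - entry.loadedAtNanos >= ttlNanos) {
            entry = new Entry<>(remote.apply(key), now);
            local.put(key, entry);
        }
        return entry.value;
    }

    public static void main(String[] args) {
        NearCache<String, String> cache =
                new NearCache<String, String>(k -> "value-for-" + k, 500);
        System.out.println(cache.get("a")); // first read goes to the cluster
        System.out.println(cache.get("a")); // repeat read within 500 ms is served locally
    }
}
```

For volatile data like that described in the question, the time-to-live would have to be short, since a local copy can lag behind the cluster by up to that long.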

Answer 5

Answered by Alexander Finn

You may want to check out Java-specific solutions like Coherence: http://www.oracle.com/global/ru/products/middleware/coherence/index.html

However, I consider such solutions too complex and prefer solutions like memcached. The big disadvantages of memcached for your purpose are that it seems to lack record locking and that there is no built-in way to replicate data for failover. That is why I would look into key-value data stores. Many of them would satisfy your needs completely.

Here is a list of key-value data stores that may help you with your task: http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores Just pick one that you feel comfortable with.

Answer 6

Answered by Tobias P.

Have a look at Terracotta's JVM clustering; it's open source ;) It has no API of its own; instead it works efficiently at the JVM level: when you store a value in a replicated object, it is sent to all other nodes. Even locking and the like work transparently, without adding any new code.

Answer 7

Answered by Anirudh Jayakumar

The specified use case seems to fit into Netflix's Hollow. This is a read-only replicated cache with a single producer and multiple consumers.

Answer 8

Answered by filippo

Have you thought about using a standard messaging solution like RabbitMQ? RabbitMQ is an open source implementation of the AMQP protocol.

Your application seems more or less like a Publish/subscribe system. The Publisher node is the one that does the processing and puts messages (processed data) in a queue in the servers. Subscribers can get messages from the server in various ways. AMQP decouples the producer and the consumer of messages and is very flexible in how you can combine the two sides.
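
The publish/subscribe shape described above can be illustrated with a tiny in-process stand-in. This is not the RabbitMQ client API and involves no network or AMQP at all: `TinyBroker` is a hypothetical class that only shows how the publisher stays unaware of who consumes its messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical in-process stand-in for a message broker. */
final class TinyBroker<T> {
    // Safe to iterate while other threads subscribe.
    private final List<Consumer<T>> subscribers = new CopyOnWriteArrayList<>();

    /** Registers a consumer; the publisher never learns who is listening. */
    void subscribe(Consumer<T> subscriber) {
        subscribers.add(subscriber);
    }

    /** Delivers one message (for example, a processed data item) to every subscriber. */
    void publish(T message) {
        for (Consumer<T> subscriber : subscribers) {
            subscriber.accept(message);
        }
    }

    public static void main(String[] args) {
        TinyBroker<String> broker = new TinyBroker<>();
        List<String> frontendA = new ArrayList<>();
        List<String> frontendB = new ArrayList<>();
        broker.subscribe(frontendA::add);
        broker.subscribe(frontendB::add);
        broker.publish("processed-batch-1");
        System.out.println(frontendA + " " + frontendB);
    }
}
```

A real broker adds what this sketch leaves out: network delivery, queues that buffer messages for slow or disconnected consumers, and routing rules between publishers and queues.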
