Cassandra Java driver: how many contact points is reasonable?

Attribution: this page reproduces a popular StackOverflow Q&A under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not this site). Original question: http://stackoverflow.com/questions/26852413/

Date: 2020-11-02 10:45:31 · Source: igfitidea


java, cassandra, cassandra-2.0

Asked by henry

In Java I connect to a Cassandra cluster like this:

Cluster cluster = Cluster.builder().addContactPoints("host-001","host-002").build();

Do I need to specify all hosts of the cluster there? What if I have a cluster of 1,000 nodes? Do I randomly choose a few? How many, and should the choice really be random?

Answered by Alex Popescu

I would say that configuring your client to use the same list of nodes as the seed node list you configured Cassandra with will give you the best results.

As you know, Cassandra nodes use the seed nodes to find each other and discover the topology of the ring. The driver will use only one of the nodes in the list to establish the control connection (the one used to discover the cluster topology), but providing the client with the seed nodes increases the chance that the client can continue to operate in case of node failures.
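Translating that advice into code: the seed list lives in cassandra.yaml under seed_provider as a comma-separated string, so one low-friction approach is to reuse that exact value on the client side. A minimal sketch; the hostnames and the commented-out driver call are illustrative, not taken from the question:

```java
import java.util.Arrays;

class SeedsAsContactPoints {
    // Split a comma-separated `seeds` value (as found in cassandra.yaml's
    // seed_provider parameters) into individual contact points.
    static String[] parseSeeds(String seedsValue) {
        return Arrays.stream(seedsValue.split(","))
                .map(String::trim)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Hypothetical value copied straight out of cassandra.yaml:
        String[] points = parseSeeds("host-001, host-002, host-003");
        System.out.println(Arrays.toString(points)); // [host-001, host-002, host-003]
        // These can then be handed to the driver, e.g.:
        // Cluster cluster = Cluster.builder().addContactPoints(points).build();
    }
}
```

Keeping the two lists in sync this way means a topology change only has to be recorded in one place.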

Answered by Carlo Bertuccini

My approach is to add as many nodes as I can. The reason is simple: seeds are necessary only for cluster bootstrap, but once the cluster is up and running, seeds are just ordinary nodes, and using only seeds may leave you unable to connect to a working cluster. So I give myself the best chance of connecting by keeping a more than reasonable number of nodes in the list; a single working node is enough to obtain the current cluster configuration.
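If "as many nodes as I can" is impractical for a very large cluster, a middle ground is to sample a handful of nodes from the full node list you already have (from your own inventory, before connecting). This helper is a hypothetical illustration; the random seed parameter only exists to make runs reproducible:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ContactPointPicker {
    // Return up to `count` hosts chosen at random from the full node list.
    static List<String> pick(List<String> allNodes, int count, long randomSeed) {
        List<String> shuffled = new ArrayList<>(allNodes);
        Collections.shuffle(shuffled, new Random(randomSeed));
        return shuffled.subList(0, Math.min(count, shuffled.size()));
    }

    public static void main(String[] args) {
        List<String> cluster = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            cluster.add(String.format("host-%03d", i));
        }
        // Pick 6 contact points out of a 1,000-node cluster:
        System.out.println(pick(cluster, 6, System.nanoTime()));
    }
}
```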

Answered by qualebs

Documentation from DataStax


public Cluster.Builder addContactPoint(String address)

Adds a contact point.

Contact points are addresses of Cassandra nodes that the driver uses to discover the cluster topology. Only one contact point is required (the driver will retrieve the address of the other nodes automatically), but it is usually a good idea to provide more than one contact point, because if that single contact point is unavailable, the driver cannot initialize itself correctly.

Note that by default (that is, unless you use the withLoadBalancingPolicy(com.datastax.driver.core.policies.LoadBalancingPolicy) method of this builder), the first successfully contacted host will be used to define the local data center for the client. It follows that if you are running Cassandra in a multiple data-center setting, it is a good idea to provide only contact points that are in the same data center as the client, or to manually provide a load balancing policy that suits your needs.


Parameters:
    address - the address of the node to connect to
Returns:
    this Builder.
Throws:
    IllegalArgumentException - if no IP address for address could be found
    SecurityException - if a security manager is present and permission to resolve the host name is denied.
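The multi-data-center caveat above can be sketched outside the driver: filter your candidate contact points down to the client's local data center before handing them to the builder. The host-to-DC map below is a made-up illustration (in practice you would get it from your inventory or from nodetool status), and once connected, a DC-aware load balancing policy handles the actual routing:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class LocalDcContactPoints {
    // Keep only the hosts located in the client's local data center.
    static List<String> localOnly(Map<String, String> hostToDc, String localDc) {
        return hostToDc.entrySet().stream()
                .filter(e -> e.getValue().equals(localDc))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> hosts = new HashMap<>();
        hosts.put("host-001", "DC1");
        hosts.put("host-002", "DC1");
        hosts.put("host-003", "DC2");
        System.out.println(localOnly(hosts, "DC1")); // [host-001, host-002]
    }
}
```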

From what I understand, you should just add a single contact point and the driver will discover the rest. Hope that helps. I personally use Hector; you should look into that too.

Answered by Alexis Wilke

I read an interesting article about Netflix and their Cassandra installation.


They mention that they used their Chaos Gorilla system to take down 33% of their Cassandra cluster and saw that their systems were still working as expected.

They have some 2,000 Cassandra nodes and took 33% of them down. That means 1 out of every 3 nodes was gone (about 660 nodes in Netflix's case).

If you are really unlucky, all the contact points you specified are part of those 660 nodes... Ouch.

Chances are, though, that if you use just enough nodes and never expect a dramatic event where more than 33% of your network goes down, you should be able to use a pretty small number, such as 6 contact points: with that many, you should nearly always hit at least 4 that are up.
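To put a rough number on "really unlucky": if each contact point is down independently with probability p, the chance that all k of them are down is p^k. (Independence is a simplifying assumption; correlated failures such as a whole rack going down are worse.) With the 33% figure from the Netflix anecdote:

```java
class ContactPointOdds {
    // Probability that every one of k contact points is down, when each
    // node is down independently with probability p.
    static double allDownProbability(double p, int k) {
        return Math.pow(p, k);
    }

    public static void main(String[] args) {
        // 6 contact points, 33% of the cluster down: (1/3)^6 = 1/729
        System.out.printf("%.5f%n", allDownProbability(1.0 / 3.0, 6)); // 0.00137
    }
}
```

So even with only 6 contact points, the odds that every one of them is among the downed nodes are about 0.14%.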

Now, the contact points should certainly be chosen strategically if possible. That is, if you have 6 different racks and choose 6 nodes that are all in the same rack, you probably chose wrong; instead, you probably want to specify 1 node per rack. (Once you have grown that much, of course.)

Note that if you have a replication factor of 5 and 33% of your Cassandra nodes go down, you're in trouble anyway: in that situation, many nodes cannot access the database at QUORUM. Notice that Netflix talks about that; their replication factor is just 3 (i.e. 1/3 ≈ 0.33 and 1/5 = 0.2, so 20%, which is less than 33%).

Finally, I do not know the Java driver; I use the C++ one. When it fails, I am told, so what I can do is try another set of IPs if necessary, until it works. My system has one connection that stays up between client accesses, so this is a one-time process, and I can then relay the fact that this server is connected to Cassandra and can accept client connections. If you reconnect to Cassandra each time a client sends you a request, it may be wise not to specify many IPs at all.
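The "try another set of IPs until it works" idea is driver-agnostic and can be sketched as below; tryConnect is a stand-in for whatever your driver does to build a cluster object and open a session, and all names here are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class FallbackConnector {
    // Try each group of contact points in order; return the first group
    // that yields a working connection.
    static List<String> connectWithFallback(List<List<String>> groups,
                                            Predicate<List<String>> tryConnect) {
        for (List<String> group : groups) {
            if (tryConnect.test(group)) {
                return group;
            }
        }
        throw new IllegalStateException("no contact point group was reachable");
    }

    public static void main(String[] args) {
        List<List<String>> groups = Arrays.asList(
                Arrays.asList("host-001", "host-002"),
                Arrays.asList("host-003", "host-004"));
        // Simulate the first group being unreachable:
        List<String> used = connectWithFallback(groups, g -> g.contains("host-003"));
        System.out.println(used); // [host-003, host-004]
    }
}
```

Because this is a one-time process done at startup, the cost of iterating over a few groups is negligible compared with keeping the connection alive afterwards.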