Java: How do you programmatically configure hazelcast for the multicast discovery mechanism?
Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original address and authors, and attribute it to the original authors (not me): StackOverflow
Original address: http://stackoverflow.com/questions/20385973/
Asked by DaveFar
How do you programmatically configure hazelcast for the multicast discovery mechanism?
Details:
The documentation only supplies an example for TCP/IP and is out of date: it uses Config.setPort(), which no longer exists.
My configuration looks like this, but discovery does not work (i.e. I get the output "Members: 1"):
Config cfg = new Config();
NetworkConfig network = cfg.getNetworkConfig();
network.setPort(PORT_NUMBER);
JoinConfig join = network.getJoin();
join.getTcpIpConfig().setEnabled(false);
join.getAwsConfig().setEnabled(false);
join.getMulticastConfig().setEnabled(true);
join.getMulticastConfig().setMulticastGroup(MULTICAST_ADDRESS);
join.getMulticastConfig().setMulticastPort(PORT_NUMBER);
join.getMulticastConfig().setMulticastTimeoutSeconds(200);
HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg);
System.out.println("Members: " + instance.getCluster().getMembers().size());
Update 1, taking asimarslan's answer into account
If I fiddle with the MulticastTimeout, I get either "Members: 1" or
Dec 05, 2013 8:50:42 PM com.hazelcast.nio.ReadHandler
WARNING: [192.168.0.9]:4446 [dev] hz._hzInstance_1_dev.IO.thread-in-0 Closing socket to endpoint Address[192.168.0.7]:4446, Cause:java.io.EOFException: Remote socket closed!
Dec 05, 2013 8:57:24 PM com.hazelcast.instance.Node
SEVERE: [192.168.0.9]:4446 [dev] Could not join cluster, shutting down!
com.hazelcast.core.HazelcastException: Failed to join in 300 seconds!
Update 2, taking pveentjer's answer about using tcp/ip into account
If I change the configuration to the following, I still only get 1 member:
Config cfg = new Config();
NetworkConfig network = cfg.getNetworkConfig();
network.setPort(PORT_NUMBER);
JoinConfig join = network.getJoin();
join.getMulticastConfig().setEnabled(false);
join.getTcpIpConfig().addMember("192.168.0.1").addMember("192.168.0.2").
addMember("192.168.0.3").addMember("192.168.0.4").
addMember("192.168.0.5").addMember("192.168.0.6").
addMember("192.168.0.7").addMember("192.168.0.8").
addMember("192.168.0.9").addMember("192.168.0.10").
addMember("192.168.0.11").setRequiredMember(null).setEnabled(true);
// Does this set the allowed connections to the cluster? Is it necessary for multicast, too?
network.getInterfaces().setEnabled(true).addInterface("192.168.0.*");
HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg);
System.out.println("debug: joined via " + join + " with "
        + instance.getCluster().getMembers().size() + " members.");
More precisely, this run produces the output
debug: joined via JoinConfig{multicastConfig=MulticastConfig [enabled=false, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=true, connectionTimeoutSeconds=5, members=[192.168.0.1, 192.168.0.2, 192.168.0.3, 192.168.0.4, 192.168.0.5, 192.168.0.6, 192.168.0.7, 192.168.0.8, 192.168.0.9, 192.168.0.10, 192.168.0.11], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}} with 1 members.
My non-Hazelcast implementation uses UDP multicast and works fine. So can a firewall really be the problem?
Update 3, taking pveentjer's answer about checking the network into account
Since I do not have permissions for iptables or to install iperf, I am using com.hazelcast.examples.TestApp
to check whether my network is working, as described in Getting Started With Hazelcast, Chapter 2, Section "Showing Off Straight Away":
I call java -cp hazelcast-3.1.2.jar com.hazelcast.examples.TestApp
on 192.168.0.1 and get the output
...Dec 10, 2013 11:31:21 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Prefer IPv4 stack is true.
Dec 10, 2013 11:31:21 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Picked Address[192.168.0.1]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
Dec 10, 2013 11:31:22 PM com.hazelcast.system
INFO: [192.168.0.1]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.0.1]:5701
Dec 10, 2013 11:31:22 PM com.hazelcast.system
INFO: [192.168.0.1]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
Dec 10, 2013 11:31:22 PM com.hazelcast.instance.Node
INFO: [192.168.0.1]:5701 [dev] Creating MulticastJoiner
Dec 10, 2013 11:31:22 PM com.hazelcast.core.LifecycleService
INFO: [192.168.0.1]:5701 [dev] Address[192.168.0.1]:5701 is STARTING
Dec 10, 2013 11:31:24 PM com.hazelcast.cluster.MulticastJoiner
INFO: [192.168.0.1]:5701 [dev]
Members [1] {
Member [192.168.0.1]:5701 this
}
Dec 10, 2013 11:31:24 PM com.hazelcast.core.LifecycleService
INFO: [192.168.0.1]:5701 [dev] Address[192.168.0.1]:5701 is STARTED
I then call java -cp hazelcast-3.1.2.jar com.hazelcast.examples.TestApp
on 192.168.0.2 and get the output
...Dec 10, 2013 9:50:22 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Prefer IPv4 stack is true.
Dec 10, 2013 9:50:22 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Picked Address[192.168.0.2]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
Dec 10, 2013 9:50:23 PM com.hazelcast.system
INFO: [192.168.0.2]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.0.2]:5701
Dec 10, 2013 9:50:23 PM com.hazelcast.system
INFO: [192.168.0.2]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
Dec 10, 2013 9:50:23 PM com.hazelcast.instance.Node
INFO: [192.168.0.2]:5701 [dev] Creating MulticastJoiner
Dec 10, 2013 9:50:23 PM com.hazelcast.core.LifecycleService
INFO: [192.168.0.2]:5701 [dev] Address[192.168.0.2]:5701 is STARTING
Dec 10, 2013 9:50:23 PM com.hazelcast.nio.SocketConnector
INFO: [192.168.0.2]:5701 [dev] Connecting to /192.168.0.1:5701, timeout: 0, bind-any: true
Dec 10, 2013 9:50:23 PM com.hazelcast.nio.TcpIpConnectionManager
INFO: [192.168.0.2]:5701 [dev] 38476 accepted socket connection from /192.168.0.1:5701
Dec 10, 2013 9:50:28 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.0.2]:5701 [dev]
Members [2] {
Member [192.168.0.1]:5701
Member [192.168.0.2]:5701 this
}
Dec 10, 2013 9:50:30 PM com.hazelcast.core.LifecycleService
INFO: [192.168.0.2]:5701 [dev] Address[192.168.0.2]:5701 is STARTED
So multicast discovery is generally working on my cluster, right? Is 5701 also the port for discovery? Is 38476 in the last output an ID or a port?
Joining still does not work for my own code with programmatic configuration :(
Update 4, taking pveentjer's answer about using the default configuration into account
The modified TestApp gives the output
joinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3,
multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2,
trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false,
connectionTimeoutSeconds=5, members=[], requiredMember=null],
awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null',
tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}}
and does detect the other members after a couple of seconds (if all instances are started at the same time, each one first lists only itself as a member), whereas
myProgram gives the output
joined via JoinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSeconds=5, members=[], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}} with 1 members.
and does not detect members within its runtime of about 1 minute (I am counting the members about every 5 seconds).
BUT if at least one instance of TestApp runs concurrently on the cluster, all TestApp instances and all myProgram instances are detected and my program works fine. If I start TestApp once and then myProgram twice in parallel, TestApp gives the following output:
java -cp ~/CaseStudy/jtorx-1.10.0-beta8/lib/hazelcast-3.1.2.jar:. TestApp
Dec 12, 2013 12:02:15 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Prefer IPv4 stack is true.
Dec 12, 2013 12:02:15 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Picked Address[192.168.180.240]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
Dec 12, 2013 12:02:15 PM com.hazelcast.system
INFO: [192.168.180.240]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.180.240]:5701
Dec 12, 2013 12:02:15 PM com.hazelcast.system
INFO: [192.168.180.240]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
Dec 12, 2013 12:02:15 PM com.hazelcast.instance.Node
INFO: [192.168.180.240]:5701 [dev] Creating MulticastJoiner
Dec 12, 2013 12:02:15 PM com.hazelcast.core.LifecycleService
INFO: [192.168.180.240]:5701 [dev] Address[192.168.180.240]:5701 is STARTING
Dec 12, 2013 12:02:21 PM com.hazelcast.cluster.MulticastJoiner
INFO: [192.168.180.240]:5701 [dev]
Members [1] {
Member [192.168.180.240]:5701 this
}
Dec 12, 2013 12:02:22 PM com.hazelcast.core.LifecycleService
INFO: [192.168.180.240]:5701 [dev] Address[192.168.180.240]:5701 is STARTED
Dec 12, 2013 12:02:22 PM com.hazelcast.management.ManagementCenterService
INFO: [192.168.180.240]:5701 [dev] Hazelcast will connect to Management Center on address: http://localhost:8080/mancenter-3.1.2/
Join: JoinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSeconds=5, members=[], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}}
Dec 12, 2013 12:02:22 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Initializing cluster partition table first arrangement...
hazelcast[default] > Dec 12, 2013 12:03:27 PM com.hazelcast.nio.SocketAcceptor
INFO: [192.168.180.240]:5701 [dev] Accepting socket connection from /192.168.0.8:38764
Dec 12, 2013 12:03:27 PM com.hazelcast.nio.TcpIpConnectionManager
INFO: [192.168.180.240]:5701 [dev] 5701 accepted socket connection from /192.168.0.8:38764
Dec 12, 2013 12:03:27 PM com.hazelcast.nio.SocketAcceptor
INFO: [192.168.180.240]:5701 [dev] Accepting socket connection from /192.168.0.7:54436
Dec 12, 2013 12:03:27 PM com.hazelcast.nio.TcpIpConnectionManager
INFO: [192.168.180.240]:5701 [dev] 5701 accepted socket connection from /192.168.0.7:54436
Dec 12, 2013 12:03:32 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Re-partitioning cluster data... Migration queue size: 181
Dec 12, 2013 12:03:32 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev]
Members [3] {
Member [192.168.180.240]:5701 this
Member [192.168.0.8]:5701
Member [192.168.0.7]:5701
}
Dec 12, 2013 12:03:43 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Re-partitioning cluster data... Migration queue size: 181
Dec 12, 2013 12:03:45 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] All migration tasks has been completed, queues are empty.
Dec 12, 2013 12:03:46 PM com.hazelcast.nio.TcpIpConnection
INFO: [192.168.180.240]:5701 [dev] Connection [Address[192.168.0.8]:5701] lost. Reason: Socket explicitly closed
Dec 12, 2013 12:03:46 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev] Removing Member [192.168.0.8]:5701
Dec 12, 2013 12:03:46 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev]
Members [2] {
Member [192.168.180.240]:5701 this
Member [192.168.0.7]:5701
}
Dec 12, 2013 12:03:48 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Partition balance is ok, no need to re-partition cluster data...
Dec 12, 2013 12:03:48 PM com.hazelcast.nio.TcpIpConnection
INFO: [192.168.180.240]:5701 [dev] Connection [Address[192.168.0.7]:5701] lost. Reason: Socket explicitly closed
Dec 12, 2013 12:03:48 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev] Removing Member [192.168.0.7]:5701
Dec 12, 2013 12:03:48 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev]
Members [1] {
Member [192.168.180.240]:5701 this
}
Dec 12, 2013 12:03:48 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Partition balance is ok, no need to re-partition cluster data...
The only difference I see in TestApp's configuration is
config.getManagementCenterConfig().setEnabled(true);
config.getManagementCenterConfig().setUrl("http://localhost:8080/mancenter-"+version);
for(int k=1;k<= LOAD_EXECUTORS_COUNT;k++){
config.addExecutorConfig(new ExecutorConfig("e"+k).setPoolSize(k));
}
so, in a desperate attempt, I added it to myProgram, too. But it does not solve the problem: each instance still detects only itself as a member during the whole run.
Update about how long myProgram runs
Could it be that the program is not running long enough (as pveentjer put it)?
My experiments seem to confirm this:
If the time t between Hazelcast.newHazelcastInstance(cfg);
and initializing cleanUp()
(i.e. no longer communicating via hazelcast and no longer checking the number of members) is
- less than 30 seconds: no communication and members: 1
- more than 30 seconds: all members are found and communication happens (which, weirdly, seems to happen for much longer than t - 30 seconds).
Is 30 seconds a realistic time span for a hazelcast cluster to need, or is there something strange going on? Here is a log from 4 myPrograms running concurrently (the periods spent looking for hazelcast members overlap by about 30 seconds for instance 1 and instance 3):
instance 1: 2013-12-19T12:39:16.553+0100 LOG 0 (START) engine started
looking for members between 2013-12-19T12:39:21.973+0100 and 2013-12-19T12:40:27.863+0100
2013-12-19T12:40:28.205+0100 LOG 35 (Torx-Explorer) Model SymToSim is about to exit
instance 2: 2013-12-19T12:39:16.592+0100 LOG 0 (START) engine started
looking for members between 2013-12-19T12:39:22.192+0100 and 2013-12-19T12:39:28.429+0100
2013-12-19T12:39:28.711+0100 LOG 52 (Torx-Explorer) Model SymToSim is about to exit
instance 3: 2013-12-19T12:39:16.593+0100 LOG 0 (START) engine started
looking for members between 2013-12-19T12:39:22.145+0100 and 2013-12-19T12:39:52.425+0100
2013-12-19T12:39:52.639+0100 LOG 54 (Torx-Explorer) Model SymToSim is about to exit
instance 4: 2013-12-19T12:39:16.885+0100 LOG 0 (START) engine started
looking for members between 2013-12-19T12:39:21.478+0100 and 2013-12-19T12:39:35.980+0100
2013-12-19T12:39:36.024+0100 LOG 34 (Torx-Explorer) Model SymToSim is about to exit
How do I best start my actual distributed algorithm only after enough members are present in the hazelcast cluster? Can I set hazelcast.initial.min.cluster.size
programmatically? https://groups.google.com/forum/#!topic/hazelcast/sa-lmpEDa6A sounds like this would block Hazelcast.newHazelcastInstance(cfg);
until initial.min.cluster.size is reached. Is that correct? How synchronously (within which time span) will the different instances unblock?
Accepted answer by pveentjer
The problem apparently is that the cluster starts (and stops) without waiting until enough members are in the cluster. You can set the hazelcast.initial.min.cluster.size property to prevent this from happening.
You can set 'hazelcast.initial.min.cluster.size' programmatically using:
Config config = new Config();
config.setProperty("hazelcast.initial.min.cluster.size","3");
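If you would rather check explicitly before starting the distributed algorithm, polling the member count is an alternative to the startup property. The following is a minimal sketch, not Hazelcast API: to keep it free of a Hazelcast dependency, the member count is passed in as a java.util.function.IntSupplier, and ClusterWait and awaitMembers are hypothetical names. With the question's code, the supplier would be something like () -> instance.getCluster().getMembers().size().

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

// Hypothetical helper (not part of Hazelcast): waits until a cluster reports
// at least minMembers members, or gives up after timeoutMillis.
final class ClusterWait {

    private ClusterWait() {}

    /**
     * Polls memberCount every pollMillis until it reaches minMembers.
     * Returns true if the threshold was reached in time, false on timeout
     * or if the waiting thread is interrupted.
     */
    static boolean awaitMembers(IntSupplier memberCount, int minMembers,
                                long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (memberCount.getAsInt() >= minMembers) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            try {
                // Sleep for one poll interval, but never past the deadline.
                TimeUnit.NANOSECONDS.sleep(
                        Math.min(remainingNanos, TimeUnit.MILLISECONDS.toNanos(pollMillis)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
    }
}
```

For example, ClusterWait.awaitMembers(() -> instance.getCluster().getMembers().size(), 4, 60_000, 500) would return true once four members are visible, or false after a minute. Each instance stops waiting as soon as it sees the threshold itself, so with this approach the instances do not unblock at exactly the same moment; the 500 ms poll interval is an arbitrary choice.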
Answer by asimarslan
Your configuration is correct, BUT you have set a very long multicast timeout of 200 seconds, whereas the default is 2 seconds. Setting a smaller value will solve it.
From Hazelcast Java API Doc: MulticastConfig.html#setMulticastTimeoutSeconds(int)
Specifies the time in seconds that a node should wait for a valid multicast response from another node running in the network before declaring itself as master node and creating its own cluster. This applies only to the startup of nodes where no master has been assigned yet. If you specify a high value, e.g. 60 seconds, it means until a master is selected, each node is going to wait 60 seconds before continuing, so be careful with providing a high value. If the value is set too low, it might be that nodes are giving up too early and will create their own cluster.
Answer by pveentjer
Can you try a TCP/IP cluster first to make sure that everything else is fine? Once you have confirmed that there is no problem, try multicast. By the way, it could also be a firewall issue.
Answer by pveentjer
It seems you are using TCP/IP clustering, so that is good. Try the following (from the Hazelcast book):
If you are using iptables, the following rule can be added to allow outbound traffic from ports 31000-33000:
iptables -A OUTPUT -p TCP --dport 31000:33000 -m state --state NEW -j ACCEPT
and to allow incoming traffic from any address to port 5701:
iptables -A INPUT -p tcp -d 0/0 -s 0/0 --dport 5701 -j ACCEPT
and to allow incoming multicast traffic:
iptables -A INPUT -m pkttype --pkt-type multicast -j ACCEPT
Connectivity test: If you are having trouble because machines won't join a cluster, you can check the network connectivity between the two machines. You can use a tool called iperf for that. On one machine, execute:
iperf -s -p 5701
This means that you are listening on port 5701.
At the other machine you execute the following command:
iperf -c 192.168.1.107 -d -p 5701
where you replace '192.168.1.107' with the IP address of your first machine. If you run the command and get output like this:
------------------------------------------------------------
Server listening on TCP port 5701
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 192.168.1.107, TCP port 5701
TCP window size: 59.4 KByte (default)
------------------------------------------------------------
[ 5] local 192.168.1.105 port 40524 connected with 192.168.1.107 port 5701
[ 4] local 192.168.1.105 port 5701 connected with 192.168.1.107 port 33641
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.2 sec 55.8 MBytes 45.7 Mbits/sec
[ 5] 0.0-10.3 sec 6.25 MBytes 5.07 Mbits/sec
then you know the two machines can connect to each other. However, if you see something like this:
Server listening on TCP port 5701
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
connect failed: No route to host
Then you know that you might have a network connection problem on your hands.
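Since the asker cannot install iperf, a plain-JDK alternative for the TCP part of this check is to open a socket to port 5701 on the other machine. A minimal sketch; the class name TcpCheck is hypothetical, and unlike iperf it only tests whether a TCP connection can be established, not bandwidth. Something (for example TestApp or another Hazelcast member) must already be listening on the target port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical connectivity check: can this machine open a TCP connection
// to host:port within the given timeout?
final class TcpCheck {

    private TcpCheck() {}

    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers "Connection refused", "No route to host", unknown hosts and timeouts.
            return false;
        }
    }

    // Usage: java TcpCheck 192.168.0.1 5701
    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 5701;
        boolean ok = canConnect(host, port, 2000);
        System.out.println(host + ":" + port + (ok ? " reachable" : " NOT reachable"));
    }
}
```

Run it on the second machine while a member is running on the first; "NOT reachable" corresponds roughly to the "connect failed" case above.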
Answer by pveentjer
So it appears that multicast is working on your network, which is good.
Could you try it with the following settings:
Config cfg = new Config();
NetworkConfig network = cfg.getNetworkConfig();
JoinConfig join = network.getJoin();
join.getTcpIpConfig().setEnabled(false);
join.getAwsConfig().setEnabled(false);
join.getMulticastConfig().setEnabled(true);
HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg);
As you can see, I removed all the customization.
Answer by pveentjer
Can you try to create your Hazelcast instance like this:
Config cfg = new Config();
HazelcastInstance hz = Hazelcast.newHazelcastInstance(cfg);
The Management Center configuration and the creation of the executors are not relevant (I added that code to TestApp, so I'm 100% sure about that).
Then you should have exactly the same network configuration as the TestApp.
Answer by Ted Goddard
It looks like Hazelcast uses multicast address 224.2.2.3 on UDP port 54327 (by default) for discovery, and then port 5701 for TCP communication. Opening UDP port 54327 in the firewall fixed discovery for me. (I had also opened TCP port 5701 but that was not sufficient.)
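To check the UDP side independently of Hazelcast, a sender/listener pair on the default group and port (224.2.2.3 and 54327, which also appear in the JoinConfig output in the question) can be run on two machines. A minimal plain-JDK sketch; the class name MulticastProbe and the message text are hypothetical, and the TTL of 32 is simply copied from the multicastTimeToLive=32 shown in the logs.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.nio.charset.StandardCharsets;

// Hypothetical multicast probe for Hazelcast's default discovery group/port.
final class MulticastProbe {

    static final String DEFAULT_GROUP = "224.2.2.3";
    static final int DEFAULT_PORT = 54327;

    private MulticastProbe() {}

    /** Sends one datagram containing message to group:port. */
    static void send(String group, int port, String message) throws IOException {
        byte[] data = message.getBytes(StandardCharsets.UTF_8);
        try (MulticastSocket socket = new MulticastSocket()) {
            socket.setTimeToLive(32); // matches multicastTimeToLive=32 in the logs
            socket.send(new DatagramPacket(data, data.length, InetAddress.getByName(group), port));
        }
    }

    /**
     * Joins group:port and waits up to timeoutMillis for one datagram.
     * Returns its payload as text, or null on timeout, on an I/O error,
     * or if group is not a multicast address.
     */
    static String listen(String group, int port, int timeoutMillis) {
        try {
            InetAddress groupAddress = InetAddress.getByName(group);
            if (!groupAddress.isMulticastAddress()) {
                return null;
            }
            try (MulticastSocket socket = new MulticastSocket(port)) {
                socket.setSoTimeout(timeoutMillis);
                // null interface: let the OS pick the default multicast interface
                socket.joinGroup(new InetSocketAddress(groupAddress, port), null);
                byte[] buf = new byte[1024];
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                socket.receive(packet);
                return new String(packet.getData(), packet.getOffset(), packet.getLength(),
                        StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            // Includes SocketTimeoutException when nothing arrives in time.
            return null;
        }
    }

    // Usage: "java MulticastProbe listen" on one machine,
    //        then "java MulticastProbe send" on another.
    public static void main(String[] args) throws IOException {
        if (args.length > 0 && args[0].equals("send")) {
            send(DEFAULT_GROUP, DEFAULT_PORT, "hello from MulticastProbe");
        } else {
            String received = listen(DEFAULT_GROUP, DEFAULT_PORT, 30_000);
            System.out.println(received == null ? "nothing received" : "received: " + received);
        }
    }
}
```

If the listener prints "nothing received" even though the machines can reach each other over TCP, multicast on this group and port is probably being filtered somewhere, for example by a firewall as in this answer. If a Hazelcast member is running on the network, the listener may instead print one of its (binary) discovery packets, which also shows that multicast reaches the listener.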