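Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also comply with CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/16600097/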

Time: 2020-10-31 23:26:06 Source: igfitidea

DatabaseLessLeasing has failed and Server is not in majority cluster partition

java, weblogic, cluster-computing

Question by user1771540

I'm facing a DatabaseLessLeasing issue. Ours is a middleware application. We don't have any database, and our application runs on WebLogic Server. We have 2 servers in one cluster. Both servers are up and running, but we use only one of them for processing. When the primary server fails, the whole server and its services migrate to the secondary server. This works fine.

At the end of last year, however, we had an issue: the secondary server's hardware went down and the secondary server was unavailable. We got the error below. When we went to Oracle, they suggested either adding one more server or having a highly available database to hold the cluster leasing information that identifies the master server. For now we don't have that option, because a new server would mean a budget issue and the client is not ready for it.

Our WebLogic cluster configuration is:

  1. one cluster with 2 managed servers
  2. cluster messaging mode is Multicast
  3. Migration Basis is Consensus
  4. load algorithm is Round Robin

This is the log I found

LOG:

Critical Health BEA-310006 Critical Subsystem DatabaseLessLeasing has failed. Setting server state to FAILED. Reason: Server is not in the majority cluster partition

Critical WebLogicServer BEA-000385 Server health failed. Reason: health of critical service 'DatabaseLessLeasing' failed

Notice WebLogicServer BEA-000365 Server state changed to FAILED

**Note:** I remember one thing: the server was not down when this happened. Both servers were running, but all of a sudden the server tried to restart and was unable to. The restart failed. The status showed as failedToRestart, and the application went down.

Can anyone please help me with this issue?

Thank you

Answer by Prasad

Consensus leasing requires a majority of servers to keep functioning. Whenever there is a network partition, the servers in the majority partition continue to run. The servers in the minority partition fail, because they can neither contact the cluster leader nor elect a new one without a majority of servers. If the partition splits the servers evenly, the partition that contains the cluster leader survives and the other one fails.
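The survival rule above can be written as a small decision function. This is a minimal sketch of the rule as the answer states it: strict majority survives, an even split is won by the side holding the leader, and a minority fails. The class and method names are hypothetical. This is not WebLogic's internal implementation.

```java
// Hypothetical model of the consensus-leasing survival rule described above;
// not WebLogic's actual code.
public final class PartitionRule {

    /**
     * Decides whether one side of a network partition keeps running.
     *
     * @param partitionSize number of servers on this side of the split
     * @param clusterSize   total number of servers in the cluster
     * @param hasLeader     true if this side contains the current cluster leader
     */
    public static boolean survives(int partitionSize, int clusterSize, boolean hasLeader) {
        if (2 * partitionSize > clusterSize) {
            return true;       // strict majority: keeps running (elects a leader if needed)
        }
        if (2 * partitionSize == clusterSize) {
            return hasLeader;  // even split: the side holding the leader survives
        }
        return false;          // minority: cannot reach or elect a leader, so it fails
    }

    public static void main(String[] args) {
        // 3-node cluster split 2/1: the pair survives even without the old leader.
        System.out.println(survives(2, 3, false)); // true
        System.out.println(survives(1, 3, true));  // false
        // 2-node cluster split 1/1: only the side with the leader survives.
        System.out.println(survives(1, 2, true));  // true
        System.out.println(survives(1, 2, false)); // false
    }
}
```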

Because of this, if automatic server migration is enabled, the servers must contact the cluster leader and renew their leases periodically. Servers shut themselves down if they cannot renew their leases. The failed servers are then automatically migrated to machines in the majority partition.
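The "renew periodically or shut down" behaviour amounts to a time-bounded lease. The sketch below models one lease with a fixed duration. The 30-second duration and all names are hypothetical assumptions, and nothing here is a WebLogic API. It only shows that once renewals stop arriving, the lease expires and the server should stop serving.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical time-bounded lease; illustrates the idea only, not WebLogic's API.
public final class Lease {
    private final Duration duration;
    private Instant expiresAt;

    public Lease(Instant grantedAt, Duration duration) {
        this.duration = duration;
        this.expiresAt = grantedAt.plus(duration);
    }

    /** Called when the cluster leader acknowledges a renewal at time {@code now}. */
    public void renew(Instant now) {
        expiresAt = now.plus(duration);
    }

    /** A server whose lease has expired is expected to shut itself down. */
    public boolean isExpired(Instant now) {
        return !now.isBefore(expiresAt);
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2020-01-01T00:00:00Z");
        Lease lease = new Lease(t0, Duration.ofSeconds(30)); // assumed duration

        lease.renew(t0.plusSeconds(20));                         // leader reachable: now expires at t0+50s
        System.out.println(lease.isExpired(t0.plusSeconds(40))); // false
        // Leader unreachable after that: no more renewals, so the lease runs out.
        System.out.println(lease.isExpired(t0.plusSeconds(50))); // true
    }
}
```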

A server that is partitioned off (and is not part of the majority cluster) goes into the FAILED state. This behavior is in place to avoid split-brain scenarios, where a cluster is split into two partitions and both think they are the real cluster. When a cluster gets segmented, the largest segment survives and the smaller segment shuts itself down. When servers cannot reach the cluster master, they determine whether they are in the larger partition. If they are, they elect a new cluster master; if not, they all shut down when their leases expire.

Two-node clusters are problematic here: when a two-node cluster is partitioned, which partition is the larger? When one server in a two-node cluster disappears, the remaining server has no way of knowing whether it is in the majority. In that case, if the remaining server is the cluster master, it continues to run; if it is not the master, it shuts down.
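To make the two-node problem concrete, the sketch below drops one server at a time and counts how many of the remaining servers keep running, using the same survival rule. The server names "ms1"/"ms2"/"ms3" and the choice of "ms1" as leader are hypothetical. With two nodes, losing the leader takes the whole cluster down. With three nodes, any single loss still leaves a majority.

```java
import java.util.List;

// Hypothetical single-failure scenarios under the survival rule described above.
public final class SingleFailureScenarios {

    static boolean survives(int partitionSize, int clusterSize, boolean hasLeader) {
        if (2 * partitionSize > clusterSize) return true;       // strict majority
        if (2 * partitionSize == clusterSize) return hasLeader; // tie: leader's side wins
        return false;                                           // minority
    }

    /** Number of remaining servers still running after {@code lost} drops out. */
    static int runningAfterLosing(List<String> servers, String leader, String lost) {
        int remaining = servers.size() - 1;
        boolean remainingHasLeader = !lost.equals(leader);
        return survives(remaining, servers.size(), remainingHasLeader) ? remaining : 0;
    }

    public static void main(String[] args) {
        List<String> twoNodes = List.of("ms1", "ms2");
        // The non-leader drops out: the leader keeps running.
        System.out.println(runningAfterLosing(twoNodes, "ms1", "ms2")); // 1
        // The leader drops out: the survivor cannot prove it is in the majority.
        System.out.println(runningAfterLosing(twoNodes, "ms1", "ms1")); // 0

        List<String> threeNodes = List.of("ms1", "ms2", "ms3");
        // Even losing the leader leaves a 2-of-3 majority.
        System.out.println(runningAfterLosing(threeNodes, "ms1", "ms1")); // 2
    }
}
```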

Usually this error shows up when there are only 2 managed servers in one cluster.

To solve this kind of issue, create another server. Because the cluster has only 2 nodes, either server falls out of the majority cluster partition if it loses connectivity or drops cluster broadcast messages. There is no other server in the cluster that could form a majority with the remaining one.

For consensus leasing, it is always recommended to create a cluster with at least 3 nodes; that gives the cluster some stability.

In that scenario, even if one server falls out of the cluster, the other two still function correctly because they remain in the majority cluster partition. The third one will either rejoin the cluster or eventually be restarted.
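The "at least 3 nodes" advice follows from majority arithmetic. The smallest strict majority of n servers is floor(n/2) + 1. The number of servers that can drop out while a majority is still guaranteed, whichever servers they are, is n minus that. This sketch deliberately ignores the leader tie-break, which only helps when the leader happens to be on the surviving side. The class name is hypothetical.

```java
// Hypothetical helper showing quorum size and guaranteed fault tolerance per cluster size.
public final class QuorumMath {

    /** Smallest strict majority of n servers: floor(n/2) + 1. */
    static int quorum(int n) {
        return n / 2 + 1; // integer division floors for positive n
    }

    /** Servers that can be lost while a majority is still guaranteed. */
    static int toleratedFailures(int n) {
        return n - quorum(n);
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 5; n++) {
            System.out.printf("n=%d quorum=%d tolerated=%d%n", n, quorum(n), toleratedFailures(n));
        }
    }
}
```

Running it prints `tolerated=0` for n=2 and `tolerated=1` for n=3. Under the majority rule alone, a two-node cluster gains no guaranteed fault tolerance over a single server, while a third node lets the cluster survive any one failure.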

When only 2 servers are part of the cluster, one of them falling out of the cluster can result in both servers being restarted, because neither side holds a majority of the cluster; this ultimately results in a very unstable environment.

Another possible scenario is a communication issue between the Managed Servers. Look out for log messages like "lost .* message(s)" (with unicast, something like "Lost 2 unicast message(s)."). These may be caused by temporary network issues.
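The search for lost-message warnings can be automated with a regular expression built from the "lost .* message(s)" pattern above, matched case-insensitively. This is a sketch under assumptions. The sample lines are taken from this question. The class name is hypothetical. Reading the actual server log file (for example with `java.nio.file.Files.readAllLines`) is left out.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical log filter for lost-message warnings between cluster members.
public final class LostMessageScan {

    // "lost .* message(s)", with the parentheses escaped so they match literally.
    private static final Pattern LOST =
            Pattern.compile("lost .* message\\(s\\)", Pattern.CASE_INSENSITIVE);

    /** Returns only the lines that look like lost-message warnings. */
    static List<String> lostMessageLines(List<String> lines) {
        return lines.stream()
                .filter(line -> LOST.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "Notice WebLogicServer BEA-000365 Server state changed to FAILED",
                "Lost 2 unicast message(s).");
        lostMessageLines(sample).forEach(System.out::println); // Lost 2 unicast message(s).
    }
}
```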

Answer by Kofi White

Make sure that the Node Manager for the secondary node in the clustered migration configuration is up and running.
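A quick first check is whether anything is listening on the Node Manager port of the secondary machine. This sketch assumes the default Node Manager listen port 5556; use whatever `ListenPort` your `nodemanager.properties` actually sets. The host name is a hypothetical placeholder. An open port only shows that some process accepts TCP connections, not that Node Manager is healthy, so treat this as a smoke test next to the WebLogic console and Node Manager logs.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical TCP reachability check for a Node Manager listen port.
public final class NodeManagerPortCheck {

    /** True if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unresolvable host, unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "secondary-host.example.com"; // hypothetical host
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 5556;         // assumed default port
        System.out.println(host + ":" + port + " listening: " + isListening(host, port, 3000));
    }
}
```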
