Java: strange behavior of Quartz in a cluster configuration

Question

Asked by Claudio Query

I'm developing scheduled services.

The application is developed using JDK 1.6, Spring Framework 2.5.6 and Quartz 1.8.4 to schedule jobs.

I have two clustered servers running WebLogic Server 10.3.5.

Sometimes the Quartz scheduling seems to go crazy. Analyzing the conditions under which this occurs, there seems to be a clock desynchronization of more than one second between the clustered servers. However, this desynchronization is not always due to the servers' system time: sometimes, even when the machines' clocks are synchronized, the JVM seems to introduce a small delay.

Has anyone encountered the same problem? Is there a way to solve it?

Thanks in advance

Answer 1

Answered by ercasta

I experienced the same problem with Quartz 2.2.1 when using a JDBC JobStore on Oracle.

In my case, I was running Quartz on a single node. However, I noticed the database machine was not time synchronized with the node running Quartz.

I activated ntpd on both the database machine and the machine running Quartz, and the problem went away after a few minutes.

Answer 2

Answered by aloplop85

I am using Quartz 2.2.1 and I notice a strange behavior whenever a cluster recovery occurs.

For instance, even though the machines have been synchronized with the ntpdate service, I get this message on cluster instance recovery:

org.quartz.impl.jdbcjobstore.JobStoreSupport findFailedInstances “This scheduler instance () is still active but was recovered by another instance in the cluster. This may cause inconsistent behavior”.

The linked page says that the solution is: "Synchronize the time on all cluster nodes and then restart the cluster. The messages should no longer appear in the log."

Since every machine is synchronized, maybe this "delay" is introduced by the JVM? I don't know.

Answer 3

Answered by sagneta

This issue is nearly always attributable to clock skew. Even if you think you have NTPd set up properly, a couple of things can still happen:

We thought we had NTPd working (and it was configured properly), but on AWS the firewalls were blocking the NTP port, UDP 123. Again, that's UDP, not TCP.
If you don't sync often enough, you will accumulate clock skew. The accuracy of the timers on many motherboards is notoriously wonky, so after some time (days) you suddenly get these Quartz errors. Once the skew exceeds 5 minutes you also get many security errors, for example from Kerberos.

So the moral of this story is: sync with NTPd, but do it often and verify that it is actually working.
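
One way to verify the sync from inside the application is to compare the local clock with a time value read from the other machine (for example the database server). The following is only a minimal sketch: the class and method names are hypothetical, how the remote time is fetched is left out, and it assumes the remote reading was taken roughly halfway through the round trip.

```java
/** Hypothetical helper (not part of Quartz): estimates clock offset between two machines. */
final class ClockSkewCheck {

    private ClockSkewCheck() {}

    /**
     * Estimates (remote clock - local clock) in milliseconds.
     * localBeforeMillis / localAfterMillis are local clock readings taken just
     * before and just after the remote time was fetched. Assumption: the remote
     * value was produced about halfway through that round trip.
     */
    static long estimateOffsetMillis(long localBeforeMillis, long remoteMillis, long localAfterMillis) {
        long localMidpoint = localBeforeMillis + (localAfterMillis - localBeforeMillis) / 2;
        return remoteMillis - localMidpoint;
    }

    /** True if the absolute offset is larger than the tolerated skew. */
    static boolean exceeds(long offsetMillis, long toleranceMillis) {
        return Math.abs(offsetMillis) > toleranceMillis;
    }

    public static void main(String[] args) {
        long before = System.currentTimeMillis();
        // Stand-in for a timestamp read from the other machine; here it is
        // simply faked to be 1.5 seconds ahead of the local clock.
        long remote = before + 1500;
        long after = System.currentTimeMillis();

        long offset = estimateOffsetMillis(before, remote, after);
        System.out.println("estimated offset (ms): " + offset);
        System.out.println("above 1 s tolerance: " + exceeds(offset, 1000));
    }
}
```

The 1000 ms tolerance is only an illustration, chosen because the question mentions trouble once the skew is greater than a second. Logging this value periodically would show whether the skew creeps up between syncs.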

Answer 4

Answered by Cloud

This issue most often happens because the clocks of the cluster nodes are out of sync. However, it may also be caused by an unstable connection between the application and the database. Such connection problems may stem from network problems (if the application server and the DB server are on different machines) or from performance problems (the DB server processes requests very slowly for some reason).

In that case, the chance of this issue appearing can be reduced by increasing the org.quartz.jobStore.clusterCheckinInterval value.
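
As a rough illustration, the setting could be supplied programmatically like this. The class name and the 20000 ms value are only examples, and the properties shown are not a complete clustered configuration (the job store class, data source and so on are omitted).

```java
import java.util.Properties;

/** Hypothetical helper that assembles a few Quartz cluster settings. */
final class ClusteredQuartzProps {

    private ClusteredQuartzProps() {}

    /** Builds a partial property set with the given check-in interval in milliseconds. */
    static Properties build(long checkinIntervalMillis) {
        Properties props = new Properties();
        // Let Quartz generate a unique instance id for each node.
        props.setProperty("org.quartz.scheduler.instanceId", "AUTO");
        props.setProperty("org.quartz.jobStore.isClustered", "true");
        // How often this node checks in with the other instances of the cluster.
        props.setProperty("org.quartz.jobStore.clusterCheckinInterval",
                Long.toString(checkinIntervalMillis));
        return props;
    }

    public static void main(String[] args) {
        Properties props = build(20000L); // example value only
        System.out.println(props.getProperty("org.quartz.jobStore.clusterCheckinInterval"));
        // In a real application these properties (plus the rest of the job store
        // configuration) would be handed to org.quartz.impl.StdSchedulerFactory
        // or placed in quartz.properties.
    }
}
```

Keep in mind that a larger interval also means a node that has really failed is detected later, so the value is a trade-off.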

Answer 5

Answered by Andrei Kovrov

I faced the same issue. First, check the logs and the time synchronization of your cluster.

The telltale sign is messages like these in the logs:

08-02-2018 17:13:49.926 [QuartzScheduler_schedulerService-pc6061518092456074_ClusterManager] INFO  o.s.s.quartz.LocalDataSourceJobStore - ClusterManager: detected 1 failed or restarted instances.

08-02-2018 17:14:06.137 [QuartzScheduler_schedulerService-pc6061518092765988_ClusterManager] WARN  o.s.s.quartz.LocalDataSourceJobStore - This scheduler instance (pc6061518092765988) is still active but was recovered by another instance in the cluster.

When the first node observes that the second node has not checked in for longer than org.quartz.jobStore.clusterCheckinInterval, it unregisters the second node from the cluster and removes all its triggers.

Take a look at the synchronization algorithm: org.quartz.impl.jdbcjobstore.JobStoreSupport.ClusterManager#run

This can happen when a check-in takes a long time.

My solution is to override org.quartz.impl.jdbcjobstore.JobStoreSupport#calcFailedIfAfter. The hardcoded value '7500L' looks like a grace period. I replaced it with a parameter.
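
To make the idea concrete, here is a standalone sketch of the arithmetic with the grace period turned into a parameter. It is not the Quartz source: the class and method names are hypothetical, and it assumes the deadline is the other node's last check-in time, plus the larger of its check-in interval and the time since this node's own last check-in, plus the grace period.

```java
/** Hypothetical, standalone model of the failure deadline (not the Quartz class). */
final class FailureDeadline {

    /** The value that appears hardcoded in calcFailedIfAfter, per the answer above. */
    static final long DEFAULT_GRACE_MILLIS = 7500L;

    private FailureDeadline() {}

    /**
     * Time after which the other instance is considered failed.
     * Assumed shape of the calculation; graceMillis replaces the hardcoded 7500L.
     */
    static long failedIfAfter(long otherCheckinTimestamp,
                              long otherCheckinInterval,
                              long millisSinceOwnLastCheckin,
                              long graceMillis) {
        return otherCheckinTimestamp
                + Math.max(otherCheckinInterval, millisSinceOwnLastCheckin)
                + graceMillis;
    }

    /** The other instance looks failed once the deadline lies in the past. */
    static boolean looksFailed(long nowMillis, long deadlineMillis) {
        return deadlineMillis < nowMillis;
    }

    public static void main(String[] args) {
        long lastCheckin = 100_000L; // other node's last check-in (example value)
        long interval = 20_000L;     // its check-in interval (example value)
        long now = 130_000L;         // the check-in is 30 s old

        long strict = failedIfAfter(lastCheckin, interval, 5_000L, DEFAULT_GRACE_MILLIS);
        long lenient = failedIfAfter(lastCheckin, interval, 5_000L, 30_000L);

        System.out.println(looksFailed(now, strict));  // true: 127500 < 130000
        System.out.println(looksFailed(now, lenient)); // false: 150000 >= 130000
    }
}
```

With a longer grace period, a node whose check-in is merely slow (for example because of a sluggish database) is less likely to be declared failed, at the cost of reacting more slowly to a node that is really down.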

Note: If you are using SchedulerFactoryBean, be careful when registering a new JobStoreSupport subclass. Spring forcibly registers its own store, org.springframework.scheduling.quartz.LocalDataSourceJobStore.
