Java Quartz Performance
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, link the original, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/11565993/
Quartz Performance
Asked by vikas
There seems to be a limit on the number of jobs that the Quartz scheduler can run per second. In our scenario we fire roughly 20 jobs per second, 24x7. Quartz worked well up to 10 jobs per second (with 100 Quartz threads and a database connection pool size of 100 for a JDBC-backed JobStore). However, when we increased the rate to 20 jobs per second, Quartz became very, very slow, and its triggered jobs ran very late compared to their actual scheduled time, causing many misfires and eventually degrading the overall performance of the system significantly. One interesting fact is that JobExecutionContext.getScheduledFireTime().getTime()
for such delayed triggers comes out to be 10-20 minutes, or even more, after their scheduled time.
How many jobs can the Quartz scheduler run per second without affecting the scheduled time of the jobs, and what would be the optimum number of Quartz threads for such a load?
Or am I missing something here?
Details about what we want to achieve:
We have almost 10k items (split among 2 or more categories; in the current case, 2 categories) that we need to process at a given frequency, e.g. every 15, 30, 60... minutes, and within that frequency the items should be processed with a given per-minute throttle. For example, at a 60-minute frequency, the 5k items of each category should be processed with a throttle of 500 items per minute. So, ideally, these items would be processed within the first 10 (5000/500) minutes of each hour of the day, with each minute's 500 items distributed evenly across the seconds of that minute, giving around 8-9 items per second per category.
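To make the throttling arithmetic concrete (these are just the example numbers from the description above, not measured values), a small sketch:

```java
// Restates the example: 5k items per category, throttled to 500 items/minute.
public class ThrottleMath {
    // minutes needed to drain one category at the given throttle
    static long minutesNeeded(int items, int perMinute) {
        return items / perMinute;
    }

    // millisecond gap between items when spreading one minute's quota evenly
    static long delayMillisPerItem(int perMinute) {
        return 60_000L / perMinute;
    }

    public static void main(String[] args) {
        System.out.println(minutesNeeded(5_000, 500));   // 10 minutes per hour busy
        System.out.println(delayMillisPerItem(500));     // 120 ms between items, ~8.3 items/s
    }
}
```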
To achieve this we use Quartz as the scheduler, triggering jobs that process these items. However, we don't process each item inside the Job.execute method, because processing a single item takes 5-50 seconds (30 on average) and involves a web-service call. Instead, for each item we push a message onto a JMS queue, and separate server machines process those messages. I have measured the time taken by the Job.execute method to be no more than 30 milliseconds.
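A minimal sketch of this hand-off pattern, with a java.util.concurrent.BlockingQueue standing in for the JMS queue (the class and item names here are made up for illustration; in the real system the enqueue would be a JMS producer send):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// The scheduler-side "job" only enqueues work; the heavy processing
// (the 5-50 s web-service call) happens on separate consumer machines.
public class HandOffSketch {
    // Stand-in for the JMS queue.
    static final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // What Job.execute would do: a cheap enqueue that returns in milliseconds.
    static void execute(String itemId) throws InterruptedException {
        queue.put(itemId); // fire-and-forget; no web-service call here
    }

    public static void main(String[] args) throws InterruptedException {
        execute("item-42");
        System.out.println(queue.take()); // a consumer machine would pick this up
    }
}
```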
Server Details:
The scheduler runs on a Solaris SPARC 64-bit server with an 8-core/16-thread CPU and 16 GB RAM, and we have two such machines in the scheduler cluster.
Accepted answer by maasg
In a previous project, I was confronted with the same problem. In our case, Quartz performed well down to a granularity of one second. Sub-second scheduling was a stretch and, as you are observing, misfires happened often and the system became unreliable.
We solved this issue by creating two levels of scheduling: Quartz would schedule a job 'set' of n consecutive jobs. With a clustered Quartz, this means that a given server in the system would get that job set to execute. The n tasks in the set are then taken in by a "micro-scheduler": basically a timing facility that used the native JDK API to further time the jobs down to 10 ms granularity.
To handle the individual jobs, we used a master-worker design, where the master was taking care of the scheduled delivery (throttling) of the jobs to a multi-threaded pool of workers.
If I had to do this again today, I'd rely on a ScheduledThreadPoolExecutor to manage the 'micro-scheduling'. For your case, it would look something like this:
ScheduledThreadPoolExecutor scheduledExecutor;
...
scheduledExecutor = new ScheduledThreadPoolExecutor(THREAD_POOL_SIZE);
...
// Evenly spread the execution of a set of tasks over a period of time
public void schedule(Set<Task> taskSet, long timePeriod, TimeUnit timeUnit) {
    if (taskSet.isEmpty()) return; // or indicate some failure ...
    long period = TimeUnit.MILLISECONDS.convert(timePeriod, timeUnit);
    long delay = period / taskSet.size();
    long accumulativeDelay = 0;
    for (Task task : taskSet) {
        scheduledExecutor.schedule(task, accumulativeDelay, TimeUnit.MILLISECONDS);
        accumulativeDelay += delay;
    }
}
This gives you a general idea of how to use the JDK facility to micro-schedule tasks. (Disclaimer: you need to make this robust for a production environment, e.g. check for failing tasks, manage retries (if supported), etc.)
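A runnable usage sketch of the same even-spread idea (with Runnable in place of the answer's Task type, which isn't shown, and a CountDownLatch added only so the demo can wait for completion):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Schedules n tasks evenly over periodMillis: at 0, period/n, 2*period/n, ... ms.
public class MicroScheduleDemo {
    static boolean runDemo(int n, long periodMillis) throws InterruptedException {
        ScheduledThreadPoolExecutor scheduledExecutor = new ScheduledThreadPoolExecutor(4);
        CountDownLatch done = new CountDownLatch(n);
        long delay = periodMillis / n;
        long accumulativeDelay = 0;
        for (int i = 0; i < n; i++) {
            scheduledExecutor.schedule(done::countDown, accumulativeDelay, TimeUnit.MILLISECONDS);
            accumulativeDelay += delay;
        }
        boolean allRan = done.await(5, TimeUnit.SECONDS); // true once every task fired
        scheduledExecutor.shutdown();
        return allRan;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo(5, 500)); // 5 tasks spread over 500 ms
    }
}
```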
With some testing and tuning, we found an optimal balance between the Quartz jobs and the number of jobs in one scheduled set.
We achieved a 100x throughput improvement this way. Network bandwidth was our actual limit.
Answered by Tomasz Nurkiewicz
First of all, check "How do I improve the performance of JDBC-JobStore?" in the Quartz documentation.
As you can probably guess, there is no absolute value and no definite metric. It all depends on your setup. However, here are a few hints:
- 20 jobs per second means around 100 database queries per second, including updates and locking. That's quite a lot!
- Consider distributing your Quartz setup to a cluster. However, if the database is the bottleneck, that won't help you. Maybe TerracottaJobStore will come to the rescue?
- Having K cores in the system, anything less than K threads will underutilize your system. If your jobs are CPU-intensive, K is fine. If they call external web services, block, or sleep, consider much bigger values. However, more than 100-200 threads will significantly slow down your system due to context switching.
- Have you tried profiling? What is your machine doing most of the time? Can you post a thread dump? I suspect poor database performance rather than CPU, but it depends on your use case.
Answered by corsiKa
You should limit your number of threads to somewhere between n and n*3, where n is the number of processors available. Spinning up more threads is going to cause a lot of context switching, since most of them will be blocked most of the time.
As for jobs per second, it really depends on how long the jobs run and how often they're blocked by operations like network and disk I/O.
Also, consider that perhaps Quartz isn't the tool you need. If you're sending off 1-2 million jobs a day, you might want to look into a custom solution. What are you even doing with 2 million jobs a day?!
Another option, which is a really bad way to approach the problem but sometimes works... what server is it running on? Is it an older server? Bumping up the RAM or other specs might give you some extra 'oomph'. Not the best solution, for sure, because it delays the problem rather than addressing it, but if you're in a crunch it might help.
Answered by Erik Sørgaard
In situations with a high number of jobs per second, make sure your SQL server uses row locking and not table locking. In MySQL, this means using the InnoDB storage engine rather than the default MyISAM storage engine, which only supports table locks.
Answered by volkerk
Fundamentally, the approach of doing one item at a time is doomed and inefficient when you're dealing with such a large number of things to do within such a short time. You need to group things - the suggested approach of using a job set that then micro-schedules each individual job is a first step, but it still means doing a whole lot of almost nothing per job. Better would be to improve your web service so you can tell it to process N items at a time, and then invoke it with sets of items to process. Even better is to avoid doing this sort of thing via web services at all and to process the items inside the database, as sets, which is what databases are good for. Any job that processes one item at a time is fundamentally an unscalable design.
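A sketch of the batching idea (the batch size and the per-batch "processBatch" call are made up for illustration; the point is invoking the downstream service once per N items rather than once per item):

```java
import java.util.ArrayList;
import java.util.List;

// Group items into fixed-size batches so the expensive downstream call
// is made once per batch instead of once per item.
public class Batching {
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 10; i++) items.add(i);
        // One hypothetical web-service call per batch of 4:
        for (List<Integer> batch : partition(items, 4)) {
            System.out.println("processBatch(" + batch + ")"); // stand-in for the real call
        }
    }
}
```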