Java Quartz 失败时重试

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/4408858/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-14 17:08:41  来源:igfitidea点击:

Quartz retry when failure

javaquartz-scheduler

提问by Averroes

Let's say I have a trigger configured this way:

假设我有一个以这种方式配置的触发器:

<bean id="updateInsBBTrigger"         
    class="org.springframework.scheduling.quartz.CronTriggerBean">
    <property name="jobDetail" ref="updateInsBBJobDetail"/>
    <!--  run every morning at 5 AM  -->
    <property name="cronExpression" value="0 0 5 * * ?"/>
</bean>

The trigger have to connect with another application and if there is any problem (like a connection failure) it should to retry the task up to five times every 10 minutes or until success. There is any way to configure the trigger to work like this?

触发器必须与另一个应用程序连接,如果有任何问题(如连接失败),它应该每 10 分钟重试任务最多五次或直到成功。有什么方法可以将触发器配置为这样工作吗?

采纳答案by dogbane

Source: Automatically Retry Failed Jobs in Quartz

来源在 Quartz 中自动重试失败的作业

If you want to have a job which keeps trying over and over again until it succeeds, all you have to do is throw a JobExecutionException with a flag to tell the scheduler to fire it again when it fails. The following code shows how:

如果你想要一个不断重复尝试直到成功的作业,你所要做的就是抛出一个带有标志的 JobExecutionException 来告诉调度程序在它失败时再次触发它。以下代码显示了如何:

class MyJob implements Job {

    public MyJob() {
    }

    public void execute(JobExecutionContext context) throws JobExecutionException {

        try{
            //connect to other application etc
        }
        catch(Exception e){

            Thread.sleep(600000); //sleep for 10 mins

            JobExecutionException e2 = new JobExecutionException(e);
            //fire it again
            e2.setRefireImmediately(true);
            throw e2;
        }
    }
}

It gets a bit more complicated if you want to retry a certain number of times. You have to use a StatefulJob and hold a retryCounter in its JobDataMap, which you increment if the job fails. If the counter exceeds the maximum number of retries, then you can disable the job if you wish.

如果您想重试一定次数,它会变得有点复杂。你必须使用一个 StatefulJob 并在它的 JobDataMap 中保存一个 retryCounter,如果作业失败,你会增加它。如果计数器超过最大重试次数,则您可以根据需要禁用该作业。

class MyJob implements StatefulJob {

    public MyJob() {
    }

    public void execute(JobExecutionContext context) throws JobExecutionException {
        JobDataMap dataMap = context.getJobDetail().getJobDataMap();
        int count = dataMap.getIntValue("count");

        // allow 5 retries
        if(count >= 5){
            JobExecutionException e = new JobExecutionException("Retries exceeded");
            //make sure it doesn't run again
            e.setUnscheduleAllTriggers(true);
            throw e;
        }


        try{
            //connect to other application etc

            //reset counter back to 0
            dataMap.putAsString("count", 0);
        }
        catch(Exception e){
            count++;
            dataMap.putAsString("count", count);
            JobExecutionException e2 = new JobExecutionException(e);

            Thread.sleep(600000); //sleep for 10 mins

            //fire it again
            e2.setRefireImmediately(true);
            throw e2;
        }
    }
}

回答by dimitrisli

I would suggest for more flexibility and configurability to better store in your DB two offsets: the repeatOffsetwhich will tell you after how long the job should be retried and the trialPeriodOffsetwhich will keep the information of the time window that the job is allowed to be rescheduled. Then you can retrieve these two parameters like (I assume you are using Spring):

我建议提供更大的灵活性和可配置性,以更好地在您的数据库中存储两个偏移量:repeatOffset将告诉您应在多长时间后重试作业,而trialPeriodOffset将保留允许作业的时间窗口信息重新安排。然后您可以检索这两个参数,例如(我假设您使用的是 Spring):

String repeatOffset = yourDBUtilsDao.getConfigParameter(..);
String trialPeriodOffset = yourDBUtilsDao.getConfigParameter(..);

Then instead of the job to remember the counter it will need to remember the initalAttempt:

然后,而不是记住计数器的工作,它需要记住initalAttempt:

Long initialAttempt = null;
initialAttempt = (Long) existingJobDetail.getJobDataMap().get("firstAttempt");

and perform the something like the following check:

并执行如下检查:

long allowedThreshold = initialAttempt + Long.parseLong(trialPeriodOffset);
        if (System.currentTimeMillis() > allowedThreshold) {
            //We've tried enough, time to give up
            log.warn("The job is not going to be rescheduled since it has reached its trial period threshold");
            sched.deleteJob(jobName, jobGroup);
            return YourResultEnumHere.HAS_REACHED_THE_RESCHEDULING_LIMIT;
        }

It would be a good idea to create an enum for the result of the attempt that is being returned back to the core workflow of your application like above.

为尝试返回到应用程序的核心工作流(如上)的结果创建一个枚举是一个好主意。

Then construct the rescheduling time:

然后构造重新调度时间:

Date startTime = null;
startTime = new Date(System.currentTimeMillis() + Long.parseLong(repeatOffset));

String triggerName = "Trigger_" + jobName;
String triggerGroup = "Trigger_" + jobGroup;

Trigger retrievedTrigger = sched.getTrigger(triggerName, triggerGroup);
if (!(retrievedTrigger instanceof SimpleTrigger)) {
            log.error("While rescheduling the Quartz Job retrieved was not of SimpleTrigger type as expected");
            return YourResultEnumHere.ERROR;
}

        ((SimpleTrigger) retrievedTrigger).setStartTime(startTime);
        sched.rescheduleJob(triggerName, triggerGroup, retrievedTrigger);
        return YourResultEnumHere.RESCHEDULED;

回答by Flávio Ferreira

I would recommend an implementation like this one to recover the job after a fail:

我会推荐一个这样的实现来在失败后恢复工作:

final JobDataMap jobDataMap = jobCtx.getJobDetail().getJobDataMap();
// the keys doesn't exist on first retry
final int retries = jobDataMap.containsKey(COUNT_MAP_KEY) ? jobDataMap.getIntValue(COUNT_MAP_KEY) : 0;

// to stop after awhile
if (retries < MAX_RETRIES) {
  log.warn("Retry job " + jobCtx.getJobDetail());

  // increment the number of retries
  jobDataMap.put(COUNT_MAP_KEY, retries + 1);

  final JobDetail job = jobCtx
      .getJobDetail()
      .getJobBuilder()
       // to track the number of retries
      .withIdentity(jobCtx.getJobDetail().getKey().getName() + " - " + retries, "FailingJobsGroup")
      .usingJobData(jobDataMap)
      .build();

  final OperableTrigger trigger = (OperableTrigger) TriggerBuilder
      .newTrigger()
      .forJob(job)
       // trying to reduce back pressure, you can use another algorithm
      .startAt(new Date(jobCtx.getFireTime().getTime() + (retries*100))) 
      .build();

  try {
    // schedule another job to avoid blocking threads
    jobCtx.getScheduler().scheduleJob(job, trigger);
  } catch (SchedulerException e) {
    log.error("Error creating job");
    throw new JobExecutionException(e);
  }
}

Why?

为什么?

  1. It will not block Quartz Workers
  2. It will avoid back pressure. With setRefireImmediately the job will be fired immediately and it could lead to back pressure issues
  1. 它不会阻塞 Quartz Workers
  2. 它将避免背压。使用 setRefireImmediately 作业将立即被解雇,这可能会导致背压问题