遇到的问题

上期博客中, 已经讲到了suspend参数带来的一个问题:

如果你将suspendtrue变更为false时, 也就是重新打开定时任务, 错过的任务会立刻执行(如果没有设置 starting deadline), K8S会马上调度先前错过的任务.

另外还有可能遇到这样报错:

too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew 太多的任务丢失了(超过100), 1. 设置或是减少.spec.startingDeadlineSeconds, 2. 检查时钟偏移

startingdeadlineseconds含义 – 官方文档

官方文档中是这样给出的:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
The .spec.startingDeadlineSeconds field is optional. It stands for the deadline in seconds for starting the job if it misses its scheduled time for any reason. After the deadline, the cron job does not start the job. Jobs that do not meet their deadline in this way count as failed jobs. If this field is not specified, the jobs have no deadline.

The CronJob controller counts how many missed schedules happen for a cron job. If there are more than 100 missed schedules, the cron job is no longer scheduled. When .spec.startingDeadlineSeconds is not set, the CronJob controller counts missed schedules from status.lastScheduleTime until now.

该参数是可选的, 它用来表示那些错过执行时间的任务的deadline. 在deadline之后的任务将不会被执行,
未能按时完成的任务也会认为是失败的任务. 没有设置该参数默认没有deadline.

控制器会统计定时任务有多少错过, 如果错过的任务超过100个, 该定时任务将不会执行(即使suspend为false,
它也不会继续执行). 当`startingDeadlineSeconds`没有设置时, 统计区间为`上次执行时间` <=> `现在`.


For example, one cron job is supposed to run every minute, the status.lastScheduleTime of the cronjob is 5:00am, but now it’s 7:00am. That means 120 schedules were missed, so the cron job is no longer scheduled.

If the .spec.startingDeadlineSeconds field is set (not null), the CronJob controller counts how many missed jobs occurred from the value of .spec.startingDeadlineSeconds until now.

For example, if it is set to 200, it counts how many missed schedules occurred in the last 200 seconds. In that case, if there were more than 100 missed schedules in the last 200 seconds, the cron job is no longer scheduled.


例如, 一个定时任务需要每分钟执行, 最近一次执行时间为5:00am, 但是现在已经是7:00am, 那么有120个任务
错过了, 该定时任务将再也不会被调度执行.

但是如果设置了`startingDeadlineSeconds`, 控制器会统计的区间变为
`现在`-startingDeadlineSeconds <=> `现在`.

当设置该值为200s时, 控制器会统计`7:00am - 200s` <=> `7:00am`这段时间错过的任务.
当然如果错过的任务数超过了100, 那么该定时也不会被执行.

官方文档-CronJob的限制中更加详细:

增量的部分是:

1
2
3
4
5
A CronJob is counted as missed if it has failed to be created at its scheduled time. For example, If concurrencyPolicy is set to Forbid and a CronJob was attempted to be scheduled when there was a previous schedule still running, then it would count as missed.


定时任务在调度时间没有执行, 就会认为是错过. 例如, 当`concurrencyPolicy`设置为了`Forbid`,
当前一个任务仍然在运行, 导致该任务无法被创建, 那么该任务就会认为是错过了.

如果你看了上面的内容已经明白了suspend以及startingDeadlineSeconds的用法, 那自然是极好的, 如果仍然觉得云里雾里, 那之后请随我一起去看K8S中此部分的源码.

K8S中的cronjob代码阅读:

我会以Kubernetes v1.17的代码为例, 以下代码截取自, 我将error处理的部分已全部删掉:

https://github.com/kubernetes/kubernetes/blob/v1.17.0/pkg/controller/cronjob/cronjob_controller.go
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
// Run starts the main goroutine responsible for watching and syncing jobs.
func (jm *Controller) Run(stopCh <-chan struct{}) {
// ...
// Run函数每10s同步一次定时任务
go wait.Until(jm.syncAll, 10*time.Second, stopCh)
// ...
}

// syncAll lists all the CronJobs and Jobs and reconciles them.
func (jm *Controller) syncAll() {
// ...
jobsBySj := groupJobsByParent(js)
err = pager.New(pager.SimplePageFunc(cronJobListFunc)).EachListItem(context.Background(), metav1.ListOptions{}, func(object runtime.Object) error {
sj, ok := object.(*batchv1beta1.CronJob)
// 针对每个任务进行同步
syncOne(sj, jobsBySj[sj.UID], time.Now(), jm.jobControl, jm.sjControl, jm.recorder)
cleanupFinishedJobs(sj, jobsBySj[sj.UID], jm.jobControl, jm.sjControl, jm.recorder)
return nil
})
}

func syncOne(sj *batchv1beta1.CronJob, js []batchv1.Job, now time.Time, jc jobControlInterface, sjc sjControlInterface, recorder record.EventRecorder) {
// ....

// 获取直到当前没有被调度任务的时间点, 请参考下方的函数
times, err := getRecentUnmetScheduleTimes(*sj, now)

// ...
// 最近一次的需要被调度的时间, 可以看到K8S只会调度最近的一次任务
scheduledTime := times[len(times)-1]
tooLate := false
if sj.Spec.StartingDeadlineSeconds != nil {
tooLate = scheduledTime.Add(time.Second * time.Duration(*sj.Spec.StartingDeadlineSeconds)).Before(now)
}
if tooLate {
// 如果最近一次的调度时间也超过了deadline允许的时间, 那么说明列表中所有任务都过了deadline, 不再继续执行
// 并且没有设置lastscheduletime.
klog.V(4).Infof("Missed starting window for %s", nameForLog)
recorder.Eventf(sj, v1.EventTypeWarning, "MissSchedule", "Missed scheduled time to start a job: %s", scheduledTime.Format(time.RFC1123Z))
// TODO: Since we don't set LastScheduleTime when not scheduling, we are going to keep noticing
// the miss every cycle. In order to avoid sending multiple events, and to avoid processing
// the sj again and again, we could set a Status.LastMissedTime when we notice a miss.
// Then, when we call getRecentUnmetScheduleTimes, we can take max(creationTimestamp,
// Status.LastScheduleTime, Status.LastMissedTime), and then so we won't generate
// and event the next time we process it, and also so the user looking at the status
// can see easily that there was a missed execution.
return
}
// 禁止并发运行任务, 则退出
if sj.Spec.ConcurrencyPolicy == batchv1beta1.ForbidConcurrent && len(sj.Status.Active) > 0 {
// Regardless which source of information we use for the set of active jobs,
// there is some risk that we won't see an active job when there is one.
// (because we haven't seen the status update to the SJ or the created pod).
// So it is theoretically possible to have concurrency with Forbid.
// As long the as the invocations are "far enough apart in time", this usually won't happen.
//
// TODO: for Forbid, we could use the same name for every execution, as a lock.
// With replace, we could use a name that is deterministic per execution time.
// But that would mean that you could not inspect prior successes or failures of Forbid jobs.
klog.V(4).Infof("Not starting job for %s because of prior execution still running and concurrency policy is Forbid", nameForLog)
return
}
// 允许任务替换, 直接删除上一个活跃的任务
if sj.Spec.ConcurrencyPolicy == batchv1beta1.ReplaceConcurrent {
for _, j := range sj.Status.Active {
klog.V(4).Infof("Deleting job %s of %s that was still running at next scheduled start time", j.Name, nameForLog)

job, err := jc.GetJob(j.Namespace, j.Name)
if err != nil {
recorder.Eventf(sj, v1.EventTypeWarning, "FailedGet", "Get job: %v", err)
return
}
if !deleteJob(sj, job, jc, recorder) {
return
}
}
}

// 运行该任务
jobReq, err := getJobFromTemplate(sj, scheduledTime)
jobResp, err := jc.CreateJob(sj.Namespace, jobReq)

// Add the just-started job to the status list.
ref, err := getRef(jobResp)
sj.Status.Active = append(sj.Status.Active, *ref)

// 更新调度时间
sj.Status.LastScheduleTime = &metav1.Time{Time: scheduledTime}

return
}

// getRecentUnmetScheduleTimes gets a slice of times (from oldest to latest) that have passed when a Job should have started but did not.
//
// If there are too many (>100) unstarted times, just give up and return an empty slice.
// If there were missed times prior to the last known start time, then those are not returned.
func getRecentUnmetScheduleTimes(sj batchv1beta1.CronJob, now time.Time) ([]time.Time, error) {
// ...
var earliestTime time.Time
if sj.Status.LastScheduleTime != nil {
earliestTime = sj.Status.LastScheduleTime.Time
} else {
// If none found, then this is either a recently created scheduledJob,
// or the active/completed info was somehow lost (contract for status
// in kubernetes says it may need to be recreated), or that we have
// started a job, but have not noticed it yet (distributed systems can
// have arbitrary delays). In any case, use the creation time of the
// CronJob as last known start time.
earliestTime = sj.ObjectMeta.CreationTimestamp.Time
}
if sj.Spec.StartingDeadlineSeconds != nil {
// Controller is not going to schedule anything below this point
schedulingDeadline := now.Add(-time.Second * time.Duration(*sj.Spec.StartingDeadlineSeconds))

if schedulingDeadline.After(earliestTime) {
earliestTime = schedulingDeadline
}
}
// ....

// earliestTime = MAX(创建时间, 上次调度时间, 当前时间-deadline seconds)
// 下面的循环在检查 从earliestTime => now 这段过程中应该被调度的时间点.
// 并返回这部分时间点, 如果任务点的数量超过了100, 马上报错
for t := sched.Next(earliestTime); !t.After(now); t = sched.Next(t) {
starts = append(starts, t)
// An object might miss several starts. For example, if
// controller gets wedged on friday at 5:01pm when everyone has
// gone home, and someone comes in on tuesday AM and discovers
// the problem and restarts the controller, then all the hourly
// jobs, more than 80 of them for one hourly scheduledJob, should
// all start running with no further intervention (if the scheduledJob
// allows concurrency and late starts).
//
// However, if there is a bug somewhere, or incorrect clock
// on controller's server or apiservers (for setting creationTimestamp)
// then there could be so many missed start times (it could be off
// by decades or more), that it would eat up all the CPU and memory
// of this controller. In that case, we want to not try to list
// all the missed start times.
//
// I've somewhat arbitrarily picked 100, as more than 80,
// but less than "lots".
if len(starts) > 100 {
// We can't get the most recent times so just return an empty slice
return []time.Time{}, fmt.Errorf("too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew")
}
}
return starts, nil
}

总结

阅读了Kubernetes的Cron任务实现后, 基本对其实现有了大体的把握. 重要的一点是K8S每10s检查一次定时任务的时间点, 检查 当前时间 - DeadlineSeconds <=> 当前时间中未被调度的任务时间点, 选择最后一项任务进行执行, 并更新该任务的调度时间.

FAQ

下面的讨论均是建立在定时任务的最小粒度为分钟!!!

在Linux的cron任务中, * * * * * 表示每分钟执行一次.

任务是否会完全不执行

有可能, 有下面两种情况

  1. 不设置DeadlineSeconds, 或是DeadlineSeconds过大, 导致missed任务超过100个, K8S将不会执行此定时任务
  2. 当你的DeadlineSeconds设置为10s以内, 每次的now - DeadlineSeconds可能都不会包含任务的执行点

任务是否会多执行或是少执行

有可能, 不过最多只会多执行一次. K8S会在下一次该任务调度时执行. 下面是这种场景:

以最小粒度每分钟为例, 1分钟 >> 10s, 假如其中有一次任务执行花费了2分22秒

1
2
3
4
5
6
7
8
start       end
+ +
| |
v v
+----+----++-+-+
+ + + ^ +
+
check

我们会在2分30秒的时候触发检查, 如果DeadlineSeconds>30s, 那么2分钟这个点会被包含在K8S所调度的任务中, 本来已经被忽略的任务执行了一次.

从上面这个例子也能看出, K8S的定时任务不会少执行任务的.

DeadlineSeconds该设置多大?

目前来看, Deadline设置小于6000s, 大于10s这个区间就可以. 我倾向于将这个时间取小, 10分钟(600)以内.

应该应对的异常

  1. 上述例子中, 任务的实际执行时长大于任务间隔. 本来应该每分钟执行的定时任务, 但由于某次任务执行时间较长, 导致区间的部分任务丢失.
  2. len(times) > 1说明任务有积存, 需要我们确认系统状况.