Kubernetes: how to debug CrashLoopBackOff
Source: http://stackoverflow.com/questions/44673957/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by ixaxaar
I have the following setup:
A Docker image omg/telperion on Docker Hub
A Kubernetes cluster (4 nodes, each with ~50GB RAM) with plenty of resources
I followed tutorials on pulling images from Docker Hub into Kubernetes:
SERVICE_NAME=telperion
DOCKER_SERVER="https://index.docker.io/v1/"
DOCKER_USERNAME=username
DOCKER_PASSWORD=password
DOCKER_EMAIL="[email protected]"
# Create secret
kubectl create secret docker-registry dockerhub --docker-server=$DOCKER_SERVER --docker-username=$DOCKER_USERNAME --docker-password=$DOCKER_PASSWORD --docker-email=$DOCKER_EMAIL
# Create the pod YAML (a heredoc keeps the line breaks and indentation intact)
cat > "$SERVICE_NAME.yaml" <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${SERVICE_NAME}
spec:
  containers:
    - name: ${SERVICE_NAME}
      image: omg/${SERVICE_NAME}
      imagePullPolicy: Always
      command: [ "echo", "done deploying ${SERVICE_NAME}" ]
  imagePullSecrets:
    - name: dockerhub
EOF
# Deploy to kubernetes
kubectl create -f $SERVICE_NAME.yaml
This results in the pod going into CrashLoopBackOff.
Running the same image directly with Docker works fine:
docker run -it -p8080:9546 omg/telperion
So my question is: is this debuggable? If so, how do I debug it?
Some logs:
kubectl get nodes
NAME STATUS AGE VERSION
k8s-agent-adb12ed9-0 Ready 22h v1.6.6
k8s-agent-adb12ed9-1 Ready 22h v1.6.6
k8s-agent-adb12ed9-2 Ready 22h v1.6.6
k8s-master-adb12ed9-0 Ready,SchedulingDisabled 22h v1.6.6
.
.
kubectl get pods
NAME READY STATUS RESTARTS AGE
telperion 0/1 CrashLoopBackOff 10 28m
.
.
kubectl describe pod telperion
Name: telperion
Namespace: default
Node: k8s-agent-adb12ed9-2/10.240.0.4
Start Time: Wed, 21 Jun 2017 10:18:23 +0000
Labels: <none>
Annotations: <none>
Status: Running
IP: 10.244.1.4
Controllers: <none>
Containers:
telperion:
Container ID: docker://c2dd021b3d619d1d4e2afafd7a71070e1e43132563fdc370e75008c0b876d567
Image: omg/telperion
Image ID: docker-pullable://omg/telperion@sha256:c7e3beb0457b33cd2043c62ea7b11ae44a5629a5279a88c086ff4853828a6d96
Port:
Command:
echo
done deploying telperion
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 21 Jun 2017 10:19:25 +0000
Finished: Wed, 21 Jun 2017 10:19:25 +0000
Ready: False
Restart Count: 3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-n7ll0 (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-n7ll0:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-n7ll0
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1m 1m 1 default-scheduler Normal Scheduled Successfully assigned telperion to k8s-agent-adb12ed9-2
1m 1m 1 kubelet, k8s-agent-adb12ed9-2 spec.containers{telperion} Normal Created Created container with id d9aa21fd16b682698235e49adf80366f90d02628e7ed5d40a6e046aaaf7bf774
1m 1m 1 kubelet, k8s-agent-adb12ed9-2 spec.containers{telperion} Normal Started Started container with id d9aa21fd16b682698235e49adf80366f90d02628e7ed5d40a6e046aaaf7bf774
1m 1m 1 kubelet, k8s-agent-adb12ed9-2 spec.containers{telperion} Normal Started Started container with id c6c8f61016b06d0488e16bbac0c9285fed744b933112fd5d116e3e41c86db919
1m 1m 1 kubelet, k8s-agent-adb12ed9-2 spec.containers{telperion} Normal Created Created container with id c6c8f61016b06d0488e16bbac0c9285fed744b933112fd5d116e3e41c86db919
1m 1m 2 kubelet, k8s-agent-adb12ed9-2 Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "telperion" with CrashLoopBackOff: "Back-off 10s restarting failed container=telperion pod=telperion_default(f4e36a12-566a-11e7-99a6-000d3aa32f49)"
1m 1m 1 kubelet, k8s-agent-adb12ed9-2 spec.containers{telperion} Normal Started Started container with id 3b911f1273518b380bfcbc71c9b7b770826c0ce884ac876fdb208e7c952a4631
1m 1m 1 kubelet, k8s-agent-adb12ed9-2 spec.containers{telperion} Normal Created Created container with id 3b911f1273518b380bfcbc71c9b7b770826c0ce884ac876fdb208e7c952a4631
1m 1m 2 kubelet, k8s-agent-adb12ed9-2 Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "telperion" with CrashLoopBackOff: "Back-off 20s restarting failed container=telperion pod=telperion_default(f4e36a12-566a-11e7-99a6-000d3aa32f49)"
1m 50s 4 kubelet, k8s-agent-adb12ed9-2 spec.containers{telperion} Normal Pulling pulling image "omg/telperion"
47s 47s 1 kubelet, k8s-agent-adb12ed9-2 spec.containers{telperion} Normal Started Started container with id c2dd021b3d619d1d4e2afafd7a71070e1e43132563fdc370e75008c0b876d567
1m 47s 4 kubelet, k8s-agent-adb12ed9-2 spec.containers{telperion} Normal Pulled Successfully pulled image "omg/telperion"
47s 47s 1 kubelet, k8s-agent-adb12ed9-2 spec.containers{telperion} Normal Created Created container with id c2dd021b3d619d1d4e2afafd7a71070e1e43132563fdc370e75008c0b876d567
1m 9s 8 kubelet, k8s-agent-adb12ed9-2 spec.containers{telperion} Warning BackOff Back-off restarting failed container
46s 9s 4 kubelet, k8s-agent-adb12ed9-2 Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "telperion" with CrashLoopBackOff: "Back-off 40s restarting failed container=telperion pod=telperion_default(f4e36a12-566a-11e7-99a6-000d3aa32f49)"
Edit 1: Errors reported by kubelet on master:
journalctl -u kubelet
.
.
Jun 21 10:28:49 k8s-master-ADB12ED9-0 docker[1622]: E0621 10:28:49.798140 1809 fsHandler.go:121] failed to collect filesystem stats - rootDiskErr: du command failed on /var/lib/docker/overlay/5cfff16d670f2df6520360595d7858fb5d16607b6999a88e5dcbc09e1e7ab9ce with output
Jun 21 10:28:49 k8s-master-ADB12ED9-0 docker[1622]: , stderr: du: cannot access '/var/lib/docker/overlay/5cfff16d670f2df6520360595d7858fb5d16607b6999a88e5dcbc09e1e7ab9ce/merged/proc/13122/task/13122/fd/4': No such file or directory
Jun 21 10:28:49 k8s-master-ADB12ED9-0 docker[1622]: du: cannot access '/var/lib/docker/overlay/5cfff16d670f2df6520360595d7858fb5d16607b6999a88e5dcbc09e1e7ab9ce/merged/proc/13122/task/13122/fdinfo/4': No such file or directory
Jun 21 10:28:49 k8s-master-ADB12ED9-0 docker[1622]: du: cannot access '/var/lib/docker/overlay/5cfff16d670f2df6520360595d7858fb5d16607b6999a88e5dcbc09e1e7ab9ce/merged/proc/13122/fd/3': No such file or directory
Jun 21 10:28:49 k8s-master-ADB12ED9-0 docker[1622]: du: cannot access '/var/lib/docker/overlay/5cfff16d670f2df6520360595d7858fb5d16607b6999a88e5dcbc09e1e7ab9ce/merged/proc/13122/fdinfo/3': No such file or directory
Jun 21 10:28:49 k8s-master-ADB12ED9-0 docker[1622]: - exit status 1, rootInodeErr: <nil>, extraDiskErr: <nil>
Edit 2: more logs
kubectl logs $SERVICE_NAME -p
done deploying telperion
Answered by Fabien
You can access the logs of your pod with:
kubectl logs [podname] -p
The -p option reads the logs of the previous (crashed) instance.
If the crash comes from the application, you should have useful logs in there.
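A minimal sketch of this step, assuming a POSIX shell with kubectl on the PATH. The function name crash_logs is hypothetical and introduced only for illustration; it asks for the previous instance first and falls back to the current one when no previous instance exists.

```shell
# Print logs from the previous (crashed) container instance of a pod,
# falling back to the current instance when there is no previous one.
# crash_logs is a hypothetical helper name, not part of kubectl.
# Usage: crash_logs <pod-name> [namespace]
crash_logs() {
  if ! kubectl logs "$1" -n "${2:-default}" --previous 2>/dev/null; then
    echo "no previous instance found; showing current logs" >&2
    kubectl logs "$1" -n "${2:-default}"
  fi
}
```

For the pod in the question this would be called as crash_logs telperion, which is equivalent to the kubectl logs telperion -p command shown above.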
Answered by mGeek
CrashLoopBackOff means that a container in the pod exits right after it starts. Kubernetes starts it again, it exits again, and the cycle repeats, with a growing back-off delay between restarts.
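The delays visible in the question's events (10s, 20s, 40s) match the kubelet's documented restart back-off, which doubles from 10 seconds up to a cap of five minutes. The sketch below only illustrates that arithmetic. backoff_delay is a hypothetical helper, not a kubectl feature, and the timing on a particular cluster or Kubernetes version may differ.

```shell
# Approximate restart back-off: 10s for the first restart, doubling each
# time, capped at 300s. backoff_delay is a hypothetical helper name.
# Usage: backoff_delay <restart-attempt>   (attempts are numbered from 1)
backoff_delay() {
  delay=10
  i=1
  while [ "$i" -lt "$1" ] && [ "$delay" -lt 300 ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt 300 ]; then
    delay=300
  fi
  echo "$delay"
}
```

This is why a pod that keeps exiting spends most of its time in the Waiting state with reason CrashLoopBackOff rather than Running.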
You can check the pod's logs for errors with kubectl logs <pod-name> -n <namespace> --previous
--previous will show you logs of the previous instantiation of a container
Next, check the "State" reason, the "Last State" reason, and the "Events" section by describing the pod: kubectl describe pod <pod-name> -n <namespace>
Sometimes the issue is that the application has been given too little memory or CPU.
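The last two checks can be sketched as a few shell functions, assuming kubectl is on the PATH and the pod has a single container. The function names are hypothetical. The jsonpath fields (status.containerStatuses, lastState.terminated, spec.containers, resources) are standard Pod fields, and the hint texts are a rough guide rather than a complete diagnosis.

```shell
# Print "<reason> <exitCode>" for the last terminated instance of the
# first container (the same data as "Last State" in kubectl describe).
last_termination() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason} {.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Turn a termination reason and exit code into a rough hint.
# Usage: explain_termination <reason> <exit-code>
explain_termination() {
  case "$1" in
    Completed)
      echo "command ran to completion (exit code $2); with restartPolicy Always, the Pod default, it is restarted anyway" ;;
    OOMKilled)
      echo "container was killed for exceeding its memory limit (exit code $2)" ;;
    Error)
      echo "container exited with code $2; check the previous logs" ;;
    *)
      echo "reason '$1' (exit code $2); check the Events section" ;;
  esac
}

# Print each container's name and its resources stanza (requests/limits).
show_resources() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'
}
```

A possible call is explain_termination $(last_termination telperion). In the question's describe output the Last State is Terminated with reason Completed and exit code 0, so it would fall into the first case: the echo command finishes immediately instead of crashing.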