Automated ETCD Backup
Securing etcd, by taking regular backups, is one of the major Day-2 tasks for a Kubernetes cluster. This article explains how to create such a backup using an OpenShift CronJob.
Note: There is absolutely no warranty. Verify your backups regularly and perform restore tests.
Prerequisites
The following is required:
OpenShift Cluster 4.x
Integrated storage; this might be NFS or any other storage. Best practice would be RWX-capable (ReadWriteMany) storage.
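If you are unsure what is available on your cluster, the configured StorageClasses can be listed first; whether a class actually supports RWX depends on its provisioner:

# List the StorageClasses configured on the cluster
oc get storageclass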
Configure Project & CronJob
Create the following objects in OpenShift. This will create:
A Project called ocp-etcd-backup
A PersistentVolumeClaim to store the backups. Change storageClassName and accessModes to values appropriate for your cluster.
A ServiceAccount called openshift-backup
A dedicated ClusterRole which is able to start debug pods
A ClusterRoleBinding between the created ServiceAccount and the custom ClusterRole
A second ClusterRoleBinding, which gives our ServiceAccount the permission to start privileged containers. This is required to start a debug pod on a control plane node.
A CronJob which performs the backup; see the callouts for inline explanations.
A Helm chart that creates the objects below can be found at https://github.com/tjungbauer/ocp-auto-backup. This is probably a better way to manage the variables, via the values.yaml file.
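If you would rather apply the plain manifests, everything below can be saved into a single file and created in one go (the file name etcd-backup.yaml is just an example):

# Create Namespace, PVC, RBAC objects and CronJob in one step
oc apply -f etcd-backup.yaml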
kind: Namespace
apiVersion: v1
metadata:
  name: ocp-etcd-backup
  annotations:
    openshift.io/description: OpenShift Backup Automation Tool
    openshift.io/display-name: Backup ETCD Automation
    openshift.io/node-selector: ''
spec: {}
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: etcd-backup-pvc
  namespace: ocp-etcd-backup
spec:
  accessModes:
    - ReadWriteOnce (1)
  resources:
    requests:
      storage: 100Gi
  storageClassName: gp2
  volumeMode: Filesystem
---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: openshift-backup
  namespace: ocp-etcd-backup
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-etcd-backup
rules:
  - apiGroups: [""]
    resources:
      - "nodes"
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources:
      - "pods"
      - "pods/log"
    verbs: ["get", "list", "create", "delete", "watch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: openshift-backup
subjects:
  - kind: ServiceAccount
    name: openshift-backup
    namespace: ocp-etcd-backup
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-etcd-backup
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: etcd-backup-scc-privileged
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:openshift:scc:privileged
subjects:
  - kind: ServiceAccount
    name: openshift-backup
    namespace: ocp-etcd-backup
---
kind: CronJob
apiVersion: batch/v1
metadata:
  name: cronjob-etcd-backup
  namespace: ocp-etcd-backup
  labels:
    purpose: etcd-backup
spec:
  schedule: '*/5 * * * *' (2)
  startingDeadlineSeconds: 200
  concurrencyPolicy: Forbid
  suspend: false
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      backoffLimit: 0
      template:
        metadata:
          creationTimestamp: null
        spec:
          nodeSelector:
            node-role.kubernetes.io/master: '' (3)
          restartPolicy: Never
          activeDeadlineSeconds: 200
          serviceAccountName: openshift-backup
          schedulerName: default-scheduler
          hostNetwork: true
          terminationGracePeriodSeconds: 30
          securityContext: {}
          containers:
            - resources:
                requests:
                  cpu: 300m
                  memory: 250Mi
              terminationMessagePath: /dev/termination-log
              name: etcd-backup
              command: (4)
                - /bin/bash
                - '-c'
                - >-
                  oc get no -l node-role.kubernetes.io/master --no-headers -o
                  name | grep `hostname` | head -n 1 | xargs -I {} -- oc debug
                  {} -- bash -c 'chroot /host sudo -E
                  /usr/local/bin/cluster-backup.sh /home/core/backup' ; echo
                  'Moving Local Master Backups to target directory (from
                  /home/core/backup to mounted PVC)'; mv /home/core/backup/*
                  /etcd-backup/; echo 'Deleting files older than 30 days' ; find
                  /etcd-backup/ -type f -mtime +30 -exec rm {} \;
              securityContext:
                privileged: true
                runAsUser: 0
              imagePullPolicy: IfNotPresent
              volumeMounts:
                - name: temp-backup
                  mountPath: /home/core/backup (5)
                - name: etcd-backup
                  mountPath: /etcd-backup (6)
              terminationMessagePolicy: FallbackToLogsOnError
              image: registry.redhat.io/openshift4/ose-cli
          serviceAccount: openshift-backup
          volumes:
            - name: temp-backup
              hostPath:
                path: /home/core/backup
                type: ''
            - name: etcd-backup
              persistentVolumeClaim:
                claimName: etcd-backup-pvc
          dnsPolicy: ClusterFirst
          tolerations:
            - operator: Exists
              effect: NoSchedule
            - operator: Exists
              effect: NoExecute
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 5
(1) RWO is used here, since I have no other storage available on my test cluster.
(2) How often the job shall be executed; here, every 5 minutes.
(3) Bind the job to the control plane ("master") nodes.
(4) The command to be executed. It fetches the name of the local control plane node and starts a debug pod there. The backup script is called and stores the backup in /home/core/backup, which is a folder on the control plane node itself. The mv command then moves the backups from this local folder to the actual backup target volume. Finally, it removes backups older than 30 days.
(5) Mounts /home/core/backup of the master node; this is where the command stores the backups before they are moved.
(6) Target destination for the etcd backup on the mounted PVC.
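If you want to test the essential part of callout (4) by hand before relying on the CronJob, a minimal sketch run from any workstation with cluster-admin access could look like this (outside the pod there is no node-local hostname to match, so the first control plane node is simply picked):

# Pick the first control plane node and run the backup script via a debug pod
NODE=$(oc get nodes -l node-role.kubernetes.io/master -o name | head -n 1)
oc debug "$NODE" -- bash -c 'chroot /host sudo -E /usr/local/bin/cluster-backup.sh /home/core/backup'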
Start a Job
If you do not want to wait until the CronJob is triggered, you can manually start a Job using the following command:
oc create job backup --from=cronjob/cronjob-etcd-backup -n ocp-etcd-backup
This will start a Pod that performs the backup:
Starting pod/ip-10-0-196-187us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
found latest kube-apiserver: /etc/kubernetes/static-pod-resources/kube-apiserver-pod-15
found latest kube-controller-manager: /etc/kubernetes/static-pod-resources/kube-controller-manager-pod-10
found latest kube-scheduler: /etc/kubernetes/static-pod-resources/kube-scheduler-pod-9
found latest etcd: /etc/kubernetes/static-pod-resources/etcd-pod-3
etcdctl is already installed
{"level":"info","ts":1638199790.980932,"caller":"snapshot/v3_snapshot.go:119","msg":"created temporary db file","path":"/home/core/backup/snapshot_2021-11-29_152949.db.part"}
{"level":"info","ts":"2021-11-29T15:29:50.991Z","caller":"clientv3/maintenance.go:200","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1638199790.9912837,"caller":"snapshot/v3_snapshot.go:127","msg":"fetching snapshot","endpoint":"https://10.0.196.187:2379"}
{"level":"info","ts":"2021-11-29T15:29:53.306Z","caller":"clientv3/maintenance.go:208","msg":"completed snapshot read; closing"}
Snapshot saved at /home/core/backup/snapshot_2021-11-29_152949.db
{"level":"info","ts":1638199793.3482974,"caller":"snapshot/v3_snapshot.go:142","msg":"fetched snapshot","endpoint":"https://10.0.196.187:2379","size":"180 MB","took":2.367303503}
{"level":"info","ts":1638199793.348459,"caller":"snapshot/v3_snapshot.go:152","msg":"saved","path":"/home/core/backup/snapshot_2021-11-29_152949.db"}
{"hash":1180914745,"revision":10182252,"totalKey":19360,"totalSize":179896320}
snapshot db and kube resources are successfully saved to /home/core/backup
Removing debug pod ...
Moving Local Master Backups to target directory (from /home/core/backup to mounted PVC)
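The same output can also be followed live while the Job is running (the Job name backup matches the command above):

# Watch the backup pod come up, then stream the job logs
oc get pods -n ocp-etcd-backup -w
oc logs -f job/backup -n ocp-etcd-backup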
Verifying the Backup
Let’s start a dummy Pod with access to the PVC to verify that the backups are really there.
apiVersion: v1
kind: Pod
metadata:
  name: verify-etcd-backup
  namespace: ocp-etcd-backup
spec:
  containers:
    - name: verify-etcd-backup
      image: registry.access.redhat.com/ubi8/ubi
      command: ["sleep", "3000"]
      volumeMounts:
        - name: etcd-backup
          mountPath: /etcd-backup
  volumes:
    - name: etcd-backup
      persistentVolumeClaim:
        claimName: etcd-backup-pvc
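Create the Pod from the manifest above (assuming it was saved as verify-etcd-backup.yaml; the file name is arbitrary):

oc apply -f verify-etcd-backup.yaml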
Logging into that Pod shows the available backups stored under /etcd-backup, which is the mounted PVC.
oc rsh -n ocp-etcd-backup verify-etcd-backup ls -la etcd-backup
total 1406196
drwxr-xr-x. 3 root root 4096 Nov 29 17:00 .
dr-xr-xr-x. 1 root root 25 Nov 29 17:06 ..
drwx------. 2 root root 16384 Nov 29 15:21 lost+found
-rw-------. 1 root root 179896352 Nov 29 15:21 snapshot_2021-11-29_152150.db
-rw-------. 1 root root 179896352 Nov 29 15:29 snapshot_2021-11-29_152949.db
-rw-------. 1 root root 179896352 Nov 29 15:32 snapshot_2021-11-29_153159.db
-rw-------. 1 root root 179896352 Nov 29 15:36 snapshot_2021-11-29_153618.db
-rw-------. 1 root root 179896352 Nov 29 15:55 snapshot_2021-11-29_155513.db
-rw-------. 1 root root 179896352 Nov 29 16:00 snapshot_2021-11-29_160020.db
-rw-------. 1 root root 179896352 Nov 29 16:55 snapshot_2021-11-29_165521.db
-rw-------. 1 root root 179896352 Nov 29 17:00 snapshot_2021-11-29_170020.db
-rw-------. 1 root root 89875 Nov 29 15:21 static_kuberesources_2021-11-29_152150.tar.gz
-rw-------. 1 root root 89875 Nov 29 15:29 static_kuberesources_2021-11-29_152949.tar.gz
-rw-------. 1 root root 89875 Nov 29 15:32 static_kuberesources_2021-11-29_153159.tar.gz
-rw-------. 1 root root 89875 Nov 29 15:36 static_kuberesources_2021-11-29_153618.tar.gz
-rw-------. 1 root root 89875 Nov 29 15:55 static_kuberesources_2021-11-29_155513.tar.gz
-rw-------. 1 root root 89875 Nov 29 16:00 static_kuberesources_2021-11-29_160020.tar.gz
-rw-------. 1 root root 89875 Nov 29 16:55 static_kuberesources_2021-11-29_165521.tar.gz
-rw-------. 1 root root 89875 Nov 29 17:00 static_kuberesources_2021-11-29_170020.tar.gz
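Once the backups are verified, the dummy Pod can be removed again:

oc delete pod verify-etcd-backup -n ocp-etcd-backup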