Recovery Procedure for Loki Availability Zone Failures
Disclaimer: This document describes a recovery procedure that manually recreates the failed pods in another zone. Currently, this is done by deleting the PersistentVolumeClaims (PVCs) of the impacted pods in the failed zone so that the pods can be recreated in a different zone. Deleting a PVC destroys the data stored in it. To avoid losing log data, always set the replication factor in Loki to 2 or higher so that the data is always replicated.
In a Kubernetes/OpenShift cluster, a “zone failure” refers to a situation where nodes or resources in a specific availability zone become unavailable. An availability zone is a distinct location within a cloud provider’s data center or region, designed to be isolated from failures in other zones to provide better redundancy and fault tolerance. When a zone failure occurs, it can lead to a loss of services or data if the cluster is not configured properly to handle such scenarios.
This document outlines steps that can be taken to recover stateful Loki pods when there is a zone failure. Stateful Loki pods are deployed as a part of a StatefulSet. The StatefulSet also has PVCs associated with the pods which are dynamically provisioned through the use of a StorageClass. Each stateful Loki pod and its associated PVCs are deployed in the same zone.
Ensure data replication is enabled
As discussed in the disclaimer above, the following procedure deletes the PVCs in the failed zone together with the data held there. To avoid complete data loss, the replication factor in the LokiStack CR should always be set to a value greater than 1. This ensures that Loki replicates the data, so even if a zone is lost there should already be copies of the data in another zone.

apiVersion:
kind: LokiStack
metadata:
  name: lokistack-dev
spec:
  size: 1x.small
  storage:
    secret:
      name: test
      type: s3
  storageClassName: gp3-csi
  replication:
    factor: 2
    zones:
    - topologyKey:
      maxSkew: 1
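Before relying on this procedure, it can help to confirm that the replication factor is actually set on the running cluster. The following is a minimal sketch, not part of the original procedure. It assumes that the resource name lokistack resolves to the LokiStack CRD and that the field path .spec.replication.factor matches the CR shown above. The function name check_replication_factor is hypothetical.

```shell
# Sketch: fail early if the LokiStack replication factor is below 2.
# Assumptions: "lokistack" resolves to the LokiStack CRD, and the
# replication factor lives at .spec.replication.factor as in the CR above.
check_replication_factor() {
  stack="$1"
  namespace="$2"
  factor="$(kubectl get lokistack "$stack" -n "$namespace" \
    -o jsonpath='{.spec.replication.factor}')"
  # Reject an unset or non-numeric value before the numeric comparison.
  case "$factor" in
    ''|*[!0-9]*)
      echo "replication factor is unset or not a number" >&2
      return 1
      ;;
  esac
  if [ "$factor" -lt 2 ]; then
    echo "replication factor is $factor; expected 2 or higher" >&2
    return 1
  fi
  echo "replication factor is $factor"
}
```

For the example CR, this would be called as check_replication_factor lokistack-dev openshift-logging.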
When a zone failure occurs in a cluster, the StatefulSet controller will automatically attempt to recover the affected pods in the failed zone. The following steps outline the additional manual intervention required to make sure that the stateful Loki pods are successfully recreated in a new zone.
1. Detect Zone Failure - The control plane and cloud provider integration should mark the nodes in the failed zone.
2. Reschedule Pods - The StatefulSet controller will automatically attempt to reschedule the pods that were running in the failed zone to nodes in another zone.
3. Recover Pods and PVCs - Since the PVCs of the StatefulSets are also in the failed zone, automatic rescheduling of the stateful Loki pods to a different zone will not work. For more information about storage access for zones, see the Kubernetes documentation. Manual intervention is required at this point: the old PVCs in the failed zone must be deleted to allow successful recreation of the stateful Loki pods and their PVCs in the new zone.
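To see which zone is affected in step 1, the node status can be grouped by zone. The sketch below is an illustration rather than part of the original procedure. It assumes that nodes carry the well-known topology.kubernetes.io/zone label, and the function name zones_with_unready_nodes is hypothetical.

```shell
# Sketch: print each zone that has at least one node that is not Ready.
# Assumption: nodes are labeled with topology.kubernetes.io/zone.
zones_with_unready_nodes() {
  # With --no-headers and -L the columns are:
  # NAME STATUS ROLES AGE VERSION ZONE
  kubectl get nodes --no-headers -L topology.kubernetes.io/zone |
    awk '$2 !~ /^Ready/ && $6 != "" { print $6 }' |
    sort -u
}
```

A status such as Ready,SchedulingDisabled still counts as Ready here, so cordoned nodes alone do not flag a zone.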
3.1 List pending pods
Multiple stateful Loki pods will be in a Pending state after the StatefulSets have unsuccessfully tried to reschedule them to a different zone:
kubectl get pods --field-selector status.phase==Pending -n openshift-logging
NAME                            READY   STATUS    RESTARTS   AGE
lokistack-dev-index-gateway-1   0/1     Pending   0          17m
lokistack-dev-ingester-1        0/1     Pending   0          16m
lokistack-dev-ruler-1           0/1     Pending   0          16m
3.2 List pending PVCs
The above pods are in phase Pending because their corresponding PVCs are in the old zone. List the names of the pending PVCs:
kubectl get pvc -o=json -n openshift-logging | jq -r '.items[] | select(.status.phase == "Pending") | .metadata.name'
storage-lokistack-dev-index-gateway-1
storage-lokistack-dev-ingester-1
wal-lokistack-dev-ingester-1
storage-lokistack-dev-ruler-1
wal-lokistack-dev-ruler-1
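If jq is not available, a similar list can be produced with kubectl's built-in JSONPath output. This is a sketch of an alternative, not the command used in the original procedure. The function name pending_pvcs is hypothetical.

```shell
# Sketch: list the names of PVCs whose phase is Pending, without jq.
pending_pvcs() {
  namespace="$1"
  # Emit one "name<TAB>phase" line per PVC, then keep the Pending ones.
  kubectl get pvc -n "$namespace" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' |
    awk -F '\t' '$2 == "Pending" { print $1 }'
}
```

Called as pending_pvcs openshift-logging, it prints one PVC name per line.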
3.3 Delete pending PVCs, followed by pending pods
After successful deletion, the pods and new PVCs should be recreated in an available zone, because the StatefulSet controller maintains the configured number of replicas.
kubectl delete pvc storage-lokistack-dev-ingester-1 -n openshift-logging
kubectl delete pvc wal-lokistack-dev-ingester-1 -n openshift-logging
kubectl delete pod lokistack-dev-ingester-1 -n openshift-logging
kubectl delete pvc storage-lokistack-dev-ruler-1 -n openshift-logging
kubectl delete pvc wal-lokistack-dev-ruler-1 -n openshift-logging
kubectl delete pod lokistack-dev-ruler-1 -n openshift-logging
kubectl delete pvc storage-lokistack-dev-index-gateway-1 -n openshift-logging
kubectl delete pod lokistack-dev-index-gateway-1 -n openshift-logging
These steps should be followed for all stateful Loki pods that are in the failed zone.
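When many pods are affected, the same sequence (PVCs first, then the pod) can be wrapped in a small helper. The following is a sketch under stated assumptions, not the procedure's official tooling. It reads the claim names from the pod spec rather than hard-coding them, and it passes --wait=false so that the call does not block if a finalizer holds a PVC, which differs from the plain delete commands above. The function name recreate_in_available_zone is hypothetical.

```shell
# Sketch: delete the PVCs referenced by a pending pod, then the pod itself,
# so that the StatefulSet controller can recreate both.
recreate_in_available_zone() {
  pod="$1"
  namespace="$2"
  # Claim names referenced by the pod spec, separated by spaces.
  claims="$(kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}')"
  for claim in $claims; do
    # Assumption: --wait=false is acceptable here; it returns immediately
    # instead of waiting for the PVC to be fully removed.
    kubectl delete pvc "$claim" -n "$namespace" --wait=false || return 1
  done
  kubectl delete pod "$pod" -n "$namespace"
}
```

For example, recreate_in_available_zone lokistack-dev-ingester-1 openshift-logging covers the first three commands listed above. Review the pod and namespace carefully before running it, since the deleted PVC data cannot be recovered.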
PVCs are stuck in Terminating state
If the PVCs are stuck in a Terminating state and are not getting deleted, the cause may be a finalizer: a PVC is not removed while its metadata.finalizers field is still set.
The following commands remove the finalizers and allow the PVCs to be deleted:
kubectl patch pvc wal-lokistack-dev-ingester-1 -p '{"metadata":{"finalizers":null}}' -n openshift-logging
kubectl patch pvc storage-lokistack-dev-ingester-1 -p '{"metadata":{"finalizers":null}}' -n openshift-logging
kubectl patch pvc wal-lokistack-dev-ruler-1 -p '{"metadata":{"finalizers":null}}' -n openshift-logging
kubectl patch pvc storage-lokistack-dev-ruler-1 -p '{"metadata":{"finalizers":null}}' -n openshift-logging
kubectl patch pvc storage-lokistack-dev-index-gateway-1 -p '{"metadata":{"finalizers":null}}' -n openshift-logging
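Removing finalizers bypasses a safety mechanism, so it can be worth checking that a PVC is really marked for deletion before patching it. The sketch below adds such a guard around the patch command shown above. It is an illustration only, and the function name remove_finalizers_if_terminating is hypothetical.

```shell
# Sketch: clear the finalizers of a PVC only if deletion was already requested.
remove_finalizers_if_terminating() {
  pvc="$1"
  namespace="$2"
  # metadata.deletionTimestamp is only set once the PVC is being deleted.
  deleted_at="$(kubectl get pvc "$pvc" -n "$namespace" \
    -o jsonpath='{.metadata.deletionTimestamp}')"
  if [ -z "$deleted_at" ]; then
    echo "skipping $pvc: not marked for deletion" >&2
    return 0
  fi
  kubectl patch pvc "$pvc" -n "$namespace" \
    -p '{"metadata":{"finalizers":null}}'
}
```

Called as remove_finalizers_if_terminating wal-lokistack-dev-ingester-1 openshift-logging, it leaves PVCs that are not terminating untouched.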