
Kubernetes – MongoDB – ReplicaSet – Recovering from a failed oplog replay

Last Sunday, all of our MongoDB replicas went down and seemed to be stuck in an endless loop of shutting down, restarting and shutting down again. Here are the infrastructure details:

  • Google Cloud Platform
  • Kubernetes 1.18.x (GKE)
  • Jenkins-X 2.x
  • MongoDB chart from Bitnami (Helm 2)
  • Basic replication, originally with 3 replicas and 1 arbiter
  • Reduced to 2 replicas and 1 arbiter a week or two ago

Observations

  • Only 1 MongoDB replica pod is running, and it keeps getting restarted
  • The MongoDB arbiter is running fine
  • The log shows it is applying a lot of oplog entries:
REPL [worker-x] Applied op

The log looked fine, but when I tailed it, the pod just died. It turns out the liveness probe was causing the pod to get restarted.
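
If you want to see this for yourself, the probe failures and restarts show up in the pod's events. The pod name below is just the -0 ordinal of my statefulset; adjust it to yours.

# Look for "Liveness probe failed" and "Killing container" events
kubectl describe pod jx-cool-service-mongodb-0 -n jx-staging

# Or scan recent events in the namespace for liveness failures
kubectl get events -n jx-staging --sort-by=.lastTimestamp | grep -i liveness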

Initial Actions Taken

First, I tried to revive the replicas by simply killing the pods to force a restart. I even tried deleting the statefulset and re-deploying it, but to no avail.
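
For reference, this is roughly what that looked like. The names match the statefulset used later in this post, and deleting the statefulset itself does not delete the PVCs, so the data stays on the persistent volumes.

# Kill a pod so the statefulset recreates it
kubectl delete pod jx-cool-service-mongodb-0 -n jx-staging

# Delete the statefulset (the PVCs and their data are kept), then re-deploy it
kubectl delete statefulset jx-cool-service-mongodb -n jx-staging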

The next thing I tried was copying the files from the persistent volume into a new statefulset, but I got some kind of incompatibility error that seemed too much for me to fix.

I decided to dive deeper into the replication issue instead, as it looked like the replicas were trying to recover themselves but the process was getting terminated at some point.
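
One thing that helps here is checking the replica set state from inside the pod. This is just a sketch: it assumes the Bitnami chart's default root user and that the container exposes the root password in its MONGODB_ROOT_PASSWORD environment variable. Members stuck in STARTUP2 or RECOVERING are still syncing or replaying.

# Print the replica set member states (PRIMARY / SECONDARY / RECOVERING / STARTUP2)
kubectl exec -it jx-cool-service-mongodb-0 -n jx-staging -- \
  sh -c 'mongo -u root -p "$MONGODB_ROOT_PASSWORD" --authenticationDatabase admin --eval "rs.status().members.forEach(function(m){ print(m.name, m.stateStr) })"'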

Increasing Liveness Probe Delay

I read somewhere that MongoDB needs some time to replay the oplog before it becomes available to clients. It turns out the default liveness probe delay for the chart was just 30 seconds. I switched it to 5 minutes, 10 minutes, 20 minutes, 30 minutes and eventually 1 hour before I got it working.

cool-service-mongodb:
  architecture: replicaset
  persistence:
    enabled: true
  replicaCount: 3
  livenessProbe:
    initialDelaySeconds: 3600
  readinessProbe:
    initialDelaySeconds: 10
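
Once the change was deployed, it's worth double-checking that the new delay actually landed on the statefulset. The names below match my setup; adjust them to yours.

# Confirm the statefulset's mongodb container picked up the new initial delay
kubectl get statefulset jx-cool-service-mongodb -n jx-staging \
  -o jsonpath='{.spec.template.spec.containers[0].livenessProbe.initialDelaySeconds}'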

I’m not sure why it would take that long to complete the oplog replay, but it is what it is. Here are some additional actions that I took.

Scale down then scale up

To make sure that at least 1 MongoDB replica could complete its oplog replay, I scaled the statefulset down to just 1 replica.

kubectl scale statefulsets jx-cool-service-mongodb --replicas=1 -n jx-staging

Then I deleted the pod and watched the log until it became ready. Once ready, I scaled it up to 2 replicas.

kubectl scale statefulsets jx-cool-service-mongodb --replicas=2 -n jx-staging
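
For the record, the delete-and-watch step above was just this (the pod name is the -0 ordinal of the statefulset):

kubectl delete pod jx-cool-service-mongodb-0 -n jx-staging

# Follow the oplog replay in the log, and watch for the pod to report Ready
kubectl logs -f jx-cool-service-mongodb-0 -n jx-staging
kubectl get pods -n jx-staging -w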

3 replicas seems ideal for a replicaset

Based on what I read, 3 replicas seems to be the ideal count for a MongoDB replicaset. It was probably a fatal mistake to reduce it to 2 to cut costs, and here we are now, wasting time and delaying the launch because of this issue.

So, I scaled it back up to 3 replicas, which was the original design.

kubectl scale statefulsets jx-cool-service-mongodb --replicas=3 -n jx-staging
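
Each new replica has to do an initial sync from the primary before it shows up as ready, so this can take a while for a larger data set. I just watched the pods and the statefulset until everything settled:

# Watch the statefulset and its pods; each pod flips to READY 1/1 once its sync is done
kubectl get statefulset jx-cool-service-mongodb -n jx-staging -w
kubectl get pods -n jx-staging -w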

Once all the MongoDB replicas are running, make sure to revert the liveness probe delay to a reasonable value if you don’t want to go back to the default 30 seconds. I switched mine to 120 seconds.

That’s it! I hope I’m not getting fired!
