Cloud Devops Automation

Posts

Showing posts from May, 2024

[Solved] Error 503 Service Unavailable on the Rolling Deployment of Service in EKS Cluster

May 11, 2024

Error:- Kubernetes, including EKS, uses a rolling update as its default deployment strategy so that applications can be deployed with zero downtime. This generally works well. Before a recent major change, we tested the deployment in our staging environment to find out whether it would cause any downtime, so that we could schedule the production deployment accordingly. We ran a curl request against the application once every second, and the result was unusual: the application, written in Kotlin/Java, returned 503 Service Unavailable during the deployment. We had expected zero downtime because we were using a rolling deployment.

Cause:- For a rolling update to really achieve zero downtime, the application must meet some requirements. To mention a few of them:
1. The application should handle graceful shutdown.
2. The application should implement readiness and liveness probes correctly.

Solution :- In our case on further ...
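The once-per-second curl check described in this post can be sketched roughly as follows. This is a minimal illustration and not the exact script we ran: the function names (probe_once, classify_status, poll), the 2-second request timeout, and the example URL are all assumptions made for the sketch.

```shell
#!/bin/sh
# Sketch: poll an endpoint at a fixed interval during a rollout and log
# every response that is not HTTP 200. All names here are hypothetical.

# Print the HTTP status code of a single request.
# curl prints "000" when the connection itself fails.
probe_once() {
  curl -s -o /dev/null -m 2 -w '%{http_code}' "$1" 2>/dev/null || true
}

# Classify a status code: only 200 counts as OK in this sketch.
classify_status() {
  if [ "$1" = "200" ]; then
    echo "OK"
  else
    echo "FAIL"
  fi
}

# poll URL [COUNT] [INTERVAL_SECONDS]
# Sends COUNT requests, one per interval, and prints a summary at the end.
poll() {
  url="$1"
  count="${2:-60}"
  interval="${3:-1}"
  failures=0
  i=0
  while [ "$i" -lt "$count" ]; do
    code="$(probe_once "$url")"
    if [ "$(classify_status "$code")" = "FAIL" ]; then
      failures=$((failures + 1))
      echo "$(date +%T) HTTP $code"
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "failures=$failures of $count"
}

# Example with a placeholder URL; start it just before triggering the rollout:
#   poll https://staging.example.com/health 120 1
```

Any 503 lines printed while the rollout is in progress point to requests that reached a pod that was not ready or was already shutting down.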

[Solved] prometheus-kube-prometheus-prometheus-rulefiles group=kubernetes-resources msg="Failed to get Querier" err="TSDB not ready"

May 11, 2024

Error:- We recently hit an error with Prometheus, which is deployed using the Prometheus Operator on an EFS volume shared across multiple pods:

caller=group.go:104 level=error component="rule manager" file=/etc/prometheus/rules/prometheus-kube-prometheus-prometheus-rulefiles-0/monitoring-kube-prometheus-kubernetes-resources.yaml group=kubernetes-resources msg="Failed to get Querier" err="TSDB not ready"
caller=head.go:176 level=error component=tsdb msg="Loading on-disk chunks failed" err="iterate on on-disk chunks: out of sequence m-mapped chunk for series ref 30172821

Cause:- The log itself gives enough information: the on-disk chunk data got corrupted. When Prometheus restarts, it replays the WAL, and it fails when it reaches that particular chunk sequence. This is the primary cause of the failure.

Solution :- For the chunk data corruption above, there is no simple way of r...

[Solved] ERROR: Rancher must be ran with the --privileged flag when running outside of Kubernetes

May 04, 2024

Error:- While running the Rancher Docker container on an Ubuntu server, I saw the container crashing very frequently. After checking the logs, I saw the following error repeating:

docker run -d --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher:latest
ERROR: Rancher must be ran with the --privileged flag when running outside of Kubernetes

Cause:- When you install Rancher as a standalone Docker container, for example in a test environment where you do not need SSL-based identity verification, you must pass the --privileged flag.

Solution :- Run the following command to resolve the issue:

docker run -d --restart=unless-stopped -p 80:80 -p 443:443 --privileged rancher/rancher:latest