Posts

Showing posts from 2022

[Solved] sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '127.0.0.1'

Issue:- When launching a container from the application image, the application needs to connect to the MySQL database running on the host machine. But when you try to connect using localhost or 127.0.0.1, you get the following error.
Error:- sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 111] Connection refused)")
Effect:- The application container went down because the application could not connect to the MySQL database.
Resolution:- Since the MySQL database is running on the host machine, use --network=host, which makes the container share the host's network stack instead of Docker's bridge network; the container can then reach the host database because both are on the same network. docker run -d --network=host project_app1:latest
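A minimal sketch of that fix plus a quick verification, assuming hypothetical container and image names (app1, project_app1) and the default MySQL port 3306:

docker run -d --name app1 --network=host project_app1:latest
# On the host: confirm MySQL is actually listening on 127.0.0.1:3306 (with host networking this is the
# same listener the container sees).
ss -ltn | grep 3306
# The SQLAlchemy connection error should no longer appear in the application logs.
docker logs app1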

[Solved] Failed to pull image rpc error: code = Unknown desc = context deadline exceeded

Issue:- When creating a tomcat:9 pod in minikube, the image pull failed with the error below.
Error:- Warning  Failed  58s  kubelet  Failed to pull image "tomcat:9": rpc error: code = Unknown desc = context deadline exceeded
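The excerpt stops before the post's fix; two workarounds commonly used for image-pull timeouts in minikube, offered here as assumptions rather than the post's exact resolution:

# Pull the image on the host and load it into the minikube node, bypassing the in-cluster pull timeout:
docker pull tomcat:9
minikube image load tomcat:9
# Or pull directly inside the minikube node (assumes the Docker container runtime):
minikube ssh -- docker pull tomcat:9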

Detailed overview of the ISTIO Service Mesh


[Solved] warning: containerd.io.rpm: error: Failed dependencies:container-selinux >= 2:2.74 is needed by containerd.io

Issue:- When installing the containerd RPM on CentOS 7, a dependency issue related to container-selinux prevents containerd from being installed.
Error:- [root@kubemaster ~]# rpm -ivh containerd.io-1.6.8-3.1.el7.x86_64.rpm
warning: containerd.io-1.6.8-3.1.el7.x86_64.rpm: Header V4 RSA/SHA512 Signature, key ID 621e9f35: NOKEY
error: Failed dependencies:
container-selinux >= 2:2.74 is needed by containerd.io-1.6.8-3.1.el7.x86_64
Effect:- Was not able to install containerd on CentOS 7.
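The excerpt ends before the resolution; a likely route (an assumption on my part) is to install container-selinux from the CentOS 7 extras repository first and then retry the local RPM:

# container-selinux is shipped in the CentOS 7 "extras" repo:
yum install -y --enablerepo=extras container-selinux
# Retry the containerd package once the dependency is satisfied:
rpm -ivh containerd.io-1.6.8-3.1.el7.x86_64.rpm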

[Solved] No package containerd available.

Issue:- When installing containerd on CentOS 7 using the yum package manager, it gives the error mentioned below.
Error:- No package containerd available.
Effect:- Was not able to install containerd on CentOS 7.
Resolution:- Download the RPM for containerd from the following link: https://download.docker.com/linux/centos/7/x86_64/stable/Packages/
wget https://download.docker.com/linux/centos/7/x86_64/stable/Packages/containerd.io-1.6.8-3.1.el7.x86_64.rpm
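To finish the install after the download, a short sketch of the usual follow-up steps (assumed, not quoted from the post):

# yum localinstall resolves the remaining dependencies (e.g. container-selinux) from the enabled repos:
yum localinstall -y containerd.io-1.6.8-3.1.el7.x86_64.rpm
systemctl enable containerd
systemctl start containerd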

Generating a token in Kubernetes using the kubeadm command for adding worker nodes

Issue:- Kubeadm prints a join command, including a token, when you first create a Kubernetes cluster. But what if you don't have that token handy later, when you need to add worker nodes to increase cluster capacity?
Solution:- You can run the following command, which generates the full join command that can be used to add worker nodes to the master in the future.
[centos@kubemaster ~]$ kubeadm token create --print-join-command
kubeadm join 172.31.98.106:6443 --token ix1ien.29glfz1p04d7ymtd --discovery-token-ca-cert-hash sha256:1f202db500d698032d075433176dd62f5d0074453daa12ccdfffd637a966a771
Once the token has been generated, run the printed command on the worker node to add it to the Kubernetes cluster.
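As a quick sanity check before running the join on the worker (a small illustrative addition, not from the post):

# List existing bootstrap tokens and their expiry on the control-plane node:
kubeadm token list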

[Solved] PersistentVolumeClaim pending while installing Elasticsearch using Helm

Issue:- When installing Elasticsearch using Helm, the Elasticsearch container fails: the PersistentVolumeClaims for the multi-master nodes stay in the Pending state, and the container remains Pending as well.
Error:- The persistent volume claim remains in the Pending state.
Effect:- Was not able to install Elasticsearch because the persistent volume claim was never ready for Elasticsearch.
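Typical first diagnostics for a Pending claim (illustrative commands, not necessarily the post's resolution; replace <pending-claim-name> with the claim reported by kubectl get pvc):

kubectl get pvc
# The Events section usually says why it is stuck, e.g. "no persistent volumes available" or a missing StorageClass:
kubectl describe pvc <pending-claim-name>
# Dynamic provisioning needs a StorageClass with a working provisioner (ideally marked as default):
kubectl get storageclass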

[Solved] stacktrace":ElasticsearchException[failed to bind service]; nested: AccessDeniedException[/usr/share/elasticsearch/data/nodes];

Issue:- When installing Elasticsearch using Helm, the Elasticsearch container fails with the exception AccessDeniedException[/usr/share/elasticsearch/data/nodes];
Error:- "cluster.name": "elasticsearch", "node.name": "elasticsearch-master-0", "message": "uncaught exception in thread [main]", "stacktrace": ["org.elasticsearch.bootstrap.StartupException: ElasticsearchException[failed to bind service]; nested: AccessDeniedException[/usr/share/elasticsearch/data/nodes];"
Effect:- Was not able to install Elasticsearch; the Elasticsearch pod keeps crashing again and again because the health check does not pass and the failing liveness probe restarts the pod repeatedly.
Resolution:- Follow these steps to resolve the issue:
1. The issue occurs because the elasticsearch user does not have permission on the /usr/share/elasticsearch/data/nodes directory.
2. But you cannot directly use kubectl ...
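A sketch of a common way to clear this permission error, assuming the volume is backed by a hostPath/local directory on the node and the official image's elasticsearch user (uid/gid 1000); the path below is hypothetical and this is not necessarily the post's exact procedure:

# On the node that backs the PersistentVolume:
sudo chown -R 1000:1000 /data/elasticsearch
# Delete the crashing pod so the StatefulSet recreates it against the corrected directory:
kubectl delete pod elasticsearch-master-0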

Terraform variables, input, output, and local variable theory Part2


[Solved] too early for operation, device not yet seeded or device model not acknowledged

Issue:- When installing terragrunt using snap, got the following error.
Error:- error: too early for operation, device not yet seeded or device model not acknowledged
Effect:- Was not able to install terragrunt; the installation failed at that point.
[root@aafe920be71c ~]# snap install terragrunt
error: too early for operation, device not yet seeded or device model not acknowledged
Resolution:- Follow these steps to resolve the issue:
1. Check the status of the snapd service, which was inactive in my case:
[root@aafe920be71c ~]# systemctl status snapd.seeded.service
● snapd.seeded.service - Wait until snapd is fully seeded
Loaded: loaded (/usr/lib/systemd/system/snapd.seeded.service; disabled; vendor preset: disabled)
Active: inactive (dead)
2. Now start the snapd service:
[root@aafe920be71c ~]# systemctl status snapd.seeded.service
● snapd.seeded.service - Wait until snapd is fully seeded
Loaded: loaded (/usr/lib/syst...
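Since the excerpt is cut off, here is a sketch of the commands typically used at step 2 (my assumption, not a quote from the post):

# Enable and start snapd, wait for the device to be seeded, then retry the install:
systemctl enable snapd.service snapd.seeded.service
systemctl start snapd.service snapd.seeded.service
snap wait system seed.loaded
snap install terragrunt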

Terraform theory part1

Image

[Resolved] ERROR Uncaught exception in thread 'kafka-admin-client-thread | adminclient-1': (org.apache.kafka.common.utils.KafkaThread) java.lang.OutOfMemoryError: Java heap space

Issue:- When trying to delete a topic in the Amazon MSK Kafka cluster, got the following error.
Error:- ERROR Uncaught exception in thread 'kafka-admin-client-thread | adminclient-1': (org.apache.kafka.common.utils.KafkaThread) java.lang.OutOfMemoryError: Java heap space
Effect:- Was not able to delete the topic in the MSK Kafka cluster due to the above error message.
ERROR Uncaught exception in thread 'kafka-admin-client-thread | adminclient-1': (org.apache.kafka.common.utils.KafkaThread) java.lang.OutOfMemoryError: Java heap space
at java.base/java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:61)
at java.base/java.nio.ByteBuffer.allocate(ByteBuffer.java:348)
at org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:424)
at org.apache.kafka.common.network.KafkaChannel.r...
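Two mitigations commonly tried for this error, offered as assumptions since the excerpt stops before the post's resolution: give the Kafka CLI tools a larger heap, and make sure the client's security settings match the broker port (this OutOfMemoryError in the admin client is a frequent symptom of a plaintext client hitting a TLS listener). The broker hostname, topic, and properties file below are hypothetical:

# Raise the heap used by the Kafka CLI tools before retrying:
export KAFKA_HEAP_OPTS="-Xms256M -Xmx1G"
# Pass a client config whose security.protocol matches the port you connect to (9094 is MSK's TLS port):
bin/kafka-topics.sh --bootstrap-server b-1.example.kafka.us-east-1.amazonaws.com:9094 \
  --command-config client-ssl.properties --delete --topic my-topic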

[Resolved] default.svc.cluster.local: Name or service not known

Issue:- After creating a service, when I tried to verify whether the DNS name for the service resolves, I got the following error.
Error:- my-service.default.svc: Name or service not known
Effect:- I was unable to confirm whether the service DNS was actually resolving or whether there was some other issue, as the service itself was not accessible via curl or the browser.
[centos@kubemaster service]$ nslookup my-service.default.svc
-bash: nslookup: command not found
[centos@kubemaster service]$ dig nslookup my-service.default.svc
-bash: dig: command not found
[centos@kubemaster service]$ ping nslookup my-service.default.svc
ping: my-service.default.svc: Name or service not known
[centos@kubemaster service]$ ping my-service.default.svc
ping: my-service.default.svc: Name or service not known
Resolution:- Follow these steps:
1. Create a pod with the DNS utilities installed on it so that the nslookup command can be run inside the pod. kubect...
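A minimal sketch of step 1 (the image is the dnsutils image used in the Kubernetes DNS-debugging documentation; note that service DNS names only resolve from inside the cluster, not from the node's shell):

kubectl run dnsutils --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 \
  --restart=Never --command -- sleep infinity
# Resolve the service from inside the cluster:
kubectl exec -it dnsutils -- nslookup my-service.default.svc.cluster.local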

[Resolved] groupVersion shouldn't be empty

Issue:- When creating simple resources like a Pod, ReplicaSet, Deployment, etc., got the groupVersion error specified below.
Error:- groupVersion shouldn't be empty
Effect:- Not able to create the resource because of the above error.
apiversion: v1
kind: Pod
metadata:
  name: pod2
spec:
  containers:
  - name: c1
    image: nginx
Resolution:- If you look at the above configuration closely, you will find that apiversion has been specified incorrectly. It should have been apiVersion, so a single capital letter can make the difference that causes this error. The same error will occur if you forget to mention apiVersion in the configuration or misspell it. The configuration below will work fine.
apiVersion: v1
kind: Pod
metadata:
  name: pod2
spec:
  containers:
  - name: c1
    image: nginx
Explanation:- apiVersion is hardcoded in Kubernetes, so if you misspell it, do not use it, or make a e...
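A quick way to catch this class of typo before it reaches the API server (illustrative; pod2.yaml is a hypothetical filename):

# Client-side validation flags a missing or misspelled apiVersion without creating anything in the cluster:
kubectl apply --dry-run=client --validate=true -f pod2.yaml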

Understanding Docker TOCTOU Vulnerability


[Resolved] Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.

Issue:- The issue is with the Dashboard service. When deploying the Dashboard in Kubernetes using the YAML, it gives the following error.
Error:- Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
Effect:- Because the Dashboard is not able to reach the dashboard-metrics-scraper service, the Dashboard UI does not load and times out after a while.
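A couple of first checks commonly used here (illustrative, not necessarily the post's fix), assuming the standard kubernetes-dashboard namespace and labels from the upstream manifest:

# Confirm the metrics-scraper Deployment, Service and pod exist and are healthy:
kubectl -n kubernetes-dashboard get deploy,svc,pods -l k8s-app=dashboard-metrics-scraper
# Look at the scraper's logs for errors:
kubectl -n kubernetes-dashboard logs deploy/dashboard-metrics-scraper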

[Resolved] Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

Issue:- When installing the metrics server in Kubernetes, getting the following error.
Error:- Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
Effect:- Due to the above error the metrics server does not work.
[centos@kubemaster dashboard]$ kubectl top nodes
W0529 10:18:25.234815   13218 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
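On kubeadm-style lab clusters this often comes down to metrics-server failing kubelet TLS verification; one commonly used (but insecure) workaround, stated here as an assumption rather than the post's resolution:

# Add --kubelet-insecure-tls to the metrics-server container args:
kubectl -n kube-system edit deploy metrics-server
# Once the pod is healthy the aggregated API should report Available=True:
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top nodes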

Analysing various threats in application tampering and security and SLSA mitigation of such threats


Linux Fundamentals for Devops/Developers/Beginners Part1


Understanding the Levels of assurances in SLSA - Part2


Understanding Supply chain Levels for Software Artifacts SLSA Part1


Understanding Dockershim removal by kubernetes with version v1.24


Understanding CI/CD devops pipelines with GITLAB with realworld analogy ...


[Resolved] An error occurred (Throttling) when calling the DescribeLoadBalancers operation (reached max retries: 4): Rate exceeded

Issue:- If you have a big infrastructure and a lot of automation in place, your AWS CLI calls might hit the API rate limits, which can result in an error like the one below.
Error:- An error occurred (Throttling) when calling the DescribeLoadBalancers operation (reached max retries: 4): Rate exceeded
Effect:- The command or script that you ran fails because the call rate limit for the AWS resource was reached and the retries were exhausted. If you are running the command manually you can simply run it again, but if it is part of a script that has no retry logic of its own, the failure becomes a bigger problem.
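One mitigation (an assumption here, since the excerpt does not include the post's resolution) is to let the AWS CLI retry more aggressively with adaptive backoff:

# Both variables are honoured by the AWS CLI/SDKs; adjust the attempt count to your tolerance:
export AWS_RETRY_MODE=adaptive
export AWS_MAX_ATTEMPTS=10
aws elbv2 describe-load-balancers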

Weaknesses/Limitations of gRPC Part5


Strengths of gRPC


Understanding gRPC Architecture Part3


Basic gRPC concepts Part2


Introduction to gRPC for fast and scalable api development - Part1


Understanding kubernetes components kubectl, daemon, apiserver, apiversi...


Terraform modules practices for scalable architecture implementation & i...


Terraform modules practices for scalable architecture implementation


[Resolved] from setuptools_rust import RustExtension ModuleNotFoundError: No module named 'setuptools_rust'

Issue:- Issue with the cryptography package and Rust during the Ansible installation on CentOS.
Error:- Downloading https://files.pythonhosted.org/packages/3d/5f/addb8b91fd356792d28e59a8275fec833323cb28604fb3a497c35d7cf0a3/cryptography-37.0.1.tar.gz (585kB)
100% |████████████████████████████████| 593kB 2.0MB/s
Complete output from command python setup.py egg_info:
=============================DEBUG ASSISTANCE==========================
If you are seeing an error here please try the following to successfully install cryptography:
Upgrade to the latest pip and try again. This will fix errors for most users.
See: https://pip.pypa.io/en/stable/installing/#upgrading-pip
=============================DEBUG ASSISTANCE==========================
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-nfv80r3s/cryptography/...
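The debug-assistance banner in the log points at the usual fix; a short sketch of that route (standard PyPI package names, not quoted from the post):

# Upgrading pip/setuptools lets a prebuilt cryptography wheel be used instead of compiling with Rust:
python3 -m pip install --upgrade pip setuptools wheel
# Only needed if a source build is still attempted:
python3 -m pip install setuptools_rust
python3 -m pip install ansible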

[Resolved] Error response from daemon: invalid MountType: "=bind"

Issue:- Unable to deploy the visualizer service in Docker Swarm.
Error:- Error response from daemon: invalid MountType: "=bind"
Effect:- The following command failed with the above error:
# docker service create --name=viz --publish=8080:8080/tcp --constraint=node.role==manager --mount=type==bind,src=/var/run/docker.sock,dst=/var/run/docker.sock dockersamples/visualizer
Resolution:-
# docker service create --name=viz --publish=8080:8080/tcp --constraint=node.role==manager --mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock dockersamples/visualizer
Explanation:- Use --mount=type=bind (a single equals sign), not type==bind; the extra = makes Docker parse the mount type as "=bind", which is invalid.

All About IPAM and its use in cloud, DevOps, and VPC troubleshooting


Preventing DDOS Attacks Using AWS WAF Rule based WEBACL PART 3


Preventing DDOS Attacks Using AWS WAF Rule based WEBACL PART 2


Preventing DDOS Attacks Using AWS WAF Rule based WEBACL


Understanding Kubernetes Canary Deployment With Architecture Diagram Part-II


Understanding the Concept of Canary Deployment in Kubernetes Part-1


Troubleshooting and Logging in Distroless Images


Signing the Docker Images using Cosign - Part 4


Comparing distroless vs distro-based vs alpine Docker images on the basis of vulnerability scans


Hands-on Node Application built on a Distroless Docker Image - Part 2


Hands-on Distroless Installation from Scratch - Part 1


Understanding Distroless Container Images


[Solved] Intermittent / burst logs in the Newrelic / ELK

Issue:- Although the application was writing logs continuously and the shipper was shipping them, the logs were missing for particular periods, and bursts of logs with spikes were observed in the New Relic/ELK.
Error:- The following graph shows the actual issue of intermittent or bursty logs in ELK.
Effect:- Due to the non-availability of the logs it became difficult to troubleshoot issues, as the logs were delayed and sometimes missed entirely.
Resolution:- Printing only error logs, or the logs required for troubleshooting, helps to overcome this issue.
Explanation:- More than 1 million event logs were being posted per hour, due to which the disk was becoming a bottleneck and events were being pushed into the New Relic/ELK in bursts. Lowering the volume to error logs, or the logs required for troubleshooting, should help overcome this issue of intermittent logs in New Relic/ELK.