
Saturday, June 4, 2022

[Resolved] default.svc.cluster.local: Name or service not known

 

Issue:- 

After creating a service, when I tried to verify whether the DNS name for the service was getting resolved, I got the following error. 

Error:- 

 my-service.default.svc: Name or service not known

Effect:-

I was unable to confirm whether the service DNS was actually resolving, and whether that was the reason the service itself was not accessible via curl or the browser.

 [[email protected] service]$ nslookup my-service.default.svc  
 -bash: nslookup: command not found  
 [[email protected] service]$ dig nslookup my-service.default.svc  
 -bash: dig: command not found  
 [[email protected] service]$ ping nslookup my-service.default.svc  
 ping: my-service.default.svc: Name or service not known  
 [[email protected] service]$ ping my-service.default.svc  
 ping: my-service.default.svc: Name or service not known  

Resolution:-

Follow the steps below:

1. Create a pod with the DNS utilities installed on it so that the nslookup command works inside the pod
 kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml  
 

2. Now run the nslookup command against the DNS name and verify whether it is getting resolved
[[email protected] service]$ kubectl exec -it dnsutils -- nslookup my-service.default.svc  
 Server:          10.96.0.10  
 Address:     10.96.0.10#53  
 Name:     my-service.default.svc.cluster.local  
 Address: 10.111.144.147  

Explanation:-

Previously I was trying to resolve the DNS name on the host network, but CoreDNS serves DNS only inside the Kubernetes cluster (the pod network), not the host network. That is why the traditional way of resolving DNS with the nslookup or dig command from the host does not work. So we created a pod with dnsutils installed in it, ran the nslookup command from inside the pod and printed the result directly on stdout. You can use this approach to resolve the DNS name and verify whether it is working fine. Also, specifying the name only up to svc is enough, as Kubernetes appends cluster.local itself.
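If you do not want to keep the dnsutils pod around, a throwaway pod can run the same check. This is only a minimal sketch, assuming the busybox:1.28 image (newer busybox builds are known to have a flaky nslookup) and the same my-service name used above.

# one-off DNS check from inside the pod network; the pod is removed after it exits
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- \
  nslookup my-service.default.svc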

[Resolved] groupVersion shouldn't be empty

 

Issue:- 

When creating a simple resource like a pod, replicaset or deployment, I got the groupVersion error specified below. 

Error:- 

 groupVersion shouldn't be empty

Effect:-

Not able to create the resource because of the above error

 apiversion: v1
 kind: Pod
 metadata:
   name: pod2
 spec:
   containers:
   - name: c1
     image: nginx

Resolution:-

If you look at the above configuration closely, you will find that apiversion has been specified incorrectly. It should have been apiVersion. Just a difference in capitalisation can cause that error. The same error will also occur if you forget to mention apiVersion in the configuration or misspell it. The configuration below will work fine.

 apiVersion: v1
 kind: Pod
 metadata:
   name: pod2
 spec:
   containers:
   - name: c1
     image: nginx
 

Explanation:-

apiVersion is a fixed field name in Kubernetes. So if you misspell it, leave it out, or get the capitalisation wrong, it will give the above error.
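To double-check which apiVersion a resource expects, the commands below can help; a minimal sketch, assuming kubectl is already pointed at your cluster.

# list all API versions the cluster serves
kubectl api-versions
# show the expected VERSION (and KIND) for the Pod resource
kubectl explain pod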

Sunday, May 29, 2022

[Resolved] Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.

 

Issue:- 

The issue is with the dashboard service. When deploying the Dashboard service using the yaml in Kubernetes, it gives the following error.

Error:- 

 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.

Effect:-

Because the dashboard service is not able to connect to the dashboard-metrics-scraper service, the Dashboard UI does not load and times out after some time.
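A quick way to check whether the scraper service and its pods are actually up is shown below; a minimal sketch, assuming the Dashboard was deployed from the standard recommended yaml into the kubernetes-dashboard namespace (adjust the namespace if yours differs).

# verify the dashboard-metrics-scraper service and its pods exist and are ready
kubectl -n kubernetes-dashboard get svc dashboard-metrics-scraper
kubectl -n kubernetes-dashboard get pods -o wide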

[Resolved] Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

Issue:- 

When installing the metrics server in Kubernetes, I got the following error. 

Error:- 

 Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

Effect:-

Due to the above error the metrics server will not work

[[email protected] dashboard]$ kubectl top nodes
W0529 10:18:25.234815   13218 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
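This error usually means the metrics.k8s.io API is registered but its backend is unavailable; the checks below can narrow that down. This is only a sketch, assuming metrics-server runs in kube-system with the k8s-app=metrics-server label (adjust the namespace and label if your deployment differs).

# the APIService should report Available=True once metrics-server is healthy
kubectl get apiservice v1beta1.metrics.k8s.io
# check the metrics-server pod status and its logs for the underlying failure
kubectl -n kube-system get pods -l k8s-app=metrics-server
kubectl -n kube-system logs -l k8s-app=metrics-server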

Wednesday, May 18, 2022

[Resolved] An error occurred (Throttling) when calling the DescribeLoadBalancers operation (reached max retries: 4): Rate exceeded

 

Issue:- 

If you have a big infrastructure and a lot of automation in place, your AWS API call rate might reach the throttling thresholds, which can result in an error like the one below.

Error:- 

 An error occurred (Throttling) when calling the DescribeLoadBalancers operation (reached max retries: 4): Rate exceeded

Effect:-

The command or script you ran may have failed because the rate limit for calls to the AWS API was reached and the retries were exhausted. If you are running the command manually you can simply run it again, but if it is part of a script that has no retry logic of its own, the whole script fails, which is a bigger issue.
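As noted above, a script without its own retry logic fails outright on throttling. Below is a minimal sketch of a retry wrapper with exponential backoff around such a call; the DescribeLoadBalancers call is just the example from this error, and the attempt counts and delays are arbitrary assumptions.

# retry an aws cli call with exponential backoff instead of failing on the first throttle
retry_aws() {
  local attempt=1 max_attempts=5 delay=2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Command failed after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Throttled or failed, retrying in ${delay}s (attempt $attempt/$max_attempts)..." >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

retry_aws aws elb describe-load-balancers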

Monday, May 9, 2022

[Resolved] from setuptools_rust import RustExtension ModuleNotFoundError: No module named 'setuptools_rust'

 

Issue:- 

Issue with the cryptography package and Rust during the Ansible installation on CentOS.

Error:- 

  Downloading https://files.pythonhosted.org/packages/3d/5f/addb8b91fd356792d28e59a8275fec833323cb28604fb3a497c35d7cf0a3/cryptography-37.0.1.tar.gz (585kB)
    100% |████████████████████████████████| 593kB 2.0MB/s
  Complete output from command python setup.py egg_info:
    =============================DEBUG ASSISTANCE==========================
    If you are seeing an error here please try the following to successfully install cryptography:
    Upgrade to the latest pip and try again. This will fix errors for most users.
    See: https://pip.pypa.io/en/stable/installing/#upgrading-pip
    =============================DEBUG ASSISTANCE==========================
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-nfv80r3s/cryptography/setup.py", line 14, in <module>
        from setuptools_rust import RustExtension
    ModuleNotFoundError: No module named 'setuptools_rust'

Effect:-

Ansible Installation failed while using pip with the above error.

Resolution:-

 #pip install --upgrade pip 


Explanation:-

Basically the issue occurs because an outdated version of pip is being used for the Ansible installation.

Upgrade pip first and then try to install Ansible again using pip; it should succeed.
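Putting the two steps together, the sequence looks roughly like this; a minimal sketch, assuming the pip here is the Python 3 pip you intend to install Ansible with.

# upgrade pip first, then retry the Ansible install
pip install --upgrade pip
pip install ansible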

[Resolved] Error response from daemon: invalid MountType: "=bind"

 

Issue:- 

Unable to deploy the visualizer service in the Docker Swarm

Error:- 

  Error response from daemon: invalid MountType: "=bind"

Effect:-

The following command fails with the above error and the visualizer service is not created:

# docker service create --name=viz --publish=8080:8080/tcp --constraint=node.role==manager --mount=type==bind,src=/var/run/docker.sock,dst=/var/run/docker.sock dockersamples/visualizer

Resolution:-

# docker service create --name=viz --publish=8080:8080/tcp --constraint=node.role==manager --mount=type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock dockersamples/visualizer

Explanation:-

You need to use type=bind and not type==bind in the --mount flag to solve the problem.

Wednesday, March 9, 2022

[Solved] Intermittent / burst logs in the Newrelic / ELK

 

Issue:- 

Although the application was writing logs continuously and the shipper was shipping them, the logs were missing for particular periods and bursts of logs with spikes were being observed in the New Relic / ELK.

Error:- 

The following graph shows the actual issue of intermittent or bursty logs in ELK.


Effect:-

Due to the non-availability of the logs it was becoming difficult to troubleshoot the issue, as the logs were getting delayed and sometimes might have been missed entirely.

Resolution:-

Reducing the log volume by printing only the error logs, or the logs required for troubleshooting, helps to overcome this issue.

Explanation:-

More than 1 million log events were getting posted per hour, due to which the disk was becoming a bottleneck and bursts of events were being pushed into the New Relic ELK.

Lowering the log volume by printing only the error logs, or the logs required for troubleshooting, should help to overcome this issue of intermittent logs in the New Relic/ELK.

Sunday, March 6, 2022

[Solved] panic: unable to load configmap based request-header-client-ca-file: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:kubernetes-infra:metrics-server" cannot get resource "configmaps" in API group "" in the namespace "kube-system"

 

Issue:- 

The metrics server is unable to load the configmap "extension-apiserver-authentication" and gets a forbidden error, due to which the metrics server does not show load with the kubectl top command.

Error:- 

  panic: unable to load configmap based request-header-client-ca-file: configmaps "extension-apiserver-authentication" is forbidden: User "system:serviceaccount:kubernetes-infra:metrics-server" cannot get resource "configmaps" in API group "" in the namespace "kube-system"

Effect:-

Metric server doesn't work and kubectl top command fails 

  Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

Resolution:-

Replace

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes

WITH

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  - configmaps

Explanation:-

In the latest version of the metrics server, the yaml has a reduced set of resources that the metrics server is authorised on. So we need to add nodes/stats, namespaces and configmaps, which the metrics server uses in order to work. Because these resources were missing, it got a forbidden error when it tried to read the configmaps; after adding the resources to the yaml it is able to read the configmaps successfully and the top command works. 
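After applying the updated ClusterRole, the RBAC change can be verified directly; a minimal sketch, assuming the same metrics-server service account (kubernetes-infra:metrics-server) from the error above.

# confirm the metrics-server service account can now read configmaps in kube-system
kubectl auth can-i get configmaps \
  --as=system:serviceaccount:kubernetes-infra:metrics-server -n kube-system
# metrics should start flowing again shortly afterwards
kubectl top nodes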

Monday, February 21, 2022

[Resolved] error updating CloudFront Distribution: InvalidArgument: The parameter Header Name contains Cloudfront-Viewer-Address that is not allowed.

 

Issue:- 

Terraform gives a runtime error on apply saying that the parameter Header Cloudfront-Viewer-Address is not allowed.

Error:- 

  error updating CloudFront Distribution: InvalidArgument: The parameter Header Name contains Cloudfront-Viewer-Address that is not allowed.

Effect:-

Cloudfront-Viewer-Address contains the IP address of the viewer that sent the request to CloudFront, and the port used for the request, for example 3.110.159.137:443. Because the header cannot be whitelisted, that value is not available to the system.

Resolution:-

Change the configuration under Behaviours in the CloudFront distribution from Legacy cache settings to Cache policy and origin request policy (recommended).

Explanation:-

The header cloudfront-viewer-address is supported only with Cache policy and origin request policy (recommended); you cannot use it with the Legacy cache settings, as per the AWS documentation.

Sunday, February 13, 2022

[Solved] NodePort only responding on node where pod is running

 The solution for the above problem has been discussed in the link below

https://www.unixcloudfusion.in/2022/02/solved-caliconode-is-not-ready-bird-is.html

[Solved] calico/node is not ready: BIRD is not ready: BGP not established (Calico 3.6 / k8s 1.14.1)

 

Issue:- 

The issue was first recognised after we exposed a service using a NodePort and tried to access it over localhost on another node, followed by the NodePort number, but the curl request failed with a timeout.

On inspecting the containers on the master node, one of the calico-node containers was not ready.

When I described that container, it gave the error mentioned below.

Error:- 

  Normal   Created    14m                kubelet            Created container calico-node
  Normal   Started    14m                kubelet            Started container calico-node
  Warning  Unhealthy  14m (x3 over 14m)  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused

  Warning  Unhealthy  14m                kubelet            Readiness probe failed: 2022-02-13 06:15:58.211 [INFO][210] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.31.127.174,172.31.126.126

  Warning  Unhealthy  14m  kubelet  Readiness probe failed: 2022-02-13 06:16:08.297 [INFO][259] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.31.127.174,172.31.126.126

  Warning  Unhealthy  14m  kubelet  Readiness probe failed: 2022-02-13 06:16:18.333 [INFO][285] confd/health.go 180: Number of node(s) with BGP peering established = 0

Effect:-

So one of the calico containers was failing, and it still did not seem to be an issue until we exposed the service and tried to access it. After much troubleshooting and recreating, I figured out that this was because one of the calico pods was down, which restricted access to the NodePort service from the other nodes. This issue was observed in Kubernetes version 1.23.3.

You would need to modify the calico.yaml file before applying the yaml to your kubernetes cluster.
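To see whether you are hitting the same state, the calico-node pods and their readiness events can be checked as below; a minimal sketch, assuming calico runs in kube-system with the k8s-app=calico-node label (the default in calico.yaml).

# list the calico-node pods and the nodes they run on
kubectl -n kube-system get pods -l k8s-app=calico-node -o wide
# look for the BIRD/BGP readiness probe failures shown above
kubectl -n kube-system describe pods -l k8s-app=calico-node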

Saturday, February 5, 2022

[Resolved] Error response from daemon: rpc error: code = Unknown desc = constraint expected one operator from ==, !=

 

Issue:- 

I received the error while deploying the visualizer, specifically for the constraint node.role=manager.

Error:- 

[[email protected] ~]# docker service create --name=viz --publish=8080:8080/tcp --constraint==node.role=manager --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock dockersamples/visualizer

Error response from daemon: rpc error: code = Unknown desc = constraint expected one operator from ==, !=


Resolution:-

You need to use == between node.role and manager in order to solve the problem, so the command should be:

[[email protected] ~]# docker service create --name=viz --publish=8080:8080/tcp --constraint=node.role==manager --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock dockersamples/visualizer

[Solved] invalid argument "type=bind," for "--mount" flag: invalid field '' must be a key=value pair

Issue:- 

The issue occurred while telling Docker to use a bind mount for the volume with the following command

docker service create --name=viz --publish=8080:8080/tcp --constraint=node.role==manager --mount=type=bind, src=/var/run/docker.sock,dst=/var/run/docker.sock dockersamples/visualizer

Error:- 

 invalid argument "type=bind," for "--mount" flag: invalid field '' must be a key=value pair

Resolution:-

The issue comes from the space between the comma and src in the above command. Make sure there is no space, i.e. use type=bind,src=/var/run/docker.sock instead of type=bind, src=/var/run/docker.sock. Then it works fine.

docker service create --name=viz --publish=8080:8080/tcp --constraint=node.role==manager --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock dockersamples/visualizer

Monday, January 31, 2022

[ Solved ] http: server gave HTTP response to HTTPS client

Issue:- 

The issue occurs if you are running the private registry container from Docker Hub on a private server and pushing the image using the IP of the server directly. 

 Error:- 

[[email protected] ~]$ docker push 172.31.14.46:5000/dev
Using default tag: latest
The push refers to repository [172.31.14.46:5000/dev]
Get "https://172.31.14.46:5000/v2/": http: server gave HTTP response to HTTPS client

Due to the above error, the image is not getting pushed into the repository.
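One commonly used way to deal with this error (an assumption here, not necessarily the resolution applied in this case) is to tell the Docker daemon on the client machine to treat that registry as insecure so it talks plain HTTP to it. The sketch below assumes the same 172.31.14.46:5000 registry address and that /etc/docker/daemon.json has no other settings (tee would overwrite them).

# on the machine running docker push, mark the registry as insecure (HTTP)
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "insecure-registries": ["172.31.14.46:5000"]
}
EOF
sudo systemctl restart docker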