Issue:-
The issue was first noticed after we exposed a service using a NodePort and tried to reach it from another node via localhost followed by the NodePort number, but the curl request failed with a timeout.
On inspecting the containers on the master node, one of the calico-node containers was not ready.
Running describe against that pod produced the error mentioned below.
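For reference, a sketch of the reproduction (the deployment name "web" is a placeholder, not from the original setup; the script skips itself when kubectl or a cluster is unavailable):

```shell
#!/bin/sh
# Reproduction sketch; "web" is a hypothetical deployment name.
# Skip gracefully when kubectl or a reachable cluster is unavailable.
command -v kubectl >/dev/null 2>&1 || { echo "kubectl not found; skipping"; exit 0; }
kubectl get nodes >/dev/null 2>&1 || { echo "no reachable cluster; skipping"; exit 0; }

# Expose the deployment on a NodePort (allocated from 30000-32767 by default)
kubectl expose deployment web --type=NodePort --port=80

# Find the allocated port and curl it on this node; with the broken
# Calico pod network this timed out whenever the backing pod was remote
NODEPORT=$(kubectl get svc web -o jsonpath='{.spec.ports[0].nodePort}')
curl -m 5 "http://localhost:${NODEPORT}/"
```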
Error:-
Normal Created 14m kubelet Created container calico-node
Normal Started 14m kubelet Started container calico-node
Warning Unhealthy 14m (x3 over 14m) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
Warning Unhealthy 14m kubelet Readiness probe failed: 2022-02-13 06:15:58.211 [INFO][210] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.31.127.174,172.31.126.126
Warning Unhealthy 14m kubelet Readiness probe failed: 2022-02-13 06:16:08.297 [INFO][259] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.31.127.174,172.31.126.126
Warning Unhealthy 14m kubelet Readiness probe failed: 2022-02-13 06:16:18.333 [INFO][285] confd/health.go 180: Number of node(s) with BGP peering established = 0
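The events above can be gathered with standard kubectl commands; a minimal sketch (the label k8s-app=calico-node comes from the Calico manifest; the script skips itself when no cluster is reachable):

```shell
#!/bin/sh
# Skip gracefully when kubectl or a reachable cluster is unavailable.
command -v kubectl >/dev/null 2>&1 || { echo "kubectl not found; skipping"; exit 0; }
kubectl get nodes >/dev/null 2>&1 || { echo "no reachable cluster; skipping"; exit 0; }

# List the calico-node DaemonSet pods; the unhealthy one shows READY 0/1
kubectl get pods -n kube-system -l k8s-app=calico-node -o wide

# The readiness-probe events quoted above come from describe
kubectl describe pods -n kube-system -l k8s-app=calico-node | grep -B1 -A2 Unhealthy || true

# The calico-node container logs show the BIRD/BGP state in detail
kubectl logs -n kube-system -l k8s-app=calico-node -c calico-node --tail=20 || true
```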
Effect:-
So one of the calico containers was failing, yet it did not seem to be a problem until we exposed the service and tried to access it. After much troubleshooting and recreating, we figured out that this was because one of the calico pods was down, which blocked access to the NodePort service from the other nodes. This issue was observed on Kubernetes version 1.23.3.
You will need to modify the calico.yaml file before applying it to your Kubernetes cluster.
Resolution:-
The following modifications are required to make calico.yaml work with the latest version of Kubernetes and fix the container access issue across multiple nodes.
- Remove the current Calico networking applied to your Kubernetes cluster:
  $ kubectl delete -f https://docs.projectcalico.org/manifests/calico.yaml
- Download calico.yaml to your machine; do not apply it directly:
  $ wget https://docs.projectcalico.org/manifests/calico.yaml
- Modify calico.yaml: search for "autodetect", which appears as

      # Auto-detect the BGP IP address.
      - name: IP
        value: "autodetect"

  and add the following lines after it:

      - name: IP_AUTODETECTION_METHOD
        value: "interface=ens*"

  so that the block looks like:

      # Auto-detect the BGP IP address.
      - name: IP
        value: "autodetect"
      - name: IP_AUTODETECTION_METHOD
        value: "interface=ens*"

- Search for v1beta1 and remove the beta1 suffix. Searching the file you will find:

      apiVersion: policy/v1beta1
      kind: PodDisruptionBudget

  Replace it with:

      apiVersion: policy/v1
      kind: PodDisruptionBudget

- You are done; apply the modified local file (not the remote URL, or your changes will be lost) to your Kubernetes cluster:
  $ kubectl apply -f calico.yaml
- Verify the pods; all the calico pods in kube-system should now be ready:
  $ kubectl get pods --all-namespaces
  NAMESPACE     NAME                                                    READY   STATUS              RESTARTS      AGE
  default       pod1                                                    0/1     ContainerCreating   0             13s
  kube-system   calico-kube-controllers-566dc76669-qvg6g                1/1     Running             0             5m14s
  kube-system   calico-node-6gxlq                                       1/1     Running             0             5m14s
  kube-system   calico-node-mfr9x                                       1/1     Running             0             5m15s
  kube-system   calico-node-w7fwv                                       1/1     Running             0             5m14s
  kube-system   coredns-64897985d-4bk2g                                 1/1     Running             1 (13h ago)   16h
  kube-system   coredns-64897985d-wdd9l                                 1/1     Running             1 (13h ago)   16h
  kube-system   etcd-kubemaster.unixcloudfusion.in                      1/1     Running             1 (13h ago)   16h
  kube-system   kube-apiserver-kubemaster.unixcloudfusion.in            1/1     Running             1 (13h ago)   16h
  kube-system   kube-controller-manager-kubemaster.unixcloudfusion.in   1/1     Running             1 (13h ago)   16h
  kube-system   kube-proxy-bp5kg                                        1/1     Running             1 (13h ago)   16h
  kube-system   kube-proxy-kchq6                                        1/1     Running             1 (13h ago)   16h
  kube-system   kube-proxy-rtk4q                                        1/1     Running             1 (13h ago)   16h
  kube-system   kube-scheduler-kubemaster.unixcloudfusion.in            1/1     Running             1 (13h ago)   16h
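If you prefer to script the two manifest edits, here is a sketch run against a small stand-in snippet rather than the real calico.yaml (the awk indentation logic assumes the env-list style used in the Calico manifest):

```shell
#!/bin/sh
# Stand-in for the two relevant parts of calico.yaml (illustrative only)
cat > calico-snippet.yaml <<'EOF'
            - name: IP
              value: "autodetect"
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
EOF

# 1. Add IP_AUTODETECTION_METHOD right after the IP "autodetect" entry,
#    reusing the file's own indentation
awk '{
  print
  if ($0 ~ /value: "autodetect"/) {
    indent = $0
    sub(/value:.*/, "", indent)      # keep only the leading spaces of the value line
    print substr(indent, 3) "- name: IP_AUTODETECTION_METHOD"
    print indent "value: \"interface=ens*\""
  }
}' calico-snippet.yaml > calico-snippet.tmp && mv calico-snippet.tmp calico-snippet.yaml

# 2. PodDisruptionBudget moved from policy/v1beta1 to policy/v1
sed -i 's#policy/v1beta1#policy/v1#' calico-snippet.yaml

cat calico-snippet.yaml
```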
Explanation:-
- The issue occurs because Calico could not identify the Ethernet interface correctly: it was configured to detect eth interfaces, but on AWS the interfaces are named ens, so adding the interface regex lets it identify the Ethernet card and the associated IP properly. Because the IP was not detected correctly, the pod network was not properly deployed, which caused the cross-node communication issue fixed by the solution above.
- Also, the policy/v1beta1 API version has been promoted to policy/v1 now, which is why a WARNING is thrown for it.
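You can confirm the interface naming on your own nodes; on modern AWS instances the predictable-naming scheme typically yields ens5 rather than eth0:

```shell
# Interface names as the kernel sees them; Calico's
# IP_AUTODETECTION_METHOD value "interface=ens*" must match one of these
ls /sys/class/net
```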