Issue:-
The issue was first noticed after we exposed a service using a NodePort and tried to access it from another node via localhost followed by the NodePort number. The curl request failed with a timeout.
On inspecting the pods on the master node, one of the calico-node containers was not ready.
Running kubectl describe on that pod reported the events listed under Error.
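As a rough aid for the inspection step, the sketch below filters `kubectl get pods --all-namespaces` output for Calico pods whose READY column is not fully satisfied. The function name `calico_pods_not_ready` is hypothetical, and it assumes the default column layout (NAMESPACE, NAME, READY, ...).

```shell
# Hypothetical helper (not part of kubectl or Calico).
# Reads `kubectl get pods --all-namespaces` output on stdin and prints the
# names of calico-* pods whose READY column (for example 0/1) is not n/n.
calico_pods_not_ready() {
  awk '$2 ~ /^calico-/ {
    split($3, ready, "/")
    if (ready[1] != ready[2]) print $2
  }'
}

# Usage, assuming kubectl is configured for the cluster:
#   kubectl get pods --all-namespaces | calico_pods_not_ready
```

Any pod name it prints can then be passed to `kubectl describe pod <name> -n kube-system` to see its events.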
Error:-
Normal Created 14m kubelet Created container calico-node
Normal Started 14m kubelet Started container calico-node
Warning Unhealthy 14m (x3 over 14m) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
Warning Unhealthy 14m kubelet Readiness probe failed: 2022-02-13 06:15:58.211 [INFO][210] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.31.127.174,172.31.126.126
Warning Unhealthy 14m kubelet Readiness probe failed: 2022-02-13 06:16:08.297 [INFO][259] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.31.127.174,172.31.126.126
Warning Unhealthy 14m kubelet Readiness probe failed: 2022-02-13 06:16:18.333 [INFO][285] confd/health.go 180: Number of node(s) with BGP peering established = 0
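To see at a glance which peers the readiness probe is complaining about, the events above can be reduced to a list of IPs. This is a minimal sketch that relies only on the "BGP not established with ..." wording shown in the events; the function name `bgp_unestablished_peers` is hypothetical.

```shell
# Hypothetical helper: reads `kubectl describe pod` output on stdin and prints
# each peer IP named in a "BGP not established with ..." message, once per IP.
bgp_unestablished_peers() {
  sed -n 's/.*BGP not established with \([0-9.,]*\).*/\1/p' |
    tr ',' '\n' |
    sort -u
}

# Usage, assuming a calico-node pod name taken from `kubectl get pods`:
#   kubectl describe pod <calico-node-pod> -n kube-system | bgp_unestablished_peers
```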
Effect:-
One of the calico-node containers was failing, but this did not appear to be a problem until we exposed a service and tried to access it. After much troubleshooting and recreating, we figured out that the cause was the Calico pod that was down. This blocked access to the NodePort service from the other nodes. The issue was observed on Kubernetes version 1.23.3.
You need to modify the calico.yaml file before applying it to your Kubernetes cluster.
Resolution:-
The following modifications are required to make calico.yaml work with the latest version of Kubernetes and to solve the container access issue across multiple nodes.
- Remove the current Calico networking applied to your Kubernetes cluster
    [centos@kubemaster ~]$ kubectl delete -f https://docs.projectcalico.org/manifests/calico.yaml
- Download calico.yaml to your machine; don't apply it directly
    [centos@kubemaster ~]$ wget https://docs.projectcalico.org/manifests/calico.yaml
- Modify calico.yaml by searching for autodetect
    - name: IP
      value: "autodetect"
  Add the following lines after it
    - name: IP_AUTODETECTION_METHOD
      value: "interface=ens*"
  So it will look like
    # Auto-detect the BGP IP address.
    - name: IP
      value: "autodetect"
    - name: IP_AUTODETECTION_METHOD
      value: "interface=ens*"
- Search for v1beta1 and remove the beta1 from it. When you search in the file you will find it as
    apiVersion: policy/v1beta1
    kind: PodDisruptionBudget
  Replace it with
    apiVersion: policy/v1
    kind: PodDisruptionBudget
- You are done; now apply the modified local file (not the remote URL, which would discard your changes) to your Kubernetes cluster
    [centos@kubemaster ~]$ kubectl apply -f calico.yaml
- Verify the pods; all the Calico pods in kube-system should be ready now
    [centos@kubemaster ~]$ kubectl get pods --all-namespaces
    NAMESPACE     NAME                                                    READY   STATUS              RESTARTS      AGE
    default       pod1                                                    0/1     ContainerCreating   0             13s
    kube-system   calico-kube-controllers-566dc76669-qvg6g                1/1     Running             0             5m14s
    kube-system   calico-node-6gxlq                                       1/1     Running             0             5m14s
    kube-system   calico-node-mfr9x                                       1/1     Running             0             5m15s
    kube-system   calico-node-w7fwv                                       1/1     Running             0             5m14s
    kube-system   coredns-64897985d-4bk2g                                 1/1     Running             1 (13h ago)   16h
    kube-system   coredns-64897985d-wdd9l                                 1/1     Running             1 (13h ago)   16h
    kube-system   etcd-kubemaster.unixcloudfusion.in                      1/1     Running             1 (13h ago)   16h
    kube-system   kube-apiserver-kubemaster.unixcloudfusion.in            1/1     Running             1 (13h ago)   16h
    kube-system   kube-controller-manager-kubemaster.unixcloudfusion.in   1/1     Running             1 (13h ago)   16h
    kube-system   kube-proxy-bp5kg                                        1/1     Running             1 (13h ago)   16h
    kube-system   kube-proxy-kchq6                                        1/1     Running             1 (13h ago)   16h
    kube-system   kube-proxy-rtk4q                                        1/1     Running             1 (13h ago)   16h
    kube-system   kube-scheduler-kubemaster.unixcloudfusion.in            1/1     Running             1 (13h ago)   16h
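The two manual edits to calico.yaml can also be scripted. The following is a minimal sketch, not an official Calico tool: the function name `patch_calico_manifest` and the output file name are hypothetical, it assumes the manifest contains the `- name: IP` / `value: "autodetect"` pair exactly as quoted above, and it is not idempotent (running it twice adds the entry twice).

```shell
# Hypothetical helper: patch_calico_manifest INPUT OUTPUT
# 1. Adds IP_AUTODETECTION_METHOD right after the IP=autodetect env entry,
#    reusing that entry's indentation.
# 2. Rewrites policy/v1beta1 to policy/v1.
patch_calico_manifest() {
  awk '
    # Remember the indentation of the "- name: IP" entry.
    /^ *- name: IP *$/ {
      match($0, /^ */)
      indent = substr($0, 1, RLENGTH)
      in_ip = 1
      print
      next
    }
    # Right after its value line, add the autodetection method entry.
    in_ip && /value: "autodetect"/ {
      print
      print indent "- name: IP_AUTODETECTION_METHOD"
      print indent "  value: \"interface=ens*\""
      in_ip = 0
      next
    }
    # Every other line: only the apiVersion rewrite applies.
    {
      in_ip = 0
      sub(/policy\/v1beta1/, "policy/v1")
      print
    }
  ' "$1" > "$2"
}

# Usage, assuming calico.yaml was downloaded with wget as in the steps above:
#   patch_calico_manifest calico.yaml calico-patched.yaml
#   kubectl apply -f calico-patched.yaml
```

Writing to a separate output file keeps the downloaded manifest untouched, so the result can be compared with `diff` before it is applied.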
Explanation:-
- The issue occurs because Calico was not able to identify the Ethernet interface. It was configured to detect eth, but on AWS the interface is named ens, so supplying the pattern helps it identify the interface and its associated IP correctly. Because the IP is not detected correctly, the pod network is not deployed properly, which causes the communication issue; the change above fixes it.
- Also, the apiVersion policy/v1beta1 has been superseded by policy/v1, which is why a WARNING is shown for the old version.
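Before choosing the pattern, it helps to confirm what the interfaces on a node are actually called. The sketch below is only an approximation: it uses `grep -E` as a stand-in for pattern matching, and Calico's own matching rules may differ. The function name `interfaces_matching` is hypothetical. Note that under `grep -E`, `ens*` means "en" followed by zero or more "s", so it matches any name containing "en", not only names that start with "ens".

```shell
# Hypothetical helper: reads interface names on stdin (one per line) and
# prints those matching the given extended regular expression.
# grep -E is a stand-in here; Calico's matching may behave differently.
interfaces_matching() {
  grep -E -- "$1" || true   # no match is not treated as an error
}

# Usage on a Linux node, where /sys/class/net lists the interface names:
#   ls /sys/class/net | interfaces_matching 'ens*'
```

If nothing is printed, the node has no interface that fits the pattern, and the value in calico.yaml would need to be adjusted to the names that are present.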
Hi Ankit,
I am getting an issue with the calico controller pod which is somewhat similar to the above issue. I followed the above steps and still get the error below.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m22s default-scheduler Successfully assigned kube-system/calico-kube-controllers-b867fc97d-j9g6k to master-node
Warning FailedCreatePodSandBox 4m22s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "8d7326268a628853472480ebcb55a2cbe3af784ba3cc71db4309b1097b19dd6d": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 4m7s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e6371cc37406e710f0f5d1d5bf97e71e0c19b3ab3b03ce0439d5d0d537bd1573": plugin type="calico" failed (add): error adding host side routes for interface: caliab4f0766ade, error: route (Ifindex: 34, Dst: 192.168.9.193/32, Scope: link) already exists for an interface other than 'caliab4f0766ade': route (Ifindex: 17, Dst: 192.168.9.193/32, Scope: link, Iface: cali8b91c69d0f9)
Warning FailedCreatePodSandBox 4m5s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "6e780eb18667d4427031cb47f7354c6ee51c2eb69d0eb6f1188bac3bd84a6f73": plugin type="calico" failed (add): error adding host side routes for interface: caliab4f0766ade, error: route (Ifindex: 35, Dst: 192.168.9.194/32, Scope: link) already exists for an interface other than 'caliab4f0766ade': route (Ifindex: 18, Dst: 192.168.9.194/32, Scope: link, Iface: calid06878f2b24)
Normal SandboxChanged 3m52s (x4 over 4m21s) kubelet Pod sandbox changed, it will be killed and re-created.
Normal Started 3m51s kubelet Started container calico-kube-controllers
Warning Unhealthy 3m22s (x5 over 3m51s) kubelet Readiness probe failed: initialized to false
Warning Unhealthy 3m22s (x2 over 3m32s) kubelet Liveness probe failed: initialized to false
Warning Unhealthy 2m52s (x3 over 3m12s) kubelet Liveness probe failed: Error initializing datastore: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.96.0.1:443: i/o timeout
Warning Unhealthy 2m52s (x3 over 3m12s) kubelet Readiness probe failed: Error initializing datastore: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.96.0.1:443: i/o timeout
Normal Pulled 2m51s (x2 over 3m52s) kubelet Container image "docker.io/calico/kube-controllers:v3.27.3" already present on machine
Normal Created 2m51s (x2 over 3m52s) kubelet Created container calico-kube-controllers
any help would be greatly appreciated. thank you in advance.
Hi Ravi,
This usually occurred in the older version v1.22.3; make sure you are using the latest version. You can also try deleting all the Calico pods; once they are recreated you might not see the problem. This is the workaround for this issue for now.
Thanks
Hi Ankit, thanks for the reply. Sorry for not providing the version details. I am using Kubernetes 1.27 and the latest Calico version, 3.27.3. I have not modified any parameters in the default YAML other than the ones mentioned in your article. I am setting up the cluster on RHEL 8.9. When I run node status I see the message below.
$ calicoctl node status
Calico process is running.
IPv4 BGP status
No IPv4 peers found.
IPv6 BGP status
No IPv6 peers found.
Any thoughts on why the IPv4 peers are empty?