
Sunday, February 13, 2022

[Solved] calico/node is not ready: BIRD is not ready: BGP not established (Calico 3.6 / k8s 1.14.1)

 

Issue:- 

The issue was first noticed after we exposed a service using a NodePort and tried to reach it from another node via localhost followed by the NodePort number. The curl request failed with a timeout.

On inspecting the containers on the master node, one of the calico-node containers was not ready.

When I ran describe on that pod, it gave the error shown below.

Error:- 

  Normal   Created    14m                kubelet            Created container calico-node
  Normal   Started    14m                kubelet            Started container calico-node
  Warning  Unhealthy  14m (x3 over 14m)  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused

  Warning  Unhealthy  14m                kubelet            Readiness probe failed: 2022-02-13 06:15:58.211 [INFO][210] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.31.127.174,172.31.126.126

  Warning  Unhealthy  14m  kubelet  Readiness probe failed: 2022-02-13 06:16:08.297 [INFO][259] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.31.127.174,172.31.126.126

  Warning  Unhealthy  14m  kubelet  Readiness probe failed: 2022-02-13 06:16:18.333 [INFO][285] confd/health.go 180: Number of node(s) with BGP peering established = 0
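
A quick way to spot a pod like this is to filter the pod listing for entries whose READY column is short of the total. The snippet below is only a sketch: the function name not_ready_pods is made up for illustration, and it assumes the column layout printed by kubectl get pods --all-namespaces (namespace, name, READY as the first three columns).

```shell
# Print "namespace/name" for every pod whose READY column (for example 0/1)
# shows fewer ready containers than the total.
# Assumption: stdin is `kubectl get pods --all-namespaces` output, so the
# first line is a header and READY is the third column.
not_ready_pods() {
  awk 'NR > 1 {
    split($3, ready, "/")
    if (ready[1] != ready[2]) print $1 "/" $2
  }'
}

# Usage against a live cluster (needs kubectl access):
#   kubectl get pods --all-namespaces | not_ready_pods
# Then describe whichever pod it prints:
#   kubectl describe pod <pod-name> -n kube-system
```

Any pod it prints can then be passed to kubectl describe pod to see readiness probe events like the ones above.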

Effect:-

So one of the calico containers was failing, but it did not seem to be an issue until we exposed the service and tried to access it. After much troubleshooting and recreating, I figured out that this was because one of the calico pods was down. This blocked access to the NodePort service from the other nodes. This issue was observed in kubernetes version 1.23.3.

You would need to modify the calico.yaml file before applying the yaml to your kubernetes cluster.

Resolution:-

The following modifications are required to make calico.yaml work with the latest version of kubernetes and solve the container access issue across multiple nodes.

  1. Remove the current calico networking applied to your kubernetes cluster

    [centos@kubemaster ~]$ kubectl delete -f https://docs.projectcalico.org/manifests/calico.yaml
  2. Download calico.yaml to your machine; don't apply it directly, as it has to be modified first

    [centos@kubemaster ~]$ wget https://docs.projectcalico.org/manifests/calico.yaml
  3. Modify the calico.yaml by searching for autodetect
    - name: IP
      value: "autodetect"

    Add the following lines after it

    - name: IP_AUTODETECTION_METHOD
      value: "interface=ens*"

    So it will look like

    # Auto-detect the BGP IP address.
    - name: IP
      value: "autodetect"
    - name: IP_AUTODETECTION_METHOD
      value: "interface=ens*"

  4. Search for v1beta1 and remove the beta1 from it
    When you search in the file you will find it like this

    apiVersion: policy/v1beta1
    kind: PodDisruptionBudget

    Replace it with

    apiVersion: policy/v1
    kind: PodDisruptionBudget

  5. You are done now; apply the modified local calico.yaml (not the remote URL) to your kubernetes cluster

    [centos@kubemaster ~]$ kubectl apply -f calico.yaml
  6. Verify the pods; all the calico pods in kube-system should be ready now

    [centos@kubemaster ~]$ kubectl get pods --all-namespaces
    NAMESPACE     NAME                                                    READY   STATUS              RESTARTS      AGE
    default       pod1                                                    0/1     ContainerCreating   0             13s
    kube-system   calico-kube-controllers-566dc76669-qvg6g                1/1     Running             0             5m14s
    kube-system   calico-node-6gxlq                                       1/1     Running             0             5m14s
    kube-system   calico-node-mfr9x                                       1/1     Running             0             5m15s
    kube-system   calico-node-w7fwv                                       1/1     Running             0             5m14s
    kube-system   coredns-64897985d-4bk2g                                 1/1     Running             1 (13h ago)   16h
    kube-system   coredns-64897985d-wdd9l                                 1/1     Running             1 (13h ago)   16h
    kube-system   etcd-kubemaster.unixcloudfusion.in                      1/1     Running             1 (13h ago)   16h
    kube-system   kube-apiserver-kubemaster.unixcloudfusion.in            1/1     Running             1 (13h ago)   16h
    kube-system   kube-controller-manager-kubemaster.unixcloudfusion.in   1/1     Running             1 (13h ago)   16h
    kube-system   kube-proxy-bp5kg                                        1/1     Running             1 (13h ago)   16h
    kube-system   kube-proxy-kchq6                                        1/1     Running             1 (13h ago)   16h
    kube-system   kube-proxy-rtk4q                                        1/1     Running             1 (13h ago)   16h
    kube-system   kube-scheduler-kubemaster.unixcloudfusion.in            1/1     Running             1 (13h ago)   16h
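
If you would rather not edit the file by hand, the two manifest changes (the extra IP_AUTODETECTION_METHOD entry and the PodDisruptionBudget apiVersion) can be scripted. What follows is a rough sketch and not an official Calico tool: the function name patch_calico_manifest and the output file name calico-patched.yaml are made up here, and it assumes the manifest has the "- name: IP" / value: "autodetect" pair and the apiVersion/kind lines laid out exactly as shown in the steps above.

```shell
# Write a patched copy of the manifest to stdout.
#   $1: path to the downloaded calico.yaml
#   $2: interface pattern, for example "ens*" (avoid backslashes, awk -v
#       would interpret them as escapes)
patch_calico_manifest() {
  awk -v pat="$2" '
    # A held "apiVersion: policy/v1beta1" line is rewritten only when the
    # next line declares a PodDisruptionBudget; other kinds stay untouched.
    have_held {
      if ($0 ~ /^kind: PodDisruptionBudget/) sub(/v1beta1/, "v1", held)
      print held
      have_held = 0
    }
    /^apiVersion: policy\/v1beta1[[:space:]]*$/ { held = $0; have_held = 1; next }
    { print }
    # Remember the indentation of the "- name: IP" entry so the inserted
    # lines line up with it.
    /^[[:space:]]*- name: IP[[:space:]]*$/ {
      indent = $0
      sub(/- name: IP.*/, "", indent)
      seen = 1
      next
    }
    # Directly after its autodetect value, add the new environment entry.
    seen && /value: "autodetect"/ {
      print indent "- name: IP_AUTODETECTION_METHOD"
      print indent "  value: \"interface=" pat "\""
    }
    { seen = 0 }
    END { if (have_held) print held }
  ' "$1"
}

# Usage:
#   patch_calico_manifest calico.yaml 'ens*' > calico-patched.yaml
#   kubectl apply -f calico-patched.yaml
```

Running it a second time on an already patched file would add the entry again, so always start from a freshly downloaded calico.yaml.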

Explanation:-

  1. The issue occurs because calico was not able to identify the Ethernet interface. It was configured to detect eth, but on AWS the interface is named ens, so adding the regex helps it identify the Ethernet interface and its associated IP properly. Because the IP was not detected correctly, the pod network was not deployed properly, which caused the communication issue; the solution above fixes it.
  2. Also, the apiVersion v1beta1 has been upgraded to v1, which is why it throws a WARNING for it.
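
Before settling on a pattern, it helps to check which interface names on a node it would actually pick up. The helper below is a hedged sketch: matching_interfaces is a made-up name, and it assumes the interface= value is treated as an unanchored regular expression, which grep -E only approximates; calico's own matcher may differ in detail.

```shell
# Print the interface names (one per line on stdin) that a pattern selects.
# Assumption: the pattern behaves like an unanchored extended regular
# expression; this approximates, but is not, the matching calico-node does.
matching_interfaces() {
  # "|| true" keeps the exit status zero when nothing matches
  grep -E -- "$1" || true
}

# Usage on a Linux node, where /sys/class/net lists the interface names:
#   ls /sys/class/net | matching_interfaces 'ens*'
```

Read as a regular expression, ens* means "en" followed by zero or more "s", so it would also select a name such as enp0s3. If a node carries several interfaces, a tighter pattern like ens.* or the exact interface name narrows the choice.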

3 comments:

  1. Hi Ankit,

    I am getting an issue with the calico controller pod which is a little similar to the above issue. I followed the above steps and still get the error below.

    Events:
    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Normal Scheduled 4m22s default-scheduler Successfully assigned kube-system/calico-kube-controllers-b867fc97d-j9g6k to master-node
    Warning FailedCreatePodSandBox 4m22s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "8d7326268a628853472480ebcb55a2cbe3af784ba3cc71db4309b1097b19dd6d": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
    Warning FailedCreatePodSandBox 4m7s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e6371cc37406e710f0f5d1d5bf97e71e0c19b3ab3b03ce0439d5d0d537bd1573": plugin type="calico" failed (add): error adding host side routes for interface: caliab4f0766ade, error: route (Ifindex: 34, Dst: 192.168.9.193/32, Scope: link) already exists for an interface other than 'caliab4f0766ade': route (Ifindex: 17, Dst: 192.168.9.193/32, Scope: link, Iface: cali8b91c69d0f9)
    Warning FailedCreatePodSandBox 4m5s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "6e780eb18667d4427031cb47f7354c6ee51c2eb69d0eb6f1188bac3bd84a6f73": plugin type="calico" failed (add): error adding host side routes for interface: caliab4f0766ade, error: route (Ifindex: 35, Dst: 192.168.9.194/32, Scope: link) already exists for an interface other than 'caliab4f0766ade': route (Ifindex: 18, Dst: 192.168.9.194/32, Scope: link, Iface: calid06878f2b24)
    Normal SandboxChanged 3m52s (x4 over 4m21s) kubelet Pod sandbox changed, it will be killed and re-created.
    Normal Started 3m51s kubelet Started container calico-kube-controllers
    Warning Unhealthy 3m22s (x5 over 3m51s) kubelet Readiness probe failed: initialized to false
    Warning Unhealthy 3m22s (x2 over 3m32s) kubelet Liveness probe failed: initialized to false
    Warning Unhealthy 2m52s (x3 over 3m12s) kubelet Liveness probe failed: Error initializing datastore: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.96.0.1:443: i/o timeout
    Warning Unhealthy 2m52s (x3 over 3m12s) kubelet Readiness probe failed: Error initializing datastore: Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.96.0.1:443: i/o timeout
    Normal Pulled 2m51s (x2 over 3m52s) kubelet Container image "docker.io/calico/kube-controllers:v3.27.3" already present on machine
    Normal Created 2m51s (x2 over 3m52s) kubelet Created container calico-kube-controllers

    Any help would be greatly appreciated. Thank you in advance.

    Replies
    1. Hi Ravi,

      This usually occurred in the older version v1.22.3; make sure you are using the latest version. You can also try deleting all the calico pods; once they are recreated you might not see the problem. This is the workaround for this issue for now.

      Thanks

    2. Hi Ankit, thanks for the reply. Sorry for not providing the version details. I am using Kubernetes 1.27 and the latest calico version, 3.27.3. I have not modified any parameters in the default YAML except the ones mentioned in your article. I am setting up the cluster on RHEL 8.9. When I run node status I see the message below.

      $ calicoctl node status

      Calico process is running.

      IPv4 BGP status
      No IPv4 peers found.

      IPv6 BGP status
      No IPv6 peers found.

      Any thoughts on why the IPv4 peers are empty?
