
Sunday, February 13, 2022

[Solved] calico/node is not ready: BIRD is not ready: BGP not established (Kubernetes 1.23.3)

 

Issue:- 

The issue was first recognised after we exposed a service using a NodePort and tried to access it from another node over localhost followed by the NodePort number. The curl request failed with a timeout.

On inspecting the pods running on the master node, one of the calico-node containers was not ready.

Running kubectl describe against that pod gave the error mentioned below.
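
For reference, the reproduction looked roughly like the commands below; the pod name pod1 and the NodePort 30080 are placeholders for this setup:

    [centos@kubemaster ~]$ kubectl expose pod pod1 --type=NodePort --port=80
    [centos@kubemaster ~]$ kubectl get svc pod1                        # note the assigned NodePort, e.g. 30080
    [centos@kubemaster ~]$ curl --max-time 10 http://localhost:30080   # fails with a timeout
    [centos@kubemaster ~]$ kubectl get pods -n kube-system | grep calico-node
    [centos@kubemaster ~]$ kubectl describe pod -n kube-system <calico-node-pod-name>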

Error:- 

  Normal   Created    14m                kubelet            Created container calico-node
  Normal   Started    14m                kubelet            Started container calico-node
  Warning  Unhealthy  14m (x3 over 14m)  kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused

  Warning  Unhealthy  14m                kubelet            Readiness probe failed: 2022-02-13 06:15:58.211 [INFO][210] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.31.127.174,172.31.126.126

  Warning  Unhealthy  14m  kubelet  Readiness probe failed: 2022-02-13 06:16:08.297 [INFO][259] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.31.127.174,172.31.126.126

  Warning  Unhealthy  14m  kubelet  Readiness probe failed: 2022-02-13 06:16:18.333 [INFO][285] confd/health.go 180: Number of node(s) with BGP peering established = 0

Effect:-

So one of the calico containers was failing, and it still didn't seem to be an issue until we exposed the service and tried to access it. After much troubleshooting and recreating, we figured out that this was because one of the calico pods was down, which blocked access to the NodePort service from the other nodes. This issue was observed on Kubernetes version 1.23.3.
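
A quick way to see which node's calico pod is failing is to list the calico-node pods together with the nodes they are scheduled on:

    [centos@kubemaster ~]$ kubectl get pods -n kube-system -o wide | grep calico-node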

You will need to modify the calico.yaml file before applying it to your Kubernetes cluster.

Resolution:-

The following modifications are required to make calico.yaml work with recent versions of Kubernetes and to solve the container access issue across multiple nodes.

  1. Remove the current Calico networking applied to your Kubernetes cluster

    [centos@kubemaster ~]$ kubectl delete -f https://docs.projectcalico.org/manifests/calico.yaml
  2. Download the calico.yaml to your machine; don't apply it directly

    [centos@kubemaster ~]$ wget https://docs.projectcalico.org/manifests/calico.yaml
  3. Modify the calico.yaml by searching for autodetect
    - name: IP
      value: "autodetect"

    Add the following lines after it

    - name: IP_AUTODETECTION_METHOD
      value: "interface=ens*"

    So it will look like

     # Auto-detect the BGP IP address.
     - name: IP
       value: "autodetect"
     - name: IP_AUTODETECTION_METHOD
       value: "interface=ens*"

  4. Search for v1beta1 and remove the beta1 from it
    When you search the file, you will find:

    apiVersion: policy/v1beta1
    kind: PodDisruptionBudget

    Replace it with:

    apiVersion: policy/v1
    kind: PodDisruptionBudget
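
    If you prefer, the same edit can be scripted with sed (an equivalent one-liner); verify the result with grep afterwards:

    [centos@kubemaster ~]$ sed -i 's|policy/v1beta1|policy/v1|' calico.yaml
    [centos@kubemaster ~]$ grep -n "policy/v1" calico.yaml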

  5. You are done; now apply the modified local file (not the remote URL, which would discard your edits) to your Kubernetes cluster

    [centos@kubemaster ~]$ kubectl apply -f calico.yaml
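
    Optionally, you can watch the calico-node DaemonSet roll out before checking the pods:

    [centos@kubemaster ~]$ kubectl rollout status daemonset/calico-node -n kube-system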
  6. Verify the pods; all the calico pods in kube-system should now be Running and ready

    [centos@kubemaster ~]$ kubectl get pods --all-namespaces
    NAMESPACE     NAME                                                    READY   STATUS              RESTARTS      AGE
    default       pod1                                                    0/1     ContainerCreating   0             13s
    kube-system   calico-kube-controllers-566dc76669-qvg6g                1/1     Running             0             5m14s
    kube-system   calico-node-6gxlq                                       1/1     Running             0             5m14s
    kube-system   calico-node-mfr9x                                       1/1     Running             0             5m15s
    kube-system   calico-node-w7fwv                                       1/1     Running             0             5m14s
    kube-system   coredns-64897985d-4bk2g                                 1/1     Running             1 (13h ago)   16h
    kube-system   coredns-64897985d-wdd9l                                 1/1     Running             1 (13h ago)   16h
    kube-system   etcd-kubemaster.unixcloudfusion.in                      1/1     Running             1 (13h ago)   16h
    kube-system   kube-apiserver-kubemaster.unixcloudfusion.in            1/1     Running             1 (13h ago)   16h
    kube-system   kube-controller-manager-kubemaster.unixcloudfusion.in   1/1     Running             1 (13h ago)   16h
    kube-system   kube-proxy-bp5kg                                        1/1     Running             1 (13h ago)   16h
    kube-system   kube-proxy-kchq6                                        1/1     Running             1 (13h ago)   16h
    kube-system   kube-proxy-rtk4q                                        1/1     Running             1 (13h ago)   16h
    kube-system   kube-scheduler-kubemaster.unixcloudfusion.in            1/1     Running             1 (13h ago)   16h
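  7. Optionally, if calicoctl is installed on the node, confirm that BGP peering is established with the peers from the earlier error message (each peer should report Established):

    [centos@kubemaster ~]$ sudo calicoctl node status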

Explanation:-

  1. The issue occurs because Calico was not able to identify the Ethernet interface properly: it was trying to detect an eth interface, but on AWS the interface is named ens, so adding the interface regex helps it identify the Ethernet card and the associated IP correctly. Since the IP was not correctly detected, the pod network was not properly deployed, which caused the communication issue fixed by the solution above.
  2. Also, the apiVersion policy/v1beta1 has been promoted to policy/v1, which is why Kubernetes throws a WARNING for the old version.
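
Note that interface= is only one of Calico's supported autodetection methods. If your interface names are unpredictable, a can-reach probe is an alternative; a minimal sketch, where 8.8.8.8 is just an example destination:

    - name: IP_AUTODETECTION_METHOD
      value: "can-reach=8.8.8.8"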
