Wednesday, March 20, 2019

[Solved] Error restarting cluster: wait: waiting for k8s-app=kube-proxy: timed out waiting for the condition

Error:-
Error restarting cluster: wait: waiting for k8s-app=kube-proxy: timed out waiting for the condition

Solution:-
This occurred during the minikube installation. To resolve the issue, delete the existing cluster and start it again:
 ./minikube delete  
 ./minikube start
That should resolve this error.
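
You can confirm the cluster came back up cleanly with a quick status check:
 ./minikube status  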

[Solved] Unable to start VM: create: precreate: exec: "docker": executable file not found in $PATH

Error:-
Unable to start VM: create: precreate: exec: "docker": executable file not found in $PATH

Occurrence:-
This occurred during the minikube installation.

Resolution:-
Docker was not installed on the VM, so I installed it using the get.docker.com script as
 curl -fsSL https://get.docker.com/ | sh  
The script automatically detects the operating system and installs Docker on your system.
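
Once the script finishes, you can verify the installation and make sure the daemon is running (assuming a systemd-based distribution):
 sudo systemctl start docker  
 sudo systemctl enable docker  
 docker --version  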

[Solved] Unable to start VM: create: precreate: VBoxManage not found. Make sure VirtualBox is installed and VBoxManage is in the path

Error:-
Unable to start VM: create: precreate: VBoxManage not found. Make sure VirtualBox is installed and VBoxManage is in the path

Occurrence:-
This error occurred while installing minikube inside a Vagrant (VirtualBox) VM.

Cause/Resolution:- 
Minikube and a Vagrant VM don't work well together, since that amounts to running type-2 virtualization on top of another type-2 virtualization. Minikube defaults to creating its own VM, which makes sense when you are on a Windows machine and want a Linux machine via VirtualBox, but not when you are already inside a VM.

The solution is to set minikube's vm-driver to none, so it runs directly on the host, as follows
 ./minikube config set vm-driver none  

That should solve your problem.
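
With the none driver, minikube runs the Kubernetes components directly against the host's Docker daemon, so Docker must be installed and minikube typically needs to run as root (a sketch, using the same ./minikube binary):
 sudo ./minikube start --vm-driver=none  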

growpart fails to extend disk volume (attempt to resize /dev/xvda failed. sfdisk output below)

Error:-

attempt to resize /dev/xvda failed. sfdisk output below:
|
| Disk /dev/xvda: 104433 cylinders, 255 heads, 63 sectors/track
| Old situation:
| Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
|
|    Device Boot Start     End   #cyls    #blocks   Id  System
| /dev/xvda1   *      1   78324   78324  629137530   83  Linux
| /dev/xvda2          0       -       0          0    0  Empty
| /dev/xvda3          0       -       0          0    0  Empty
| /dev/xvda4          0       -       0          0    0  Empty
| New situation:
| Units = sectors of 512 bytes, counting from 0
|
|    Device Boot    Start       End   #sectors  Id  System
| /dev/xvda1   *     16065 1677716144 1677700080  83  Linux
| /dev/xvda2             0         -          0   0  Empty
| /dev/xvda3             0         -          0   0  Empty
| /dev/xvda4             0         -          0   0  Empty
| Successfully wrote the new partition table
|
| Re-reading the partition table ...
| BLKRRPART: Device or resource busy
| The command to re-read the partition table failed.
| Run partprobe(8), kpartx(8) or reboot your system now,
| before using mkfs
| If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
| to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
| (See fdisk(8).)
FAILED: failed to resize
***** WARNING: Resize failed, attempting to revert ******
Re-reading the partition table ...
BLKRRPART: Device or resource busy
The command to re-read the partition table failed.
Run partprobe(8), kpartx(8) or reboot your system now,
before using mkfs
***** Appears to have gone OK ****

Resolution:-

# growpart /dev/xvda 1

If you are wondering whether you did something wrong: there is absolutely nothing wrong with the above command.

As you can see, there was no issue in writing the new partition table; that part was successful. However, the kernel's attempt to re-read the partition table failed (BLKRRPART: Device or resource busy) because the partition was in use, so the disk size was not increasing. I tried multiple suggested fixes for sfdisk, but in my case growpart was already the latest version and the issue still occurred.

At this point you will need to reboot the server to fix this issue. If it is a production server you might have to take the appropriate approvals, as there is no other way around it. Once the server restarts, the kernel re-reads the partition table and the partition shows the increased size.
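
After the reboot, note that only the partition has grown; the filesystem still has to be extended to use the new space. A sketch, assuming an XFS root filesystem on /dev/xvda1 (the CentOS 7 default; use resize2fs for ext4):

# confirm the partition now shows the new size
lsblk /dev/xvda
# grow the filesystem to fill the partition
xfs_growfs /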


[Solved] invalid principal in policy

Problem:- I created an S3 policy modeled on another existing policy, but when I saved it, AWS returned "Invalid principal in policy" and would not allow me to save the policy.


Cause:- I had given the wrong ARN for the principal, which is why the issue was occurring even though the policy was logically correct. AWS appears to check in the backend whether the ARN actually exists, and since it didn't, it refused to save the policy in the first place. In my case I had omitted the service-role/ path from the role ARN.

Wrong ARN in my case:-
"AWS": "arn:aws:iam::446685876341:role/something-something-test-role"


Right ARN in my case:-
"AWS": "arn:aws:iam::446685876341:role/service-role/something-something-test-role"


Resolution:- Once I corrected the ARN as above, the error was resolved.
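
If you are unsure of a role's exact ARN, including any path such as service-role/, you can look it up with the AWS CLI (the role name below is just the one from this example):

aws iam get-role --role-name something-something-test-role --query 'Role.Arn' --output text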

Monday, March 4, 2019

[Solved] url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [75/120s]: unexpected error ['NoneType' object has no attribute 'status_code']

Issue:- I was enabling ENA support for CentOS 7.1 on an EC2 instance when I received the following error
url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [75/120s]: unexpected error ['NoneType' object has no attribute 'status_code']

Because of this, the network card was not coming up for the instance. With no network, cloud-init's url_helper.py script could not reach the metadata service to fetch the instance-id or the IP address. So when the instance finally booted, no IP was assigned to it and the SSH-based instance status checks failed.

I was getting the following logs, which confirmed it

Cloud-init v. 0.7.5 running 'init' at Mon, 04 Mar 2018 06:33:38 +0000. Up 5.17 seconds.
cis-info: +++++++++++++++++++++++Net device info++++++++++++++++++++++++
cis-info: +--------+-------+-----------+-----------+-------------------+
cis-info: | Device |   Up  |  Address  |    Mask   |     Hw-Address    |
cis-info: +--------+-------+-----------+-----------+-------------------+
cis-info: | ens5:  | False |     .     |     .     | 06:f7:b8:fc:f1:20 |
cis-info: |  lo:   |  True | 127.0.0.1 | 255.0.0.0 |         .         |
cis-info: +--------+-------+-----------+-----------+-------------------+
cis-info: ++++++++++++++++++++++++++Route info+++++++++++++++++++++++++++
cis-info: +-------+-------------+---------+---------+-----------+-------+
cis-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
cis-info: +-------+-------------+---------+---------+-----------+-------+
cis-info: +-------+-------------+---------+---------+-----------+-------+
2018-03-03 22:33:38,836 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: unexpected error ['NoneType' object has no attribute 'status_code']


Cause:-
In the AWS documentation it is mentioned to add GRUB_CMDLINE_LINUX="net.ifnames=0" in /boot/grub2/grub.cfg, but that did not work for me, since that file is generated.

Solution:-
I added the setting to /etc/default/grub instead and regenerated the grub configuration.
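
A sketch of the two steps, assuming the stock CentOS 7 paths:

# prepend net.ifnames=0 to the kernel command line in the defaults file
sed -i 's/GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="net.ifnames=0 /' /etc/default/grub
# regenerate the grub configuration
grub2-mkconfig -o /boot/grub2/grub.cfg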

After which the problem was resolved and I was successfully able to upgrade the instance to 5th generation support.

After the change, I got the following output in the logs

Cloud-init v. 0.7.5 running 'init' at Mon, 04 Mar 2018 07:43:28 +0000. Up 8.73 seconds.
cis-info: ++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++
cis-info: +--------+------+-------------+---------------+-------------------+
cis-info: | Device |  Up  |   Address   |      Mask     |     Hw-Address    |
cis-info: +--------+------+-------------+---------------+-------------------+
cis-info: | ens5:  | True | 10.98.16.98 | 255.255.255.0 | 06:f7:b8:fc:f1:20 |
cis-info: |  lo:   | True |  127.0.0.1  |   255.0.0.0   |         .         |
cis-info: +--------+------+-------------+---------------+-------------------+
cis-info: +++++++++++++++++++++++++++++++Route info+++++++++++++++++++++++++++++++
cis-info: +-------+-------------+------------+---------------+-----------+-------+
cis-info: | Route | Destination |  Gateway   |    Genmask    | Interface | Flags |
cis-info: +-------+-------------+------------+---------------+-----------+-------+
cis-info: |   0   |   0.0.0.0   | 10.98.16.1 |    0.0.0.0    |    ens5   |   UG  |
cis-info: |   1   |  10.98.16.0 |  0.0.0.0   | 255.255.255.0 |    ens5   |   U   |
cis-info: +-------+-------------+------------+---------------+-----------+-------+
Cloud-init v. 0.7.5 running 'modules:config' at Mon, 04 Mar 2018 07:43:30 +0000. Up 10.16 seconds.

[Solved] /etc/default/grub: line 60: serial: command not found

Issue:- When I ran the below command, it resulted in the error
$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
/etc/default/grub: line 60: serial: command not found

Cause:- At some point you mistakenly ran grub2-mkconfig -o /etc/default/grub, which overwrote your grub defaults file with a generated grub.cfg. Now, when you try to generate the grub configuration as shown above, the command errors out while reading the mangled defaults file.

Resolution:- Manually edit the grub defaults file and restore the following content
vi /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
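
After restoring the file, regenerate the grub configuration again; this time it should complete without the error:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg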

Friday, March 1, 2019

[Solved] Rate Limiting Errors in the Awscli

Error:- An error occurred (Throttling) when calling the DescribeLoadBalancers operation (reached max retries: 2): Rate exceeded
Error:- An error occurred (Throttling) when calling the GenerateCredentialReport operation (reached max retries: 4): Rate exceeded


Cause:- These errors occur when your request rate crosses the throttling thresholds AWS imposes on its services. Throttled requests are dropped, so automation scripts may stop functioning, or some requests in a batch may never complete, which can further result in other issues.

Solution:-
1. Create a models folder in your awscli path, i.e. ~/.aws/models

mkdir ~/.aws/models

2. Create a retry configuration file at "~/.aws/models/_retry.json" with your desired retry settings.
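
A minimal sketch of such a file, raising the retry count with exponential backoff (the values are illustrative; the full schema follows the _retry.json bundled with botocore):

{
    "retry": {
      "__default__": {
        "max_attempts": 10,
        "delay": {
          "type": "exponential",
          "base": "rand",
          "growth_factor": 2
        }
      }
    }
}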

[Solved] Error: Driver 'pcspkr' is already registered, aborting

The pcspkr module drives the PC speaker, so it is safe to disable it. You can do so as follows

Solution:-
echo "blacklist pcspkr" > /etc/modprobe.d/blacklist-pcspkr.conf

Tuesday, February 19, 2019

Creating your own hosted registry for Docker

1. Download the docker repository
wget https://download.docker.com/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker.repo

2. Install the docker-ce on the system as
yum install docker-ce -y

3. Create a directory as
mkdir /root/certs

4. Go to the website sslforfree.com, generate the certificates for your domain by manually verifying domain ownership, and copy them into the /root/certs directory

5. Unzip the certs downloaded from sslforfree
unzip sslforfree.zip
ls -ltr

-rw-r--r--. 1 centos centos 5599 Feb 19 11:11 sslforfree.zip
-rw-r--r--. 1 root   root   1703 Feb 19  2019 private.key
-rw-r--r--. 1 root   root   1922 Feb 19  2019 certificate.crt
-rw-r--r--. 1 root   root   1646 Feb 19  2019 ca_bundle.crt

6. Create the 2 directories as
[root@registry certs]# mkdir -p /opt/registry/data
[root@registry certs]# mkdir -p /var/lib/registry

7. Start and enable the docker service as
[root@registry certs]# systemctl start docker
[root@registry certs]# systemctl enable docker
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
[root@registry certs]#

8. Run your private registry as
docker run -d -p 443:443 -v /root/certs:/certs -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/certificate.crt -e REGISTRY_HTTP_TLS_KEY=/certs/private.key -e REGISTRY_HTTP_ADDR=0.0.0.0:443 -v /opt/registry/data:/var/lib/registry --name registry registry:2

[root@registry certs]# docker run -d -p 443:443 -v /root/certs:/certs -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/certificate.crt -e REGISTRY_HTTP_TLS_KEY=/certs/private.key -e REGISTRY_HTTP_ADDR=0.0.0.0:443 -v /opt/registry/data:/var/lib/registry --name registry registry:2
Unable to find image 'registry:2' locally
2: Pulling from library/registry
169185f82c45: Pull complete
046e2d030894: Pull complete
188836fddeeb: Pull complete
832744537747: Pull complete
7ceea07e80be: Pull complete
Digest: sha256:870474507964d8e7d8c3b53bcfa738e3356d2747a42adad26d0d81ef4479eb1b
Status: Downloaded newer image for registry:2
2f5bf3270abefe9e2bbdca51ae93b5dd5cc281837b62f24f0bc976a6801e2e41


9. Add the DNS record pointing to your server as
registry.test.unixcloudfusion.in IN A 52.39.129.41

10. We can test access to the registry using curl. The response should include headers such as Docker-Distribution-Api-Version, indicating the request was processed by the registry server.

[root@registry certs]# curl -iv https://registry.unixcloudfusion.in/v2/
* About to connect() to registry.unixcloudfusion.in port 443 (#0)
*   Trying 52.39.129.41...
* Connected to registry.unixcloudfusion.in (52.39.129.41) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: ca_bundle.crt
  CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
* subject: CN=*.unixcloudfusion.in
* start date: Feb 19 09:18:56 2019 GMT
* expire date: May 20 09:18:56 2019 GMT
* common name: *.unixcloudfusion.in
* issuer: CN=Let's Encrypt Authority X3,O=Let's Encrypt,C=US
> GET /v2/ HTTP/1.1
> User-Agent: curl/7.29.0
> Host: registry.unixcloudfusion.in
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Content-Length: 2
Content-Length: 2
< Content-Type: application/json; charset=utf-8
Content-Type: application/json; charset=utf-8
< Docker-Distribution-Api-Version: registry/2.0
Docker-Distribution-Api-Version: registry/2.0
< X-Content-Type-Options: nosniff
X-Content-Type-Options: nosniff
< Date: Tue, 19 Feb 2019 16:31:33 GMT
Date: Tue, 19 Feb 2019 16:31:33 GMT

11. Download the image from Docker Hub and add a tag to identify that it belongs to your registry
[root@registry certs]# docker pull alpine:latest;docker tag alpine:latest registry.unixcloudfusion.in/alpine:alpinelocalv1
latest: Pulling from library/alpine
6c40cc604d8e: Pull complete
Digest: sha256:b3dbf31b77fd99d9c08f780ce6f5282aba076d70a513a8be859d8d3a4d0c92b8
Status: Downloaded newer image for alpine:latest

12. Verify the docker images as
[root@registry certs]# docker images
REPOSITORY                           TAG                 IMAGE ID            CREATED             SIZE
registry                             2                   d0eed8dad114        2 weeks ago         25.8MB
alpine                               latest              caf27325b298        2 weeks ago         5.53MB
registry.unixcloudfusion.in/alpine   alpinelocalv1       caf27325b298        2 weeks ago         5.53MB

13. Push the image to your own registry
[root@registry certs]# docker push registry.unixcloudfusion.in/alpine:alpinelocalv1
The push refers to repository [registry.unixcloudfusion.in/alpine]
503e53e365f3: Pushed
alpinelocalv1: digest: sha256:25b4d910f4b76a63a3b45d0f69a57c34157500faf6087236581eca221c62d214 size: 528
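
As a final check, you can pull the image back from your own registry; Docker should fetch it over HTTPS using the certificate configured above:
docker pull registry.unixcloudfusion.in/alpine:alpinelocalv1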


[Solved] x509: certificate signed by unknown authority

This error can occur when Docker is unable to verify your certificate's issuer, typically because the bundle of intermediate certificates needed to validate the certificate authority chain is missing.

There is a workaround for this, in which Docker will skip certificate validation for the registry.

Create a file as /etc/docker/daemon.json
touch /etc/docker/daemon.json

Enter the following content in the daemon.json file, replacing the endpoint with your own registry:
[root@registry certs]# cat /etc/docker/daemon.json
{
    "insecure-registries" : [ "registry.unixcloudfusion.in" ]
}

Go ahead and restart your docker service as
systemctl restart docker

Then try to push to the registry again; this time you shouldn't get an error message.
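
Rather than skipping validation, the cleaner fix is to make Docker trust the registry's CA explicitly: Docker looks for a CA certificate at /etc/docker/certs.d/<registry-hostname>/ca.crt. A sketch using the ca_bundle.crt from the earlier setup:

mkdir -p /etc/docker/certs.d/registry.unixcloudfusion.in
cp /root/certs/ca_bundle.crt /etc/docker/certs.d/registry.unixcloudfusion.in/ca.crt
systemctl restart docker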



Friday, February 15, 2019

[Solved] error: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy)

I got this error while running

kubectl exec busybox-744d79879-q4bvl -- /bin/sh

which resulted in

error: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy)


Cause/Resolution:-
Your kubernetes apiserver is using a client certificate with CN=kubernetes to connect to the kubelets and that user is not currently authorized to use the kubelet API.

By default, the system:kubelet-api-admin cluster role defines the permissions required to access that API. You can grant that permission to your apiserver's kubelet client user with

kubectl create clusterrolebinding apiserver-kubelet-api-admin --clusterrole system:kubelet-api-admin --user kubernetes
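
You can then verify the binding took effect with an authorization check, impersonating the same user; it should print yes:

kubectl auth can-i create nodes --subresource=proxy --as kubernetes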

Prometheus Monitoring for Microservices



1. Coming into the age of microservices, the older monitoring systems are no longer dependable, especially in a dynamic environment where containers keep coming up and going down.

2. Prometheus is an open-source monitoring and alerting system built at SoundCloud in 2012 and managed since 2016 by the Cloud Native Computing Foundation as its second hosted project after Kubernetes.

3. Prometheus's main features include a multi-dimensional data model with time series data identified by metric name and key/value pairs, which helps in understanding the overall performance of the system graphically.

4. Prometheus supports PromQL, a flexible query language to leverage this dimensionality (see the query sketch after this list).

5. It is not reliant on distributed storage such as ZooKeeper; single server nodes are autonomous.

6. Time series collection happens via a pull model over HTTP, and pushing is supported via an intermediary gateway.

7. Monitoring targets are discovered via service discovery or static configuration, which allows you to configure monitoring dynamically as the environment changes.

8. The main components of Prometheus are the Prometheus server, which scrapes and stores time series data; client libraries for instrumenting application code; a push gateway for supporting short-lived jobs; exporters for services like HAProxy, StatsD, Graphite, etc.; an alertmanager to handle alerts; and various support tools.

9. Most of the Prometheus components are written in the Go programming language, making them easy to build and deploy as static binaries.

10. Prometheus works well with purely numeric time series metrics. It fits both machine-centric monitoring and the monitoring of highly dynamic service-oriented architectures. From a microservices point of view, its support for multi-dimensional data collection and querying is a particular strength.
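
To give a flavour of PromQL, here is a sketch of querying a Prometheus server over its HTTP API; http_requests_total is the canonical example metric from the Prometheus documentation, and localhost:9090 assumes a default local server:

# per-instance request rate over the last 5 minutes
curl -s 'http://localhost:9090/api/v1/query' --data-urlencode 'query=sum(rate(http_requests_total[5m])) by (instance)'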

In our future posts we are going to compare Prometheus with other monitoring tools.

Creating a Docker private registry from scratch (nonproduction only)

Consider the following diagram to understand how a container initially pulls its images from Docker Hub, and how we can replace Docker Hub with our own local registry to store our docker images so that they are only available inside our own network, making the setup more secure.

For a detailed walkthrough, go through the following video, in which we demonstrate how to create your own private docker registry in a nonproduction environment.

Wednesday, February 13, 2019

What is Service Mesh ?

As distributed microservices architectures for web and mobile applications have become more common, and as orchestration tools such as Kubernetes and the public clouds have made these architectures more convenient to run, the next demand has shifted toward deploying a service mesh.

The term service mesh describes the network of microservices that make up the applications running in an environment and how they interact among themselves. As the environment grows, so do the number of services and the complexity of their synchronous and asynchronous communication, which makes such environments harder to understand and manage.

Requirements such as service discovery, load balancing, failure recovery, metrics, and continuous monitoring then often combine with more complex operational requirements like A/B testing, canary releases, rate limiting, access control, and end-to-end authentication for the various APIs and services.

A service mesh provides behavioural insight and operational control over the network of services as a whole, offering a complete solution to satisfy the diverse requirements of managing microservice applications.

Some of the leading service meshes include Istio, developed in collaboration between Lyft, IBM, Google, VMware and Red Hat. Alternatives to Istio include Linkerd, the first service mesh ever developed, created by Buoyant; it is an open-source service mesh written in Scala and can be deployed on multiple types of clusters. Then there is Consul, developed by HashiCorp, which runs on an agent-based model (the Consul client), and finally there is AWS App Mesh, which is developed specifically for the AWS public cloud.

We will be covering them in more detail in our future posts.

Tuesday, February 12, 2019

[Solved] S3 Bucket action doesn't apply to any resources

This error occurred when I tried implementing an S3 bucket policy.

It was due to the following policy fragment which I was implementing:

            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::bucketname"
            ]

The issue here is that I was applying the actions to the bucket only, when object-level actions have to apply, via a wildcard, to all the objects under the bucket. So I replaced it with

            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::bucketname",
                "arn:aws:s3:::bucketname/*"
            ]

That resolved my issue.

Monday, February 4, 2019

[Solved] Unable to create a new revision of Task Definition prod-not******:2 Docker label key owner contains invalid characters, does not match pattern ^[_\-a-zA-Z0-9.]+$

If you are getting the below error while updating the AWS ECS service

Unable to create a new revision of Task Definition prod-not****:2
Docker label key owner contains invalid characters, does not match pattern ^[_\-a-zA-Z0-9.]+$

Solution:-
In my case, although the key/value for the Docker label appeared to be correct, there was an extra space at the end of the key, which is why I was not able to update the key/value. Since it did not match the regex AWS uses to validate the content, the ECS service refused to update the configuration.

So check that you don't have extra spaces and that your labels match the regex the AWS ECS service allows.
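
As a quick local check, you can test a label key against the same pattern before updating the task definition; the key below is hypothetical and carries a trailing space, so the check flags it:

# the trailing space makes the key fail the pattern check
echo -n 'owner ' | grep -qE '^[_.a-zA-Z0-9-]+$' || echo 'invalid label key'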