Wednesday, October 16, 2019

Kubernetes Important Commands And Concepts

1. To list all the resource types available in Kubernetes, you can use
kubectl api-resources -o name (the -o name option prints only the resource names)

2. Spec
Every object in Kubernetes has a specification (spec), provided by the user, that describes the desired state for that object.

3. Status
Status represents the current, actual state of the object. Kubernetes continuously works to bring the actual state in line with the desired state specified in the spec.
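The spec/status relationship works like a control loop. The Python sketch below is a deliberately simplified illustration, not how Kubernetes controllers are actually written: the `Spec`, `Status` and `reconcile` names are hypothetical, and it only compares a desired replica count with the observed one and returns the action needed to close the gap.

```python
from dataclasses import dataclass


@dataclass
class Spec:
    """Desired state, as written by the user (hypothetical field)."""
    replicas: int


@dataclass
class Status:
    """Observed state, as reported by the cluster (hypothetical field)."""
    running: int


def reconcile(spec: Spec, status: Status) -> list:
    """Return the actions that move the observed state towards the spec."""
    diff = spec.replicas - status.running
    if diff > 0:
        return [f"start {diff} pod(s)"]
    if diff < 0:
        return [f"stop {-diff} pod(s)"]
    return []  # status already matches spec: nothing to do


if __name__ == "__main__":
    # A real controller repeats this, re-reading the status on every pass.
    print(reconcile(Spec(replicas=3), Status(running=1)))
```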

4. kubectl get: lists objects in Kubernetes, e.g. kubectl get pods -n kube-system or kubectl get nodes.
You can get more detailed information about an object, for example
kubectl get nodes kube-node1 -o yaml (YAML representation of the object)

5. kubectl describe node kube-node1 (a human-readable overview of an object, rather than its YAML)

6. A pod contains one or more containers and a set of resources shared by those containers. Every container in Kubernetes runs as part of a pod.

Example pod YAML: https://github.com/ankit630/IAC/blob/master/kubernetes/pods/ex-pod.yml
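The linked ex-pod.yml is not reproduced here. As a hedged sketch, the Python below builds a minimal single-container pod manifest as a dictionary and prints it as JSON, which kubectl accepts just like YAML. The helper name, the image nginx:1.17 and port 80 are hypothetical placeholders, not taken from the linked file.

```python
import json


def pod_manifest(name: str, image: str, namespace: str = "default") -> dict:
    """Build a minimal single-container Pod manifest (core API group, v1)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Hypothetical port; drop it if the container listens on none.
                    "ports": [{"containerPort": 80}],
                }
            ]
        },
    }


if __name__ == "__main__":
    # Save the output as ex-pod.json, then: kubectl create -f ex-pod.json
    print(json.dumps(pod_manifest("ex-pod", "nginx:1.17"), indent=2))
```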

7. kubectl create -f ex-pod.yml (creates the pod in the Kubernetes cluster)

8. kubectl apply -f ex-pod.yml (applies changes made in the file, such as a change to an existing container, to the existing pod; most pod fields cannot be changed after creation, but the container image can)

9. kubectl edit pod ex-pod (besides apply, edit can also be used to modify a pod; saving the file automatically applies the changes)

10. kubectl delete pod ex-pod (deletes the existing pod)

11. Namespaces let you organize the objects in a cluster. Every namespaced object belongs to a namespace, and when no namespace is specified it goes into the default namespace (some objects, such as nodes, are cluster-wide and do not belong to any namespace).

12. kubectl get namespaces (list the namespaces in cluster)

13. kubectl create ns ex-ns (creates the ex-ns namespace in kubernetes)

14. kubectl get pods -n ex-ns (lists the pods in the ex-ns namespace)

Part 2 Using Athena to query S3 buckets

This is a continuation of my previous post on how you can use Athena to query the S3 buckets that store CloudTrail logs, so that you can better manage security and compliance, which is hard to achieve in legacy or large accounts with many users.

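Question:- Identify the 100 most used IAM access keys. Using IAM roles is usually a better approach than IAM access keys for authentication, because roles issue temporary credentials that expire automatically (a session can be as short as 15 minutes), which makes intercepted keys far less useful and improves the security of the account. The query below counts API calls per access key, ARN and event for a single day; the filter on 'AKIA%' keeps only long-term IAM user keys (temporary credentials start with 'ASIA').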




SELECT useridentity.accesskeyid,
       useridentity.arn,
       eventname,
       COUNT(eventname) AS frequency
  FROM account_cloudtrail
 WHERE sourceipaddress NOT LIKE '%.com'
   AND year = '2019'
   AND month = '01'
   AND day = '01'
   AND useridentity.accesskeyid LIKE 'AKIA%'
 GROUP BY useridentity.accesskeyid, useridentity.arn, eventname
 ORDER BY frequency DESC
 LIMIT 100
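To sanity-check the query logic, or to run it over a handful of downloaded log files without Athena, the same aggregation can be sketched in Python over raw CloudTrail records (the entries of a log file's Records array). The field names follow the CloudTrail record format; the function name is hypothetical.

```python
from collections import Counter


def top_access_keys(records: list, limit: int = 100) -> list:
    """Count calls per (access key, ARN, event name), like the Athena query.

    Only long-term IAM user keys (prefix AKIA) are kept, and calls whose
    source is an AWS service hostname such as "ec2.amazonaws.com" are skipped.
    """
    counts = Counter()
    for rec in records:
        identity = rec.get("userIdentity") or {}
        key = identity.get("accessKeyId") or ""
        if not key.startswith("AKIA"):
            continue
        if rec.get("sourceIPAddress", "").endswith(".com"):
            continue
        counts[(key, identity.get("arn"), rec.get("eventName"))] += 1
    # most_common() sorts by count, highest first: ORDER BY ... DESC LIMIT n.
    return counts.most_common(limit)
```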

Friday, October 11, 2019

Part 1 Using Athena to query S3 buckets

While it's great to push all the log data gathered from various sources, such as your load balancers, CloudTrail and application logs, to S3 buckets, as your infrastructure grows it becomes difficult to analyze such a huge amount of data spanning months or years.

You can use Amazon Athena to query data in S3 directly, without downloading and processing it manually, which saves extra processing and storage. We are going to cover some of the most effective queries to help you extract meaningful information from your S3 log data.

Question:- Identify all the users, events and accounts accessing a particular S3 bucket





SELECT useridentity.accountid,
       useridentity.arn,
       eventname,
       json_extract_scalar(requestparameters, '$.bucketName') AS bucketName
  FROM unixcloudfusion_cloudtrail
 WHERE year = '2019'
   AND month = '10'
   AND day = '09'
   AND eventsource = 's3.amazonaws.com'
   AND json_extract_scalar(requestparameters, '$.bucketName') = 'unixcloudfusion.analytics'
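The same filter can be sketched over raw CloudTrail records in Python, which is handy for checking a few log files by hand; the function name is hypothetical. Keep in mind that object-level calls such as GetObject are only recorded if CloudTrail data events are enabled for the bucket.

```python
def bucket_accessors(records: list, bucket: str) -> set:
    """Return (account ID, principal ARN, event name) tuples for S3 API
    calls made against one bucket."""
    found = set()
    for rec in records:
        if rec.get("eventSource") != "s3.amazonaws.com":
            continue
        # requestParameters is null for some events, hence "or {}".
        params = rec.get("requestParameters") or {}
        if params.get("bucketName") != bucket:
            continue
        identity = rec.get("userIdentity") or {}
        found.add((identity.get("accountId"), identity.get("arn"), rec.get("eventName")))
    return found
```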

Thursday, October 10, 2019

Command Logging & Kibana Plotting

Problem Statement: Monitor and track all the activities/commands run by users on a system

Minimum Requirements:
1) A separate ELK cluster for command logging
2) The Snoopy Logger agent on all client machines
3) The Filebeat agent

Context: To track which commands users are running, we would otherwise need each user's bash_history, which becomes tedious when we have to track a specific user (or multiple users) across different machines.

Solution:  Snoopy Logger is a tiny library that logs all executed commands (+ arguments) on your system.

Below is the link for more information on Snoopy, which also covers installing Snoopy Logger.

Snoopy Logger writes every command run by any user to one single file. You can specify the message format and a filter chain for filtering logs in Snoopy, and based on that message format you need to create a grok pattern in Logstash. You can also exclude repetitive internal commands with a drop filter in Logstash; the format for excluding a command is given below:

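filter {
  # [command] is the field extracted by your grok pattern;
  # replace "command-name" with the command to exclude.
  if [command] == "command-name" {
    drop {
      # 100 (the default) drops every matching event
      percentage => 100
    }
  }
}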
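Before writing the grok pattern, it can help to test the parsing logic outside Logstash. The Python sketch below assumes Snoopy's message format is left at something like `[uid:%{uid} sid:%{sid} tty:%{tty} cwd:%{cwd} filename:%{filename}]: %{cmdline}` and that lines reach syslog with a `snoopy[pid]:` prefix; adjust the regular expression if your format differs. It extracts the same `command` field the drop filter tests and skips commands in a drop list (the function name is hypothetical).

```python
import re

# Assumed layout: "<syslog prefix> snoopy[<pid>]: [uid:.. sid:.. tty:.. cwd:.. filename:..]: <cmdline>"
SNOOPY_RE = re.compile(
    r"snoopy\[(?P<pid>\d+)\]: "
    r"\[uid:(?P<uid>\d+) sid:(?P<sid>\d+) tty:(?P<tty>\S+) "
    r"cwd:(?P<cwd>.*?) filename:(?P<filename>[^\]]*)\]: "
    r"(?P<cmdline>.*)$"
)


def parse_snoopy_line(line: str, drop: frozenset = frozenset()):
    """Return a dict of fields for a Snoopy log line, or None if the line
    does not match or its command is in the drop list."""
    match = SNOOPY_RE.search(line)
    if match is None:
        return None
    event = match.groupdict()
    # First word of the command line, e.g. "ls" for "ls -la /tmp".
    words = event["cmdline"].split()
    event["command"] = words[0] if words else ""
    if event["command"] in drop:
        return None  # same effect as the Logstash drop filter
    return event
```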

Wednesday, October 9, 2019

[Solved] CannotPullContainerError: no space left on device

By default, the AWS ECS service doesn't take free disk space on ECS instances into account when placing new tasks; it uses only CPU and memory for task placement. When a disk fills up, ECS still tries to start new tasks there, but they fail with the error “CannotPullContainerError: no space left on device”. Overfilled instances stay active in the cluster until a regular cluster roll replaces all instances.

The correct way to handle task placement is to let ECS know about free disk space (by setting a custom attribute) and to set a placement constraint in the task definition (https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-placement-constraints.html). Once we have a custom attribute that indicates disk usage, we can configure the task definition not to place a task if used disk space is greater than a configured threshold.

This can be achieved by including a shell script that monitors free space and deregisters the instance, run from a system cron job every 5 minutes. The script gets disk usage from the 'df' command output and sets the ECS instance attribute 'SpaceUsedPercent'. If used disk space is greater than the threshold (85%), the script sets the ECS instance status to DRAINING, and when the running container count drops to 3 or fewer, it deregisters the container instance from the ECS cluster and updates the CloudWatch metric 'deregisteredLowSpaceInstances'. When an instance has been inactive in the cluster for more than 10 minutes, Spotinst terminates it.
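The script itself is not included in the post. The decision logic it describes can be sketched in Python as below; the function names are hypothetical, the 85% threshold and the "three or fewer containers" rule come from the description, and the sketch only decides what to do. In a real script each outcome would map to boto3 calls such as ECS put_attributes (for SpaceUsedPercent), update_container_instances_state with status DRAINING, deregister_container_instance, and CloudWatch put_metric_data.

```python
THRESHOLD_PERCENT = 85          # drain once used space exceeds 85%
MIN_RUNNING_TO_DEREGISTER = 3   # deregister once 3 or fewer containers run


def used_percent(df_output: str, mount: str = "/") -> int:
    """Parse `df -P` output and return the used percentage of a mount point."""
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[-1] == mount:
            return int(fields[4].rstrip("%"))  # 5th column, e.g. "87%"
    raise ValueError(f"mount point {mount!r} not found in df output")


def decide(used: int, running_containers: int) -> str:
    """Return the step the cron job should take for this instance."""
    if used <= THRESHOLD_PERCENT:
        return "ok"          # just refresh the SpaceUsedPercent attribute
    if running_containers > MIN_RUNNING_TO_DEREGISTER:
        return "drain"       # set the container instance to DRAINING
    return "deregister"      # deregister and publish the CloudWatch metric
```

Feed used_percent() the output of `df -P /` and pass the running task count reported by ECS to decide().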

If you are not using Spotinst, you will need additional logic to terminate the instance, create a new one using the AWS CLI and add it to the running cluster. If you are using Spotinst to run your containers, Spotinst takes care of this for you.

Create a new revision of the task definition, scroll down to 'Constraint' and add a new constraint.

ECS will now take this additional constraint into account when placing tasks.


Tuesday, August 20, 2019

Python3 installation on Centos7

If you are installing Python 3 on CentOS 7, note that CentOS 7 ships with Python 2.7.5 by default, which is required by the core OS tools. At the time of writing you can't install Python 3 directly with yum; instead you first need to install Software Collections (SCL).

Software Collections is a community project that allows you to build, install and use multiple versions of software on the same system without affecting the default system packages. It makes available multiple versions of programming languages that are not in the core repositories.

To install SCL, run the following command
# yum install centos-release-scl

To install Python 3.6, run
# yum install rh-python36

You will also need to install 'Development Tools', which are required for building Python modules
# yum groupinstall 'Development Tools'

Python virtual environments allow you to install Python modules in an isolated location for a specific project, rather than globally, so they do not affect other Python projects.

You can use venv to create the virtual environment in python

Create a new project folder where all project related files and binaries will reside

# mkdir project

# cd project

Next, enable Python 3.6 through SCL before you can use it:
# scl enable rh-python36 bash

Now you can create the virtual environment
python -m venv my_project_venv

Execute the following command to enable the virtual environment
source my_project_venv/bin/activate

Your prompt should now look like this:
(my_project_venv) user@host:~/project$

The prefix indicates that the Python virtual environment my_project_venv is currently active.

Now your virtual environment is ready to use.
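The standard library's venv module does the same job from inside Python. The sketch below (hypothetical helper names) creates an environment programmatically, equivalent to python -m venv without installing pip, and checks whether the running interpreter is inside a virtual environment by comparing sys.prefix with sys.base_prefix.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment.

    Inside a venv, sys.prefix points at the venv directory while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


def create_venv(path: Path) -> Path:
    """Create a venv at `path` (no pip) and return its activate script."""
    venv.create(path, with_pip=False)
    return path / "bin" / "activate"  # POSIX layout, as on CentOS


if __name__ == "__main__":
    print("inside a virtual environment:", in_virtualenv())
    with tempfile.TemporaryDirectory() as tmp:
        activate = create_venv(Path(tmp) / "my_project_venv")
        print("activate script exists:", activate.exists())
```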

Wednesday, August 7, 2019

Main Advantages of Using Transit Gateway in Amazon AWS

  • Per-region VPN tunnels: instead of building tunnels every time a new VPC is created, you can simply attach the VPC to the transit gateway in that region, which already has the VPN established. Once attached, it is just a matter of adding route propagations to connect the VPC to the VPN.

  • Attach to the Transit Gateway once rather than peering with multiple VPCs: every time a new VPC is created, it often needs to be peered with other accounts and shared environments. With the Transit Gateway, you can simply attach the VPC, associate that attachment with the right routing domain and let routes propagate, which gives the new VPC access to multiple VPCs and vice versa.
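The arithmetic behind the second point: VPC peering is not transitive, so giving n VPCs full connectivity takes n(n-1)/2 peering connections, while a transit gateway needs one attachment per VPC. A minimal Python sketch (hypothetical function names):

```python
def full_mesh_peerings(vpcs: int) -> int:
    """Peering connections needed so every VPC can reach every other one.

    Peering is not transitive, so each pair needs its own connection.
    """
    return vpcs * (vpcs - 1) // 2


def tgw_attachments(vpcs: int) -> int:
    """Hub-and-spoke transit gateway: one attachment per VPC."""
    return vpcs


if __name__ == "__main__":
    for n in (5, 10, 50):
        print(f"{n} VPCs: {full_mesh_peerings(n)} peerings vs {tgw_attachments(n)} attachments")
```

For 10 VPCs that is 45 peering connections against 10 attachments, and the gap widens quickly (1225 against 50 for 50 VPCs).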

The known limitation of AWS Transit Gateway at the time of writing is that it does not support cross-region connectivity, for which inter-region VPC peering is still required. This is expected in the future, and it will be worth adopting once AWS releases the functionality.

The best practice is to deploy AWS Transit Gateway with an Infrastructure as Code tool such as Terraform. I will share the code in a future post, so stay tuned and subscribe to our blog.

Thursday, May 30, 2019

Resuming a broken scp transfer

If you are in the middle of an scp copy of a large file and the upload breaks because of a network issue, you can continue the copy from where it broke using rsync as follows

 rsync -P -e ssh  :
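The host and paths in the command above are not shown. The sketch below builds the same command in Python with hypothetical placeholders (backup.tar.gz and user@example.com:/data/) and a hypothetical helper name. -P is rsync shorthand for --partial --progress, and --partial keeps the partially transferred file so the next run can reuse it instead of starting over.

```python
import shlex


def resume_copy_cmd(local_file: str, remote: str) -> list:
    """Build an rsync command that resumes an interrupted copy.

    -P       = --partial --progress (keep partial files, show progress)
    -e ssh   = use ssh as the transport, like scp does
    """
    return ["rsync", "-P", "-e", "ssh", local_file, remote]


if __name__ == "__main__":
    # Hypothetical file and destination: replace with your own.
    cmd = resume_copy_cmd("backup.tar.gz", "user@example.com:/data/")
    print(shlex.join(cmd))
```

To actually run the copy, pass the list to subprocess.run(cmd, check=True).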

Wednesday, March 20, 2019

3 Creating pods running containers minikube

[Solved] Error restarting cluster: wait: waiting for k8s-app=kube-proxy: timed out waiting for the condition

Error restarting cluster: wait: waiting for k8s-app=kube-proxy: timed out waiting for the condition

This occurred during the minikube installation. To resolve it, delete the cluster and start it again:
 ./minikube delete
 ./minikube start
That should resolve this error.

[Solved] Unable to start VM: create: precreate: exec: "docker": executable file not found in $PATH

Unable to start VM: create: precreate: exec: "docker": executable file not found in $PATH

This occurred during the minikube installation.

Docker was not installed on the VM, so I installed it using the get.docker.com script:
 curl -fsSL https://get.docker.com/ | sh  
This should automatically detect the operating system and install the docker on your system.

[Solved] Unable to start VM: create: precreate: VBoxManage not found. Make sure VirtualBox is installed and VBoxManage is in the path

Unable to start VM: create: precreate: VBoxManage not found. Make sure VirtualBox is installed and VBoxManage is in the path

I got the following error during the minikube installation on a VirtualBox VM.

Minikube and a Vagrant VM don't work well together, as it amounts to running type 2 virtualization inside another type 2 virtualization (nested virtualization).
It makes more sense to run minikube directly on Linux; if you are on a Windows machine and need a Linux machine, then you will want to use VirtualBox.

The solution is to set minikube's vm-driver to none, so that it runs directly on the host, as follows
 ./minikube config set vm-driver none  

That should solve your problem.

2 Minikube Installation

1 About Minikube and features

growpart fails to extend disk volume ( attempt to resize /dev/xvda failed. sfdisk output below )


attempt to resize /dev/xvda failed. sfdisk output below:
| Disk /dev/xvda: 104433 cylinders, 255 heads, 63 sectors/track
| Old situation:
| Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
|    Device Boot Start     End   #cyls    #blocks   Id  System
| /dev/xvda1   *      1   78324   78324  629137530   83  Linux
| /dev/xvda2          0       -       0          0    0  Empty
| /dev/xvda3          0       -       0          0    0  Empty
| /dev/xvda4          0       -       0          0    0  Empty
| New situation:
| Units = sectors of 512 bytes, counting from 0
|    Device Boot    Start       End   #sectors  Id  System
| /dev/xvda1   *     16065 1677716144 1677700080  83  Linux
| /dev/xvda2             0         -          0   0  Empty
| /dev/xvda3             0         -          0   0  Empty
| /dev/xvda4             0         -          0   0  Empty
| Successfully wrote the new partition table
| Re-reading the partition table ...
| BLKRRPART: Device or resource busy
| The command to re-read the partition table failed.
| Run partprobe(8), kpartx(8) or reboot your system now,
| before using mkfs
| If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
| to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
| (See fdisk(8).)
FAILED: failed to resize
***** WARNING: Resize failed, attempting to revert ******
Re-reading the partition table ...
BLKRRPART: Device or resource busy
The command to re-read the partition table failed.
Run partprobe(8), kpartx(8) or reboot your system now,
before using mkfs
***** Appears to have gone OK ****


# growpart /dev/xvda 1

If you are wondering whether you did something wrong: there is absolutely nothing wrong with the above command.

As you can see, the new partition table was written successfully. However, the kernel could not re-read the partition table because the device was busy (BLKRRPART: Device or resource busy), so the disk size did not increase. I tried multiple solutions and found some suggestions involving sfdisk, but in my case growpart was already the latest version and the issue persisted.

At this point you will need to restart the server to fix the issue. If it is a production server, you might have to get the appropriate approvals, as nothing else worked in my case. After the restart, the partition size should have increased.
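The log itself shows how far the partition was meant to grow: xvda1 was 629137530 one-KiB blocks (about 600 GiB), and the new table gives it 1677700080 512-byte sectors (about 800 GiB). After the reboot you can confirm that the kernel sees the new size by reading /sys/class/block/<device>/size, which is always expressed in 512-byte sectors. A small Python sketch (hypothetical helper names):

```python
from pathlib import Path

SECTOR_BYTES = 512  # /sys/class/block/*/size is always in 512-byte units


def sectors_to_gib(sectors: int) -> float:
    """Convert a 512-byte sector count to GiB, rounded to two decimals."""
    return round(sectors * SECTOR_BYTES / 2**30, 2)


def partition_size_gib(name: str, sysfs: Path = Path("/sys/class/block")) -> float:
    """Kernel's current view of a disk or partition size, in GiB."""
    sectors = int((sysfs / name / "size").read_text().strip())
    return sectors_to_gib(sectors)


if __name__ == "__main__":
    # After the reboot, xvda1 should be close to the size of xvda.
    for dev in ("xvda", "xvda1"):
        if (Path("/sys/class/block") / dev).exists():
            print(dev, partition_size_gib(dev), "GiB")
```

If the partition has grown but df still shows the old size, the filesystem itself still needs to be grown (resize2fs for ext4, xfs_growfs for XFS).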

1. Rundeck Installation on Centos7.5

[Solved] invalid principal in policy

Problem:- I created an S3 bucket policy similar to another policy I already had, and when I saved it, AWS reported "Invalid principal in policy" and would not let me save the policy.

Cause:- I had given the wrong role ARN; logically everything else was correct. AWS checks that the principal in the policy actually exists, and since there was no such ARN, it didn't allow me to save the policy in the first place.

Wrong ARN in my case:-
"AWS": "arn:aws:iam::446685876341:role/something-something-test-role"

Right ARN in my case:-
"AWS": "arn:aws:iam::446685876341:role/service-role/something-something-test-role"

Resolution:- Once I corrected the ARN to include the role's path (service-role/), the error was resolved.
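The fix comes down to the role's path: an IAM role ARN has the form arn:aws:iam::<account>:role/<path><name>, and service roles created from the console live under the service-role/ path. A small hedged Python sketch (hypothetical helper name) that splits an ARN into account, path and name makes the difference visible.

```python
def split_role_arn(arn: str) -> tuple:
    """Split an IAM role ARN into (account ID, path, role name).

    The path (e.g. "/service-role/") is part of the ARN, so a principal
    that leaves it out refers to a different, non-existent role.
    """
    prefix, _, resource = arn.partition(":role/")
    account = prefix.split(":")[4]  # arn:aws:iam::<account>
    path, _, name = resource.rpartition("/")
    return account, ("/" + path + "/") if path else "/", name
```

To avoid typing the ARN by hand, copy it from the output of aws iam get-role --role-name something-something-test-role --query Role.Arn.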

Monday, March 4, 2019

[Solved] url_helper.py[WARNING]: Calling '' failed [75/120s]: unexpected error ['NoneType' object has no attribute 'status_code']

Issue:- I was enabling ENA support for CentOS 7.1 on an EC2 instance when I received the following error
url_helper.py[WARNING]: Calling '' failed [75/120s]: unexpected error ['NoneType' object has no attribute 'status_code']

Because of this, the network card was not coming up on the instance, so the url_helper.py script (part of cloud-init) could not reach the instance metadata service. When the instance finally booted, no IP address had been assigned to it, so the instance status checks were failing.

I was getting the following logs, which confirmed it:

Cloud-init v. 0.7.5 running 'init' at Mon, 04 Mar 2018 06:33:38 +0000. Up 5.17 seconds.
cis-info: +++++++++++++++++++++++Net device info++++++++++++++++++++++++
cis-info: +--------+-------+-----------+-----------+-------------------+
cis-info: | Device |   Up  |  Address  |    Mask   |     Hw-Address    |
cis-info: +--------+-------+-----------+-----------+-------------------+
cis-info: | ens5:  | False |     .     |     .     | 06:f7:b8:fc:f1:20 |
cis-info: |  lo:   |  True | | |         .         |
cis-info: +--------+-------+-----------+-----------+-------------------+
cis-info: ++++++++++++++++++++++++++Route info+++++++++++++++++++++++++++
cis-info: +-------+-------------+---------+---------+-----------+-------+
cis-info: | Route | Destination | Gateway | Genmask | Interface | Flags |
cis-info: +-------+-------------+---------+---------+-----------+-------+
cis-info: +-------+-------------+---------+---------+-----------+-------+
2018-03-03 22:33:38,836 - url_helper.py[WARNING]: Calling '' failed [0/120s]: unexpected error ['NoneType' object has no attribute 'status_code']

The AWS documentation says to add GRUB_CMDLINE_LINUX="net.ifnames=0" in /boot/grub2/grub.cfg, but that did not work for me.

Instead, I added it to GRUB_CMDLINE_LINUX in /etc/default/grub and regenerated the GRUB configuration with grub2-mkconfig -o /boot/grub2/grub.cfg.
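The edit itself can be scripted. A hedged Python sketch (hypothetical helper name) that appends net.ifnames=0 to the GRUB_CMDLINE_LINUX line of /etc/default/grub and leaves the text unchanged if the argument is already there:

```python
import re

# Matches only GRUB_CMDLINE_LINUX="...", not GRUB_CMDLINE_LINUX_DEFAULT.
CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX=")([^"]*)(")', re.MULTILINE)


def add_kernel_arg(grub_default: str, arg: str = "net.ifnames=0") -> str:
    """Return /etc/default/grub text with `arg` in GRUB_CMDLINE_LINUX."""

    def _append(match: re.Match) -> str:
        args = match.group(2).split()
        if arg not in args:
            args.append(arg)
        return match.group(1) + " ".join(args) + match.group(3)

    new_text, count = CMDLINE_RE.subn(_append, grub_default, count=1)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX line not found")
    return new_text
```

Write the result back to /etc/default/grub as root and then run grub2-mkconfig -o /boot/grub2/grub.cfg; pointing -o at /etc/default/grub instead would overwrite the settings file itself.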

After that the problem was resolved, and I was able to move the instance to a 5th-generation instance type.

After the change, I got the following output in the logs:

Cloud-init v. 0.7.5 running 'init' at Mon, 04 Mar 2018 07:43:28 +0000. Up 8.73 seconds.
cis-info: ++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++
cis-info: +--------+------+-------------+---------------+-------------------+
cis-info: | Device |  Up  |   Address   |      Mask     |     Hw-Address    |
cis-info: +--------+------+-------------+---------------+-------------------+
cis-info: | ens5:  | True | | | 06:f7:b8:fc:f1:20 |
cis-info: |  lo:   | True |  |   |         .         |
cis-info: +--------+------+-------------+---------------+-------------------+
cis-info: +++++++++++++++++++++++++++++++Route info+++++++++++++++++++++++++++++++
cis-info: +-------+-------------+------------+---------------+-----------+-------+
cis-info: | Route | Destination |  Gateway   |    Genmask    | Interface | Flags |
cis-info: +-------+-------------+------------+---------------+-----------+-------+
cis-info: |   0   |   | |    |    ens5   |   UG  |
cis-info: |   1   | |   | |    ens5   |   U   |
cis-info: +-------+-------------+------------+---------------+-----------+-------+
Cloud-init v. 0.7.5 running 'modules:config' at Mon, 04 Mar 2018 07:43:30 +0000. Up 10.16 seconds.

[Solved] /etc/default/grub: line 60: serial: command not found

Issue:- Running the command below resulted in an error:
$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
/etc/default/grub: line 60: serial: command not found

Cause:- At some point you ran grub2-mkconfig -o /etc/default/grub by mistake, which overwrote /etc/default/grub with generated GRUB configuration. grub2-mkconfig reads /etc/default/grub as a shell script, so when you generate grub.cfg as shown above, the GRUB commands now in that file (such as serial) fail as shell commands.

Resolution:- Manually edit the grub file and put the following content back in it, then rerun grub2-mkconfig -o /boot/grub2/grub.cfg
vi /etc/default/grub
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"
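Before editing, you can check whether /etc/default/grub really was overwritten. The heuristic Python sketch below (hypothetical helper names) looks for the header that grub2-mkconfig writes into generated files and lists lines that are not shell variable assignments, since any such line, like the serial command on line 60, is what the shell chokes on.

```python
import re

# Text that grub2-mkconfig puts into the files it generates.
GENERATED_MARKERS = ("DO NOT EDIT THIS FILE", "### BEGIN /etc/grub.d/")


def looks_generated(text: str) -> bool:
    """Heuristic: True if the text looks like grub2-mkconfig output."""
    return any(marker in text for marker in GENERATED_MARKERS)


def non_assignment_lines(text: str) -> list:
    """(line number, line) pairs that are not blank, comments or VAR=value.

    /etc/default/grub is sourced by the shell, so each such line would be
    run as a command, producing errors like "serial: command not found".
    """
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if re.match(r"^[A-Za-z_][A-Za-z0-9_]*=", stripped):
            continue
        bad.append((number, stripped))
    return bad
```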

Friday, March 1, 2019

[Solved] Rate Limiting Errors in the Awscli

Error:- An error occurred (Throttling) when calling the DescribeLoadBalancers operation (reached max retries: 2): Rate exceeded
Error:- An error occurred (Throttling) when calling the GenerateCredentialReport operation (reached max retries: 4): Rate exceeded

Cause:- These errors occur when your requests exceed the rate limits AWS imposes on its service APIs. Throttled requests are dropped, so automation scripts might not work, or some requests in a batch might not complete, which can lead to further issues.

1. Create a models folder in your AWS CLI configuration path, i.e. ~/.aws/models

mkdir ~/.aws/models

2. Create a retry configuration file "~/.aws/models/_retry.json" with the following content
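Besides raising the retry count, your own scripts can back off when throttled. A minimal Python sketch of capped exponential backoff with full jitter; ThrottledError and call_with_backoff are hypothetical names. With boto3 you would instead catch botocore.exceptions.ClientError and check that err.response["Error"]["Code"] == "Throttling", and newer botocore versions can also retry for you via botocore.config.Config(retries={"max_attempts": 10, "mode": "standard"}).

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error from an AWS API call."""


def call_with_backoff(func, max_attempts=8, base_delay=0.5, max_delay=20.0, sleep=time.sleep):
    """Call func(), retrying on ThrottledError.

    Waits a random time between 0 and min(max_delay, base_delay * 2**attempt)
    before each retry (capped exponential backoff with full jitter), and
    re-raises the error after the last attempt.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            backoff = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, backoff))
```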

[Solved] Error: Driver 'pcspkr' is already registered, aborting

pcspkr is the PC speaker driver, so it is safe to disable it, which you can do as follows

echo "blacklist pcspkr" > /etc/modprobe.d/blacklist-pcspkr.conf

Tuesday, February 19, 2019

Creating your own hosted Docker registry

1. Download the Docker CE repository file
wget https://download.docker.com/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker.repo

2. Install the docker-ce on the system as
yum install docker-ce -y

3. Create a directory as
mkdir /root/certs

4. Go to sslforfree.com, generate the certificates for your domain by manually verifying the domain, and copy them into the /root/certs directory

5. Unzip the downloaded sslforfree.zip
unzip sslforfree.zip
ls -ltr

-rw-r--r--. 1 centos centos 5599 Feb 19 11:11 sslforfree.zip
-rw-r--r--. 1 root   root   1703 Feb 19  2019 private.key
-rw-r--r--. 1 root   root   1922 Feb 19  2019 certificate.crt
-rw-r--r--. 1 root   root   1646 Feb 19  2019 ca_bundle.crt

6. Create the 2 directories as
[root@ip-10-240-43-119 certs]# mkdir -p /opt/registry/data
[root@ip-10-240-43-119 certs]# mkdir -p /var/lib/registry

7. Start and enable the docker service as
[root@ip-10-240-43-119 certs]# systemctl start docker
[root@ip-10-240-43-119 certs]# systemctl enable docker
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
[root@ip-10-240-43-119 certs]#

8. Run your private registry as
docker run -d -p 443:443 -v /root/certs:/certs -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/certificate.crt -e REGISTRY_HTTP_TLS_KEY=/certs/private.key -e REGISTRY_HTTP_ADDR= -v /opt/registry/data:/var/lib/registry --name registry registry:2

[root@ip-10-240-43-119 certs]# docker run -d -p 443:443 -v /root/certs:/certs -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/certificate.crt -e REGISTRY_HTTP_TLS_KEY=/certs/private.key -e REGISTRY_HTTP_ADDR= -v /opt/registry/data:/var/lib/registry --name registry registry:2
Unable to find image 'registry:2' locally
2: Pulling from library/registry
169185f82c45: Pull complete
046e2d030894: Pull complete
188836fddeeb: Pull complete
832744537747: Pull complete
7ceea07e80be: Pull complete
Digest: sha256:870474507964d8e7d8c3b53bcfa738e3356d2747a42adad26d0d81ef4479eb1b
Status: Downloaded newer image for registry:2

9. Add a DNS record pointing to your server as
registry.test.unixcloudfusion.in IN A

8. We can test access to the registry using curl. The response should provide headers, for example Docker-Distribution-API-Version, indicating the request was processed by the Registry server.

[root@ip-10-240-43-119 certs]# curl -iv https://registry.unixcloudfusion.in/v2/
* About to connect() to registry.unixcloudfusion.in port 443 (#0)
*   Trying
* Connected to registry.unixcloudfusion.in ( port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: ca_bundle.crt
  CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
* subject: CN=*.unixcloudfusion.in
* start date: Feb 19 09:18:56 2019 GMT
* expire date: May 20 09:18:56 2019 GMT
* common name: *.unixcloudfusion.in
* issuer: CN=Let's Encrypt Authority X3,O=Let's Encrypt,C=US
> GET /v2/ HTTP/1.1
> User-Agent: curl/7.29.0
> Host: registry.unixcloudfusion.in
> Accept: */*
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Content-Length: 2
Content-Length: 2
< Content-Type: application/json; charset=utf-8
Content-Type: application/json; charset=utf-8
< Docker-Distribution-Api-Version: registry/2.0
Docker-Distribution-Api-Version: registry/2.0
< X-Content-Type-Options: nosniff
X-Content-Type-Options: nosniff
< Date: Tue, 19 Feb 2019 16:31:33 GMT
Date: Tue, 19 Feb 2019 16:31:33 GMT
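The same check can be scripted, for example as a health check. A hedged Python sketch using only the standard library (helper names hypothetical) that requests /v2/ and returns the Docker-Distribution-Api-Version header:

```python
import urllib.error
import urllib.request

HEADER = "Docker-Distribution-Api-Version"


def v2_url(base_url: str) -> str:
    """Build the registry API base endpoint, e.g. https://host/v2/."""
    return base_url.rstrip("/") + "/v2/"


def registry_api_version(base_url: str, timeout: float = 5.0):
    """Return the API version header sent by /v2/, or None if absent.

    A registry that requires authentication answers 401, which urllib
    raises as HTTPError; the header is still read from that response.
    """
    try:
        with urllib.request.urlopen(v2_url(base_url), timeout=timeout) as resp:
            return resp.headers.get(HEADER)
    except urllib.error.HTTPError as err:
        return err.headers.get(HEADER)
```

With the registry from this walkthrough, registry_api_version("https://registry.unixcloudfusion.in") should return "registry/2.0", matching the header in the curl output.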

11. Download the image from Docker Hub and add a tag identifying it as belonging to your registry
[root@ip-10-240-43-119 certs]# docker pull alpine:latest;docker tag alpine:latest registry.unixcloudfusion.in/alpine:alpinelocalv1
latest: Pulling from library/alpine
6c40cc604d8e: Pull complete
Digest: sha256:b3dbf31b77fd99d9c08f780ce6f5282aba076d70a513a8be859d8d3a4d0c92b8
Status: Downloaded newer image for alpine:latest

12. Verify the Docker image as
[root@ip-10-240-43-119 certs]# docker images
REPOSITORY                           TAG                 IMAGE ID            CREATED             SIZE
registry                             2                   d0eed8dad114        2 weeks ago         25.8MB
alpine                               latest              caf27325b298        2 weeks ago         5.53MB
registry.unixcloudfusion.in/alpine   alpinelocalv1       caf27325b298        2 weeks ago         5.53MB

13. Push the image to your own registry
[root@ip-10-240-43-119 certs]# docker push registry.unixcloudfusion.in/alpine:alpinelocalv1
The push refers to repository [registry.unixcloudfusion.in/alpine]
503e53e365f3: Pushed
alpinelocalv1: digest: sha256:25b4d910f4b76a63a3b45d0f69a57c34157500faf6087236581eca221c62d214 size: 528

[Solved] x509: certificate signed by unknown authority

This error occurs when Docker cannot verify your certificate provider, for example because the CA bundle needed to verify the certificate authority is missing.

There is a workaround that makes Docker skip certificate validation for this registry.

Create a file as /etc/docker/daemon.json
touch /etc/docker/daemon.json

Enter the following content in the daemon.json file, replacing the endpoint with your registry's:
[root@ip-10-240-43-119 certs]# cat /etc/docker/daemon.json
{
    "insecure-registries" : [ "registry.unixcloudfusion.in" ]
}

Go ahead and restart your docker service as
systemctl restart docker
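Because daemon.json must hold a single JSON object, writing it by hand can clobber settings that are already there. A hedged Python sketch that merges the entry instead; with_insecure_registry and update_daemon_json are hypothetical helper names, and after running it you still restart Docker with systemctl restart docker.

```python
import json
from pathlib import Path


def with_insecure_registry(config: dict, registry: str) -> dict:
    """Return a copy of a daemon.json config with `registry` added to
    "insecure-registries", leaving every other setting untouched."""
    merged = dict(config)
    registries = list(merged.get("insecure-registries", []))
    if registry not in registries:
        registries.append(registry)
    merged["insecure-registries"] = registries
    return merged


def update_daemon_json(registry: str, path: Path = Path("/etc/docker/daemon.json")) -> None:
    """Merge the registry into daemon.json (run as root)."""
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    path.write_text(json.dumps(with_insecure_registry(config, registry), indent=4) + "\n")
```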

Then try pushing to the registry again; this time you shouldn't get an error.

Friday, February 15, 2019

[Solved] error: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy)

I got this error while running

kubectl exec busybox-744d79879-q4bvl -- /bin/sh

which resulted in

error: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy)

Your kubernetes apiserver is using a client certificate with CN=kubernetes to connect to the kubelets and that user is not currently authorized to use the kubelet API.

By default, the system:kubelet-api-admin cluster role defines the permissions required to access that API. You can grant that permission to your apiserver's kubelet client user with

kubectl create clusterrolebinding apiserver-kubelet-api-admin --clusterrole system:kubelet-api-admin --user kubernetes
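If you manage cluster objects declaratively, the same binding can be written as a manifest. A Python sketch that builds the ClusterRoleBinding equivalent to the kubectl command above and prints it as JSON; the helper name is hypothetical, and you would save the output to a file and use kubectl apply -f.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def kubelet_api_admin_binding(user: str = "kubernetes",
                              name: str = "apiserver-kubelet-api-admin") -> dict:
    """ClusterRoleBinding granting system:kubelet-api-admin to `user`."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {
            "apiGroup": RBAC_GROUP,
            "kind": "ClusterRole",
            "name": "system:kubelet-api-admin",
        },
        "subjects": [{"apiGroup": RBAC_GROUP, "kind": "User", "name": user}],
    }


if __name__ == "__main__":
    print(json.dumps(kubelet_api_admin_binding(), indent=2))
```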

Prometheus Monitoring for Microservices

1. In the age of microservices, older monitoring systems are not very dependable, especially in dynamic environments where containers keep coming up and going down.

2. Prometheus is an open-source monitoring and alerting system originally built at SoundCloud in 2012. It joined the Cloud Native Computing Foundation in 2016 as the second hosted project after Kubernetes.

3. Prometheus' main features include a multi-dimensional data model, with time series identified by metric name and key/value pairs, which helps in understanding the overall performance of the system graphically.

4. Prometheus supports PromQL, a flexible query language that leverages this dimensionality.

5. It does not rely on distributed storage or on coordination services like ZooKeeper; single server nodes are autonomous.

6. Time series collection happens via a pull model over HTTP, and pushing is supported via an intermediary gateway.

7. Targets for the monitoring are discovered via service discovery or static configuration which allows you to dynamically configure monitoring in a dynamic environment.

8. The main components of Prometheus are the Prometheus server, which scrapes and stores time series data; client libraries for instrumenting application code; a push gateway for supporting short-lived jobs; special-purpose exporters for services like HAProxy, StatsD and Graphite; an Alertmanager to handle alerts; and various support tools.

9. Most Prometheus components are written in Go, making them easy to build and deploy as static binaries.

10. Prometheus works well for recording purely numeric time series. It fits both machine-centric monitoring and the monitoring of highly dynamic service-oriented architectures. In a world of microservices, its support for multi-dimensional data collection and querying is a particular strength.

In future posts we are going to compare Prometheus with other monitoring tools.

Creating Docker Private Registry from scratch nonproduction only

Consider the following diagram to understand how a container pulls images from Docker Hub by default, and how we can replace Docker Hub with our own local registry that stores our Docker images and is available only within our own network, making it more secure.

For a detailed walkthrough of creating your own private Docker registry in a non-production environment, watch the following video.

Wednesday, February 13, 2019

What is Service Mesh ?

As distributed microservice architectures for web and mobile applications have become more common, and orchestration tools such as Kubernetes and the public clouds have made them more convenient to run, the next demand is for deploying a service mesh.

The term service mesh describes the network of microservices that make up the applications running in an environment and the interactions between them. As the environment grows, so do the number of services and the complexity of their communication, both synchronous and asynchronous, which makes such environments harder to understand and manage.

Requirements such as service discovery, load balancing, failure recovery, metrics and continuous monitoring are then often combined with more complex operational requirements like A/B testing, canary releases, rate limiting, access control and end-to-end authentication for the various APIs and services.

A service mesh provides behavioural insights and operational control over the mesh as a whole, offering a complete solution to satisfy the diverse requirements of managing microservice applications.

Some of the leading service mesh options include Istio, developed in collaboration between Lyft, IBM, Google, VMware and Red Hat. Alternatives to Istio include Linkerd, the first service mesh ever developed, created by Buoyant; it is open source, was originally written in Scala, and can be deployed on multiple types of clusters. Then there is Consul, developed by HashiCorp, which runs on an agent-based model (the Consul client), and finally there is AWS App Mesh, which is built specifically for the AWS public cloud.

We will be covering them in more detail in our future posts.

Tuesday, February 12, 2019

[Solved] S3 Bucket action doesn't apply to any resources

This error occurred when I tried to apply an S3 bucket policy.

It was caused by the following policy I was applying:

            "Action": [
            "Resource": [

The issue was that I was applying the actions to the bucket itself, when they had to apply to all the objects under the bucket, which requires a wildcard resource of the form arn:aws:s3:::<bucket>/*. So I replaced it with

            "Action": [
            "Resource": [

That resolved my issue.
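The exact actions and bucket in my policy are not shown above, so here is a hedged Python sketch with a hypothetical bucket name, role ARN and helper name that keeps bucket-level and object-level actions apart: s3:ListBucket applies to the bucket ARN arn:aws:s3:::<bucket>, while s3:GetObject applies to objects, arn:aws:s3:::<bucket>/*. Putting an object-level action only on the bare bucket ARN is what triggers the "action doesn't apply to any resources" error.

```python
import json


def read_only_bucket_policy(bucket: str, principal_arn: str) -> dict:
    """Bucket policy with separate bucket-level and object-level statements."""
    principal = {"AWS": principal_arn}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # bucket-level action: resource is the bucket itself
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {   # object-level action: resource is every object in the bucket
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    policy = read_only_bucket_policy("example-bucket", "arn:aws:iam::111122223333:role/example-role")
    print(json.dumps(policy, indent=2))
```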