Posts

Showing posts from 2019

Kubernetes Important Commands And Concepts

1. Listing resources: to list all the resource types in Kubernetes you can use kubectl api-resources -o name (which lists the resources by name only).
2. Spec: every object in Kubernetes has a specification, provided by the user, which defines the desired state for that object.
3. Status: the status represents the current, actual state of the object. Kubernetes works to make the actual state match the desired state given in the spec.
4. kubectl get: gets the list of objects in Kubernetes, e.g. kubectl get pods -n kube-system or kubectl get nodes. You can get more detailed information about an object, e.g. kubectl get nodes kube-node1 -o yaml (the YAML representation of that object).
5. kubectl describe node kube-node1 gives a readable overview of an object, but not in YAML format.
6. Pods can contain one or more containers and a set of resources shared by those containers. All containers in Kubernetes are part of a pod. Example pod YAML: https://github.com/ankit630/IAC/blob/master/kubernetes/...
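Taken together, the commands above make a handy cheat sheet. A minimal sketch of a session (the node name kube-node1 is carried over from the examples above and is an assumption about your cluster):

# List every resource type the cluster knows about, by name only
kubectl api-resources -o name

# List pods in the kube-system namespace, then all nodes
kubectl get pods -n kube-system
kubectl get nodes

# Full YAML (spec + status) of a single node
kubectl get node kube-node1 -o yaml

# Human-readable overview of the same node
kubectl describe node kube-node1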

Part 2 Using Athena to query S3 buckets

This is in continuation of my previous post on how you can use Athena to query the S3 buckets storing your CloudTrail logs, in order to better organize your security and compliance, which is a hard thing to achieve in legacy or large accounts with a number of users. Question:- Identifying the 100 most-used IAM keys. IAM roles are usually a better approach for authentication than IAM keys, as IAM roles rotate their keys every 15 minutes or so, making the keys hard to intercept and increasing the security of the account. Answer:-
SELECT
  useridentity.accesskeyid,
  useridentity.arn,
  eventname,
  COUNT(eventname) AS frequency
FROM account_cloudtrail
WHERE sourceipaddress NOT LIKE '%.com'
  AND year = '2019'
  AND month = '01'
  AND day = '01'
  AND useridentity.accesskeyid LIKE 'AKIA%'
GROUP BY useridentity.accesskeyid, useridentity.arn, eventname ...
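You can run the same query from the command line instead of the Athena console. A minimal sketch, assuming the account_cloudtrail table lives in a database named default and that s3://my-athena-results/ is a results bucket you own (both are assumptions, adjust for your setup):

aws athena start-query-execution \
  --query-string "SELECT useridentity.accesskeyid, COUNT(*) AS frequency FROM account_cloudtrail WHERE useridentity.accesskeyid LIKE 'AKIA%' GROUP BY useridentity.accesskeyid ORDER BY frequency DESC LIMIT 100" \
  --query-execution-context Database=default \
  --result-configuration OutputLocation=s3://my-athena-results/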

Part 1 Using Athena to query S3 buckets

It is great to push all the log data gathered from various sources, like your load balancers, CloudTrail, application logs etc., to S3 buckets. But as your infrastructure grows in size it becomes difficult to analyze such a huge amount of data spanning months or years. You can use the Athena service of Amazon AWS to query the data in S3 without needing to download and process it manually. This removes the need for extra processing, storage and so on. We are going to cover the most effective queries, which can help you analyze and extract meaningful information from your S3 log data. Question:- Identifying all the users, events and accounts accessing a particular S3 bucket. Answer:-
SELECT DISTINCT
  account,
  eventname,
  useridentity.arn,
  useragent,
  vpcendpointid,
  json_extract_scalar(requestparameters, '$.bucketName') AS bucketName, ...
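Pairing with the start-query-execution call shown under Part 2 above: that call only returns a query execution ID, and fetching the output is a separate step. A minimal sketch of polling for completion and reading the results, where the execution ID is a placeholder:

# Check whether the query has finished
aws athena get-query-execution --query-execution-id <execution-id>

# Once the state is SUCCEEDED, page through the result rows
aws athena get-query-results --query-execution-id <execution-id>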

Command Logging & Kibana Plotting

Problem Statement: monitor and track all the activities/commands used by users on a system. Minimum requirement(s): 1) a separate ELK cluster for command logging, 2) the Snoopy Logger agent on all client machines, 3) the Filebeat agent. Context: in order to track which commands are being fired by users, we'll need the bash_history of each specific user, and it becomes a tedious task when we have to track a specific user (or multiple users) across different machine...
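Snoopy works by preloading a shared library that logs every executed command to syslog, which Filebeat can then ship to the ELK cluster. A minimal sketch of enabling it, assuming Snoopy is already installed and its library lives at /usr/lib64/libsnoopy.so (the path varies by distribution, so treat it as an assumption):

# Preload the Snoopy library for every new process
echo "/usr/lib64/libsnoopy.so" >> /etc/ld.so.preload

# Verify: run any command, then look for its snoopy entry in syslog
ls /tmp
grep snoopy /var/log/messages | tail -5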

[Solved] CannotPullContainerError: no space left on device

By default the ECS service of AWS doesn't take free disk space on ECS instances into account while placing new tasks; it uses only CPU and memory resources for task placement. If a disk fills up, ECS tries to start new tasks anyway, but they fail with the error "CannotPullContainerError: no space left on device". Overfilled instances stay active in the cluster until a regular cluster roll replaces all instances. Solution:- The correct way of handling task placement is to let ECS know about free disk space (set a custom attribute) and set a placement constraint for the task definition ( https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-placement-constraints.html ). Once we have a custom attribute that indicates disk usage, we can configure the task definition to not place a task if used disk space is greater than a configured threshold. This can be achieved by including a shell script that monitors free space and deregisters an instance. The script needs to be run throu...
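The excerpt cuts off before the script itself, but the idea can be sketched as follows. This is a hedged sketch, not the post's exact script: the attribute name disk-full, the cluster name and the 80% threshold are all assumptions, and the container instance ARN is read from the ECS agent's local introspection endpoint:

#!/bin/bash
# Publish disk usage as an ECS custom attribute (run periodically, e.g. via cron)
CLUSTER="my-cluster"
THRESHOLD=80
USED=$(df --output=pcent / | tail -1 | tr -d ' %')
ARN=$(curl -s http://localhost:51678/v1/metadata \
      | python -c 'import sys,json; print(json.load(sys.stdin)["ContainerInstanceArn"])')
if [ "$USED" -ge "$THRESHOLD" ]; then VALUE=true; else VALUE=false; fi
aws ecs put-attributes --cluster "$CLUSTER" \
  --attributes name=disk-full,value=$VALUE,targetType=container-instance,targetId=$ARN

The task definition can then carry a placement constraint of type memberOf with the expression attribute:disk-full != true, so new tasks are only placed on instances with room left.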

Python3 installation on Centos7

CentOS 7 comes shipped by default with Python 2.7.5, which is required by the core OS, so if you try to install Python 3 through yum you can't do it directly; instead you need to install Software Collections first. Software Collections (SCL) is a community project which allows you to build, install and use multiple versions of software on the same system without affecting the default system packages. It enables multiple versions of programming languages which are not available in the core repositories. To install SCL run the following command # yum install centos-release-scl To install Python 3 run # yum install rh-python36 You will also need to install the 'Development Tools' group, which is required for building Python modules # yum groupinstall 'Development Tools' Python virtual environments allow you to install Python modules in an isolated location for a specific pro...
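SCL packages do not replace the system Python; you opt into them per shell. A minimal sketch of enabling the collection and creating a virtual environment (the project path is a hypothetical example):

# Start a shell in which "python" is the SCL Python 3.6
scl enable rh-python36 bash
python --version              # should now report Python 3.6.x

# Create and activate an isolated environment for a project
python -m venv ~/my_project_env
source ~/my_project_env/bin/activate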

Main Advantages of Using Transit Gateway in Amazon AWS

Per-region VPN tunnels: instead of building tunnels every time a new VPC is created, it allows you to simply attach the VPC to the transit gateway within the region, which will already have a VPN established. Once attached, it is simply a matter of adding routing propagations to establish the connectivity of the VPC with the VPN. Attach to the Transit Gateway once rather than peer to multiple VPCs: every time a new VPC is created, it is often required to peer that VPC with other accounts and shared environments. With the Transit Gateway, you can simply attach the VPC to the transit gateway, associate that attachment with the right routing domain and allow routes to propagate, which will give that new VPC access to multiple VPCs and vice-versa. Limitations:- The known limitation of the AWS Transit Gateway is the fact that it does not support cross-region attachments, for which inter-region VPC peering is still required. Though this is in the future pipeline and rather the c...
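The attach-and-propagate workflow described above maps onto two CLI calls. A minimal sketch, where every ID is a placeholder for your own resources:

# Attach an existing VPC to the transit gateway
aws ec2 create-transit-gateway-vpc-attachment \
  --transit-gateway-id tgw-0123456789abcdef0 \
  --vpc-id vpc-0123456789abcdef0 \
  --subnet-ids subnet-0123456789abcdef0

# Propagate the attachment's routes into a transit gateway route table
aws ec2 enable-transit-gateway-route-table-propagation \
  --transit-gateway-route-table-id tgw-rtb-0123456789abcdef0 \
  --transit-gateway-attachment-id tgw-attach-0123456789abcdef0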

Progressing on the broken scp command

If you are in the middle of an scp of a large file and the upload breaks due to a network issue, you can continue the copy from where it broke using the rsync command as follows (the remote source and local destination are placeholders for your own paths): rsync -P -e ssh user@remotehost:/path/to/file /local/destination

1. Installing go on Centos7


4 Running Kubernetes Dashboard in minikube


3 Creating pods running containers in minikube


[Solved] Error restarting cluster: wait: waiting for k8s-app=kube-proxy: timed out waiting for the condition

Error:- Error restarting cluster: wait: waiting for k8s-app=kube-proxy: timed out waiting for the condition
Solution:- This occurred during a minikube installation. To resolve the issue, just delete the installation and start again:
./minikube delete
./minikube start
That should resolve the error.

[Solved] Unable to start VM: create: precreate: exec: "docker": executable file not found in $PATH

Error:- Unable to start VM: create: precreate: exec: "docker": executable file not found in $PATH
Occurrence:- Occurred during a minikube installation.
Resolution:- Docker was not installed on the VM, so I installed Docker using the get.docker.com script:
curl -fsSL https://get.docker.com/ | sh
This should automatically detect the operating system and install Docker on your system.

[Solved] Unable to start VM: create: precreate: VBoxManage not found. Make sure VirtualBox is installed and VBoxManage is in the path

Error:- Unable to start VM: create: precreate: VBoxManage not found. Make sure VirtualBox is installed and VBoxManage is in the path
Occurrence:- The above error during a minikube installation on a VirtualBox VM.
Cause/Resolution:- Minikube and a Vagrant VM don't work well together, as it is like running type-2 virtualization on top of type-2 virtualization. It makes sense to run minikube directly on Linux; if you are running a Windows machine and want a Linux machine, then you would use VirtualBox. The solution is to set minikube's vm-driver to none as follows:
./minikube config set vm-driver none
That should solve the problem.

2 Minikube Installation


1 About Minikube and features


growpart fails to extend disk volume (attempt to resize /dev/xvda failed. sfdisk output below)

Error:- attempt to resize /dev/xvda failed. sfdisk output below:
| Disk /dev/xvda: 104433 cylinders, 255 heads, 63 sectors/track
| Old situation:
| Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
|
| Device Boot Start End #cyls #blocks Id System
| /dev/xvda1 * 1 78324 78324 629137530 83 Linux
| /dev/xvda2 0 - 0 0 0 Empty
| /dev/xvda3 0 - 0 0 0 Empty
| /dev/xvda4 0 - 0 0 0 Empty
| New situation:
| Units = sectors of 512 bytes, counting from 0
|
| Device Boot Start End #sectors Id System
| /dev/xvda1 * 16065 1677716144 1677700080 83 Linux
| /dev/xvda2 0 - 0 0 Empty
| /dev/xvda3 0 - 0 0 Empty
| /dev/xvda4 0 - ...

1. Rundeck Installation on Centos7.5


[Solved] invalid principal in policy

Problem:- I created an S3 policy, the same as another policy above it, and when I saved the policy it gave me "Invalid principal in policy" and wouldn't allow me to save it. Cause:- I had given the wrong name in the ARN, which is why this issue was occurring; logically everything was correct. I believe AWS checks in the backend whether such an ARN actually exists, due to which it didn't allow me to save the policy in the first place. Wrong ARN in my case:- "AWS": "arn:aws:iam::446685876341:role/something-something-test-role" Right ARN in my case:- "AWS": "arn:aws:iam::446685876341:role/service-role/something-something-test-role" Resolution:- Once I corrected the ARN as above, the error was resolved.
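A quick way to catch this before saving is to check that the role behind the principal actually exists, since AWS appears to validate it in the backend. A sketch, assuming you have iam:GetRole permission (note that the path segment service-role/ is part of the ARN but not of the role name argument, which is exactly what makes this ARN easy to get wrong):

# Verify the role exists; the role name excludes the /service-role/ path
aws iam get-role --role-name something-something-test-role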

[Solved] url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [75/120s]: unexpected error ['NoneType' object has no attribute 'status_code']

Issue:- I was enabling ENA support for CentOS 7.1 on an EC2 instance when I received the following error: url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [75/120s]: unexpected error ['NoneType' object has no attribute 'status_code'] Because of this, my network card was not coming up for the instance, which further resulted in the instance-id lookup failing, due to which the url_helper.py script of AWS could not get the IP address. So when the instance finally booted, as no IP was assigned to it, the SSH checks (known as instance checks) were failing on the instance. I was getting the following logs, which confirmed it:
Cloud-init v. 0.7.5 running 'init' at Mon, 04 Mar 2018 06:33:38 +0000. Up 5.17 seconds.
cis-info: +++++++++++++++++++++++Net device info++++++++++++++++++++++++
cis-info: +--------+-------+-----------+-----------+-------------------+
cis-info: | Device |   Up  |  Address  |    Mask  ...

[Solved] /etc/default/grub: line 60: serial: command not found

Issue:- When I ran the command below it resulted in an error:
$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
/etc/default/grub: line 60: serial: command not found
Cause:- At some point you mistakenly ran grub2-mkconfig -o /etc/default/grub, which overwrote your default grub file, and now when you try to generate the grub config as above it errors out on the broken grub file. Resolution:- Manually edit the grub file and restore the following content:
vi /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"
GRUB_DISABLE_RECOVERY="true"

[Solved] Rate Limiting Errors in the Awscli

Error:- An error occurred (Throttling) when calling the DescribeLoadBalancers operation (reached max retries: 2): Rate exceeded
Error:- An error occurred (Throttling) when calling the GenerateCredentialReport operation (reached max retries: 4): Rate exceeded
Cause:- These errors occur when your request rate crosses the rate limits AWS imposes on its services. This can cause your requests to be dropped, due to which automation scripts might not function, or some requests run in a batch are not completed, which can further result in other issues. Solution:- 1. Create a models folder in your awscli path, i.e. ~/.aws/models: mkdir ~/.aws/models 2. Create a retry configuration with the following content inside the retry JSON file "~/.aws/models/_retry.json"
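The excerpt cuts off before the JSON itself. A safe way to build that file (a hedged sketch, not necessarily the post's exact content) is to start from the _retry.json that ships with botocore and simply raise the retry count:

mkdir -p ~/.aws/models
# Locate the installed botocore package and copy its stock retry model
BOTOCORE_DIR=$(python -c 'import botocore, os; print(os.path.dirname(botocore.__file__))')
cp "$BOTOCORE_DIR/data/_retry.json" ~/.aws/models/_retry.json
# Now edit ~/.aws/models/_retry.json and increase "max_attempts" under
# retry -> __default__ so throttled calls are retried more times before failing.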

[Solved] Error: Driver 'pcspkr' is already registered, aborting

pcspkr is the driver for the PC speaker, so it's safe to disable it. You can do so as follows Solution:-
echo "blacklist pcspkr" > /etc/modprobe.d/blacklist-pcspkr.conf

Creating your own hosted registry for Docker

1. Download the Docker repository: wget https://download.docker.com/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker.repo
2. Install docker-ce on the system: yum install docker-ce -y
3. Create a directory: mkdir /root/certs
4. Go to the website sslforfree.com, generate the keys for your domain by manually verifying it, and copy them into the /root/certs directory.
5. Unzip the certs downloaded from sslforfree:
unzip sslforfree.zip
ls -ltr
-rw-r--r--. 1 centos centos 5599 Feb 19 11:11 sslforfree.zip
-rw-r--r--. 1 root   root   1703 Feb 19  2019 private.key
-rw-r--r--. 1 root   root   1922 Feb 19  2019 certificate.crt
-rw-r--r--. 1 root   root   1646 Feb 19  2019 ca_bundle.crt
6. Create the two directories:
[root@ip-10-240-43-119 certs]# mkdir -p /opt/registry/data
[root@ip-10-240-43-119 certs]# mkdir -p /var/lib/registry
7. Start and enable the docker service: [root@...
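The excerpt cuts off before the registry container itself is started. A typical next step (a hedged sketch, not necessarily the post's exact commands) is to run the official registry image with the certificate and key from step 5 mounted in:

# Run the registry with TLS, persisting image data to /var/lib/registry
docker run -d --name registry --restart=always -p 443:443 \
  -v /root/certs:/certs -v /var/lib/registry:/var/lib/registry \
  -e REGISTRY_HTTP_ADDR=0.0.0.0:443 \
  -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/certificate.crt \
  -e REGISTRY_HTTP_TLS_KEY=/certs/private.key \
  registry:2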

[Solved] x509: certificate signed by unknown authority

This error can occur if Docker is not able to verify your certificate provider, which might be due to an issue with the bundle certificates used to verify the certificate authority; in their absence you may get this error. There is a workaround in which certificate validation is ignored. Create the file /etc/docker/daemon.json:
touch /etc/docker/daemon.json
Enter the following content in the daemon.json file, replacing the endpoint with that of your repository:
[root@ip-10-240-43-119 certs]# cat /etc/docker/daemon.json
{
    "insecure-registries" : [ "registry.unixcloudfusion.in" ]
}
Then restart your docker service:
systemctl restart docker
Then try pushing to the repository again; this time you shouldn't get the error message.

[Solved] error: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy)

I got this error while running kubectl exec busybox-744d79879-q4bvl -- /bin/sh, which resulted in: error: unable to upgrade connection: Forbidden (user=kubernetes, verb=create, resource=nodes, subresource=proxy) Cause/Resolution:- Your Kubernetes apiserver is using a client certificate with CN=kubernetes to connect to the kubelets, and that user is not currently authorized to use the kubelet API. By default the system:kubelet-api-admin cluster role defines the permissions required to access that API. You can grant that permission to your apiserver's kubelet client user with:
kubectl create clusterrolebinding apiserver-kubelet-api-admin --clusterrole system:kubelet-api-admin --user kubernetes

Prometheus Monitoring for Microservices

1. In the age of microservices, older monitoring systems are not very dependable, especially when you have a dynamic environment where containers keep coming up and down.
2. Prometheus is an open-source monitoring and alerting system built at SoundCloud in 2012 and managed since 2016 by the Cloud Native Computing Foundation as its second hosted project after Kubernetes.
3. Prometheus's main features include a multi-dimensional data model, with time series data identified by metric name and key/value pairs, which helps in understanding the overall performance of the system graphically.
4. Prometheus supports PromQL, a flexible query language to leverage this dimensionality.
5. It is not reliant on distributed storage like ZooKeeper; single server nodes are autonomous.
6. Time series collection happens via a pull model over HTTP, and pushing is supported via an intermediary gateway.
7. Targets for monitoring are discovered via service discovery or static configura...
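Because both collection and querying go over HTTP, you can exercise a Prometheus server with nothing but curl. A minimal sketch, assuming a server on its default port 9090 and using the built-in up metric (the metric and job label in the second query are illustrative assumptions):

# Ask Prometheus which scrape targets are currently up
curl 'http://localhost:9090/api/v1/query?query=up'

# A PromQL query: per-second request rate over the last 5 minutes
curl 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=rate(http_requests_total{job="api-server"}[5m])'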

Creating Docker Private Registry from scratch (nonproduction only)

Consider the following diagram to understand how containers initially pull images from Docker Hub, and how we can replace Docker Hub with our own local registry to store our Docker images, which will then only be available within our own network, thus making it more secure. For a detailed walkthrough of how you can create your own private Docker registry, go through the following video, in which we demonstrate creating a private Docker registry in a nonproduction environment.

Understanding AWS S3 Objects Cross-account Permissions Architecture


3. Understanding servicemesh event details


Servicemesh is a networking model?


What is Service Mesh?

As the adoption of distributed microservices architecture for creating web/mobile applications has increased, and as orchestration tools such as Kubernetes and the public clouds have made it more convenient to facilitate these microservice-based architectures, the next demand is for the deployment of a service mesh. The term service mesh describes the network of microservices that make up the applications running in an environment and how they interact amongst themselves. As the environment grows, so does the number of services and the complexity of their communication, both synchronous and asynchronous, due to which it becomes harder and more challenging to understand and manage such environments. Then requirements such as service discovery, load balancing, failure recovery, metrics and continuous monitoring often combine with more complex operational requirements like A/B testing, canary releases, rate limiting, access c...

[Solved] S3 Bucket action doesn't apply to any resources

This error occurred when I tried implementing an S3 bucket policy. It is due to the following policy which I was implementing:
"Action": [
    "s3:GetBucketLocation",
    "s3:ListBucket",
    "s3:GetObject",
    "s3:PutObject"
],
"Resource": [
    "arn:aws:s3:::bucketname"
]
The issue here is that I was applying the policy to the bucket only, when the object-level actions have to be applied, via a wildcard, to all the objects under the bucket, so I replaced it with
"Action": [ ...
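The excerpt cuts off before the corrected policy, but the standard fix for this error is to split the statement so that bucket-level actions target the bucket ARN and object-level actions target the objects under it. A hedged sketch, applied from the CLI (bucketname and the wide-open Principal are placeholders; scope them to your real bucket and principals):

cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
      "Resource": "arn:aws:s3:::bucketname"
    },
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::bucketname/*"
    }
  ]
}
EOF
aws s3api put-bucket-policy --bucket bucketname --policy file://policy.json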