Thursday, March 9, 2017

Custom CloudWatch Alarm Configuration Part-8

As discussed in the previous post, the alarm plugins push metric data to CloudWatch from a cron job that runs every minute or every 5 minutes, depending on your requirements.
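The cron schedule and the alarm period should agree: a metric pushed every 5 minutes but evaluated with `--period 60` leaves most one-minute periods without any data. Below is a small bash sketch that prints a crontab line for a given push interval, plus the matching `--period` value. The plugin path /opt/monitoring/bin/memory.sh and the function names are hypothetical, and intervals are restricted to values that divide 60 so the runs stay evenly spaced.

```bash
#!/usr/bin/env bash
# cron_entry INTERVAL_MINUTES COMMAND
# Prints a crontab line that runs COMMAND every INTERVAL_MINUTES minutes.
# Only intervals that divide 60 (1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30) are
# accepted, because "*/N" restarts at minute 0 every hour.
cron_entry() {
  local every=$1 cmd=$2
  if ! [[ $every =~ ^[1-9][0-9]*$ ]] || (( every >= 60 || 60 % every != 0 )); then
    echo "cron_entry: interval must be a divisor of 60 below 60, got '$every'" >&2
    return 1
  fi
  if (( every == 1 )); then
    printf '* * * * * %s\n' "$cmd"
  else
    printf '*/%d * * * * %s\n' "$every" "$cmd"
  fi
}

# alarm_period_seconds INTERVAL_MINUTES
# Prints the matching value for put-metric-alarm --period (in seconds).
alarm_period_seconds() {
  echo $(( $1 * 60 ))
}
```

For example, `( crontab -l 2>/dev/null; cron_entry 5 /opt/monitoring/bin/memory.sh ) | crontab -` appends a five-minute entry to the current user's crontab, and `alarm_period_seconds 5` gives `300` as the `--period` for that metric's alarm.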

Next, we have to create CloudWatch alarms on these metrics. An alarm works on a simple rule: if the metric crosses the threshold value, an action is triggered, for example sending a mail through SNS to alert that the value has crossed the threshold. If the value later comes back within the threshold, the alarm state changes from ALARM back to OK, which is effectively a recovery.
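To make the threshold logic concrete, here is a simplified bash model of the OK/ALARM transitions for settings like the ones used later in alarm.conf (`--evaluation-periods 2`). It assumes every period has a datapoint and that all evaluated periods must breach before the alarm fires, so INSUFFICIENT_DATA and "M out of N" alarms are not modelled. It is an illustration only, not CloudWatch's actual evaluation code.

```bash
#!/usr/bin/env bash
# Simplified model of CloudWatch alarm state changes (illustration only).
# Assumes: one datapoint per period, no missing data, integer values, and the
# alarm fires only when the last EVAL_PERIODS datapoints all breach.

# is_breaching VALUE OPERATOR THRESHOLD: exit status 0 if VALUE breaches
is_breaching() {
  local value=$1 op=$2 threshold=$3
  case "$op" in
    GreaterThanThreshold)          (( value > threshold )) ;;
    GreaterThanOrEqualToThreshold) (( value >= threshold )) ;;
    LessThanThreshold)             (( value < threshold )) ;;
    LessThanOrEqualToThreshold)    (( value <= threshold )) ;;
    *) echo "is_breaching: unknown operator '$op'" >&2; return 2 ;;
  esac
}

# alarm_states OPERATOR THRESHOLD EVAL_PERIODS DATAPOINT...
# Prints the alarm state (OK or ALARM) after each datapoint, space-separated.
alarm_states() {
  local op=$1 threshold=$2 periods=$3
  shift 3
  local streak=0 value
  local -a states=()
  for value in "$@"; do
    if is_breaching "$value" "$op" "$threshold"; then
      streak=$(( streak + 1 ))   # consecutive breaching periods so far
    else
      streak=0                   # any good datapoint resets the run
    fi
    if (( streak >= periods )); then
      states+=(ALARM)
    else
      states+=(OK)
    fi
  done
  echo "${states[*]}"
}
```

`alarm_states GreaterThanThreshold 80 2 50 85 90 95 70 60` prints `OK OK ALARM ALARM OK OK`: a single reading above 80 is not enough, the alarm fires on the second consecutive one, and it recovers on the first reading back at or below the threshold.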

However, instead of creating them from the console, we are going to create these alarms programmatically using the AWS CLI. The script works sequentially: it loops over an array and creates all the relevant alarms one by one.

The most important thing to consider here is the name of the alarm that will be created in CloudWatch. You can use any name, but it is better to follow a meaningful, predictable pattern so that the environment, application, alarm type and affected service can be identified from the name alone. That way, the team receiving the alert can immediately start working towards a resolution.

For this to work, we will create a conf directory to hold all the configuration, just as in the previous post we used the bin directory to hold all the executables. Under the conf directory we will keep an alarm.conf file, which holds the complete CloudWatch alarm configuration.

Below is the alarm.conf code that defines the different alarms to be created in CloudWatch. The important point is that the array should be sequential (indices 0, 1, 2, ... with no gaps), because it is processed in a loop.
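Why sequential matters: bash arrays can have gaps, and `${#ALARM_ARRAY[@]}` counts the elements rather than returning the highest index. The sketch below contrasts a counter-based loop, which depends on the 0, 1, 2, ... numbering, with a loop over `${!ALARM_ARRAY[@]}` (the list of indices that are actually set), which tolerates gaps. The function names are illustrative only.

```bash
#!/usr/bin/env bash
# Loop 1: index counter. Only correct when the indices are 0..N-1 with no gaps,
# because ${#ALARM_ARRAY[@]} is the number of elements, not the highest index.
by_counter() {
  local i
  for (( i = 0; i < ${#ALARM_ARRAY[@]}; i++ )); do
    printf '%s\n' "${ALARM_ARRAY[$i]:-<empty slot $i>}"
  done
}

# Loop 2: iterate over the indices that actually exist; gaps are harmless.
by_index_list() {
  local i
  for i in "${!ALARM_ARRAY[@]}"; do
    printf '%s\n' "${ALARM_ARRAY[$i]}"
  done
}
```

With `ALARM_ARRAY=([0]=cpu [1]=memory [3]=process)`, `by_counter` prints cpu, memory and `<empty slot 2>` and never reaches process, while `by_index_list` prints all three entries.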

We are going to keep the alarm name in the following pattern:

Alarm Name:- Monitoring[Mon]_Production[pd]_Appshortname[app]_Appfullname[appname]_AWSService[EC2]_SystemService[cpu,memory,process]_ServerIP[Hostipaddress]
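Building the name from its parts in one place keeps every alarm on the same pattern. A minimal bash sketch, assuming the seven components are joined with underscores as in the alarm.conf entries, so no component may itself contain an underscore. `build_alarm_name` and `describe_alarm_name` are hypothetical helpers, not part of the AWS CLI.

```bash
#!/usr/bin/env bash
# build_alarm_name MON ENV APP APPNAME AWS_SERVICE SYSTEM_SERVICE HOST
# Joins the seven pattern components with "_" after checking them.
build_alarm_name() {
  if (( $# != 7 )); then
    echo "usage: build_alarm_name MON ENV APP APPNAME AWS_SERVICE SYSTEM_SERVICE HOST" >&2
    return 1
  fi
  local part
  for part in "$@"; do
    # "_" is the separator, so a component containing it would break parsing
    if [[ -z $part || $part == *_* ]]; then
      echo "build_alarm_name: invalid component '$part'" >&2
      return 1
    fi
  done
  local IFS=_
  echo "$*"   # "$*" joins the arguments with the first character of IFS
}

# describe_alarm_name NAME: splits a name built by build_alarm_name back apart
describe_alarm_name() {
  local mon env app appname aws_svc sys_svc host
  IFS=_ read -r mon env app appname aws_svc sys_svc host <<< "$1"
  echo "env=$env app=$appname service=$aws_svc/$sys_svc host=$host"
}
```

`build_alarm_name MON PD APP APPNAME EC2 CPUUtilization 10.0.1.25` prints `MON_PD_APP_APPNAME_EC2_CPUUtilization_10.0.1.25`, and passing that name to `describe_alarm_name` prints `env=PD app=APPNAME service=EC2/CPUUtilization host=10.0.1.25`. The IP address is a made-up example value.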

The alarm name above gives complete information about the alarm: which service is affected, in which environment, and the server IP for easy identification, so that the relevant support team can immediately start on the resolution.
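The alarm.conf entries expect `HOST_ID` and `INSTANCE_ID` to be set already. On EC2 both can be read from the instance metadata service. The sketch below assumes HOST_ID should be the private IPv4 address (matching the ServerIP part of the pattern) and that curl is installed; it requests an IMDSv2 session token, which also works on instances that still allow IMDSv1. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Reads instance identity from the EC2 instance metadata service (IMDS).
# Assumptions: running on EC2, curl available, HOST_ID = private IPv4 address.
IMDS_BASE="http://169.254.169.254/latest"

# imds_get PATH: prints the metadata value at $IMDS_BASE/meta-data/PATH
imds_get() {
  local token
  # IMDSv2: fetch a short-lived session token first, then send it as a header.
  token="$(curl -fsS --max-time 2 -X PUT \
    -H 'X-aws-ec2-metadata-token-ttl-seconds: 60' \
    "$IMDS_BASE/api/token")" || return 1
  curl -fsS --max-time 2 -H "X-aws-ec2-metadata-token: $token" \
    "$IMDS_BASE/meta-data/$1"
}

# load_instance_identity: sets and exports INSTANCE_ID and HOST_ID
load_instance_identity() {
  INSTANCE_ID="$(imds_get instance-id)" || return 1
  HOST_ID="$(imds_get local-ipv4)" || return 1
  export INSTANCE_ID HOST_ID
}
```

Call `load_instance_identity` before alarm.conf is sourced, since the array entries expand `$HOST_ID` and `$INSTANCE_ID` at the moment the file is read.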

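Another point: the MetricName is based on the service being monitored, such as CPUUtilization, while the Namespace is the broader category the metric belongs to, such as APP_EC2 for our custom metrics. The built-in CPUUtilization metric is published by AWS under the AWS/EC2 namespace.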
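An alarm only sees data if its `--namespace`, `--metric-name` and `--dimensions` match exactly what is published; otherwise it watches a metric with no datapoints and typically stays in INSUFFICIENT_DATA. A small, hypothetical bash helper that pulls these fields out of each ALARM_ARRAY entry makes such mismatches easy to spot.

```bash
#!/usr/bin/env bash
# alarm_field ALARM_ARGS FLAG
# Prints the value that follows FLAG in one ALARM_ARRAY entry; fails if absent.
alarm_field() {
  local -a args
  local i
  read -ra args <<< "$1"   # split the entry on whitespace, no globbing
  for (( i = 0; i < ${#args[@]} - 1; i++ )); do
    if [[ ${args[$i]} == "$2" ]]; then
      printf '%s\n' "${args[$((i + 1))]}"
      return 0
    fi
  done
  return 1
}

# list_alarm_metrics: one line per entry, "namespace metric-name alarm-name"
list_alarm_metrics() {
  local i
  for i in "${!ALARM_ARRAY[@]}"; do
    printf '%s %s %s\n' \
      "$(alarm_field "${ALARM_ARRAY[$i]}" --namespace)" \
      "$(alarm_field "${ALARM_ARRAY[$i]}" --metric-name)" \
      "$(alarm_field "${ALARM_ARRAY[$i]}" --alarm-name)"
  done
}
```

Run after sourcing alarm.conf, `list_alarm_metrics` prints one line per alarm, which can be compared against the namespace and metric names the plugins publish.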


# conf/alarm.conf
# HOST_ID and INSTANCE_ID must already be set when this file is sourced.
declare -a ALARM_ARRAY

# CPU: built-in EC2 metric; alarm when the 1-minute maximum is above 80% for 2 consecutive periods
ALARM_ARRAY[0]='--alarm-name MON_PD_APP_APPNAME_EC2_CPUUtilization_'$HOST_ID' --metric-name CPUUtilization --namespace AWS/EC2 --statistic Maximum --period 60 --threshold 80 --comparison-operator GreaterThanThreshold --dimensions Name=InstanceId,Value='$INSTANCE_ID' --evaluation-periods 2 --alarm-actions arn:aws:sns:ap-south-1:86735258632263:SNS_TOPIC_NAME --unit Percent';

# Memory: custom metric pushed by the plugin into the APP_EC2 namespace; alarm above 80%
ALARM_ARRAY[1]='--alarm-name MON_PD_APP_APPNAME_EC2_Memory-Utilization_'$HOST_ID' --metric-name Memory-Utilization --namespace APP_EC2 --statistic Maximum --period 60 --threshold 80 --comparison-operator GreaterThanThreshold --dimensions Name=InstanceId,Value='$INSTANCE_ID' --evaluation-periods 2 --alarm-actions arn:aws:sns:ap-south-1:86735258632263:SNS_TOPIC_NAME --unit Percent';

# Process: custom count metric; alarm when fewer than 1 process is running for 2 consecutive periods
ALARM_ARRAY[2]='--alarm-name MON_PD_APP_APPNAME_EC2_Process-ProcessName_'$HOST_ID' --metric-name Process-TejServer --namespace APP_EC2 --statistic Maximum --period 60 --threshold 1 --comparison-operator LessThanThreshold --dimensions Name=InstanceId,Value='$INSTANCE_ID' --evaluation-periods 2 --alarm-actions arn:aws:sns:ap-south-1:86735258632263:SNS_TOPIC_NAME --unit Count';
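The loop that feeds these entries to the AWS CLI is described earlier but not listed, so here is only a sketch of such a driver, assuming the AWS CLI is installed with credentials allowed to call `cloudwatch put-metric-alarm`, and that HOST_ID and INSTANCE_ID are already set. It sources alarm.conf from the conf directory next to bin, then runs one `aws cloudwatch put-metric-alarm` call per array element; setting DRY_RUN=1 prints the commands instead of running them. The script name, function names and DRY_RUN are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a driver (e.g. bin/create_alarms.sh, hypothetical) for conf/alarm.conf.
# Assumes: AWS CLI configured; HOST_ID and INSTANCE_ID already set.

# create_alarms: one put-metric-alarm call per ALARM_ARRAY entry.
# DRY_RUN=1 prints each command instead of executing it.
create_alarms() {
  local i rc=0
  local -a args
  for i in "${!ALARM_ARRAY[@]}"; do
    # Split the entry on whitespace into separate CLI arguments
    # (read -ra does no glob expansion, unlike an unquoted expansion).
    read -ra args <<< "${ALARM_ARRAY[$i]}"
    if [[ ${DRY_RUN:-0} == 1 ]]; then
      echo aws cloudwatch put-metric-alarm "${args[@]}"
    elif ! aws cloudwatch put-metric-alarm "${args[@]}"; then
      echo "create_alarms: entry $i failed" >&2
      rc=1
    fi
  done
  return "$rc"
}

# main: load conf/alarm.conf relative to this script's bin directory, then create alarms.
main() {
  local base_dir
  base_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" || return 1
  if [[ -z ${HOST_ID:-} || -z ${INSTANCE_ID:-} ]]; then
    echo "main: HOST_ID and INSTANCE_ID must be set" >&2
    return 1
  fi
  # Sourcing inside main makes the `declare -a` in alarm.conf local to main;
  # create_alarms, called from main, still sees it through bash's dynamic scoping.
  source "$base_dir/conf/alarm.conf" || return 1
  create_alarms
}
```

The real script would end with `main "$@"`. Iterating over `${!ALARM_ARRAY[@]}` also means a gap in the numbering would not silently skip an alarm. After a run, `aws cloudwatch describe-alarms --alarm-name-prefix MON_PD_` lists the alarms that were created.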
