
Thursday, February 25, 2021

[Solved] Restructuring the CDN Logs

 Problem:- CloudFront access logs are delivered to S3 as flat files named in the following format

 distributionid.year-month-date-hour.uniqueid.gz  
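
As a minimal sketch of how such a name breaks down, the Python below splits a log filename into its fields. It assumes names shaped like E2X8QTMLCYW88L.2021-02-24-05.67e5c8b7.gz (distribution ID, date and hour, a unique ID, then the .gz extension); the function name parse_log_name is hypothetical and only for illustration.

```python
def parse_log_name(filename):
    """Split '<distribution-id>.<yyyy-mm-dd-hh>.<unique-id>.gz' into its fields."""
    # Assumption: exactly four dot-separated parts, as in the sample name below.
    distribution_id, date_and_hour, unique_id, extension = filename.split(".")
    year, month, day, hour = date_and_hour.split("-")
    return {
        "distribution_id": distribution_id,
        "year": year,
        "month": month,
        "day": day,
        "hour": hour,
        "unique_id": unique_id,
        "extension": extension,
    }


fields = parse_log_name("E2X8QTMLCYW88L.2021-02-24-05.67e5c8b7.gz")
print(fields["year"], fields["month"], fields["day"], fields["hour"])
```

The date fields stay as zero-padded strings, which is what the folder layout further down relies on.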

To analyse these logs you need a tool such as Athena, which can run queries directly against the S3 bucket that stores them.

Athena works best with partitioned data, which simply means storing the data under a layout that encodes the partition values (e.g. a folder structure). Partitions let you restrict Athena to the data you actually want to analyse; otherwise it scans the entire dataset and charges you for reading GBs of data you don't need.

By default Athena reads all the data under the table location. But if you have partitioned it by year/month/day, then you can register each partition like

  year=2021/month=02/day=25 -- s3://logs/2021/02/25  
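
Registering a partition like this comes down to running an ALTER TABLE ... ADD PARTITION statement in Athena. The sketch below only builds that statement as a string. It assumes a table named cloudfront_logs with string partition columns year, month and day, and the s3://logs bucket from the example above; the helper name add_partition_statement is hypothetical.

```python
def add_partition_statement(table, bucket, year, month, day):
    """Build the Athena DDL that maps one year/month/day partition to its S3 folder."""
    location = "s3://{}/{}/{}/{}/".format(bucket, year, month, day)
    return (
        "ALTER TABLE {} ADD IF NOT EXISTS "
        "PARTITION (year='{}', month='{}', day='{}') "
        "LOCATION '{}'"
    ).format(table, year, month, day, location)


# Values are passed as zero-padded strings so they match the folder names.
print(add_partition_statement("cloudfront_logs", "logs", "2021", "02", "25"))
```

The IF NOT EXISTS clause keeps the statement safe to re-run for a day that is already registered.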

This lets you use the partition columns in the WHERE clause to restrict Athena to the data you are interested in

  SELECT uri, count(1)   
   FROM cloudfront_logs  
   WHERE status = 404   
    AND (year || month || day || hour) > '20200225'  
   GROUP BY uri  
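
Comparing the concatenated partition columns as a string only orders dates correctly when every part is zero-padded. The short Python check below illustrates this; the helper names padded_key and unpadded_key are hypothetical and not part of the setup.

```python
def padded_key(year, month, day):
    """Concatenate date parts with zero padding, e.g. 2021-02-25 -> '20210225'."""
    return "{:04d}{:02d}{:02d}".format(year, month, day)


def unpadded_key(year, month, day):
    """Concatenate date parts without padding, e.g. 2021-02-25 -> '2021225'."""
    return "{}{}{}".format(year, month, day)


# 1 October 2021 is later than 25 February 2021.
print(padded_key(2021, 10, 1) > padded_key(2021, 2, 25))      # True
print(unpadded_key(2021, 10, 1) > unpadded_key(2021, 2, 25))  # False
```

This is why the folder names and partition values are kept as 02 rather than 2.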


Solution:-



To set this up, first create the CloudFront distribution and configure its logging so that it delivers the log files to the S3 bucket.

Next, create a Lambda function called cdn-log-restructured with the python2.7 runtime and lambda_function.lambda_handler as the handler, and use the code below for it

 import boto3  
 def lambda_handler(event, context):  
   s3 = boto3.client('s3')  
   print("Restructuring s3 path for cloudfront logs")  
   # Iterate over all records in the list provided  
   for record in event['Records']:  
     # Get the S3 bucket  
     bucket = record['s3']['bucket']['name']  
     # Get the source S3 object key  
     key = record['s3']['object']['key']  
     # Get just the filename of the source S3 object (last path component)
     filename = key.split('/')[-1]
     # Get the yyyy-mm-dd-hh part of <distribution-id>.<yyyy-mm-dd-hh>.<unique-id>.gz
     dateAndHour = filename.split('.')[1]
     year, month, day, hour = dateAndHour.split('-')
     # Create destination path
     dest = 'test/{}/{}/{}/{}'.format(year, month, day, filename)
     # Skip objects that are already in the restructured location: the copy
     # below lands under the same 'test' prefix and fires the trigger again
     if key == dest:
       continue
     # Display source/destination in Lambda output log  
     print("- src: s3://%s/%s" % (bucket, key))  
     print("- dst: s3://%s/%s" % (bucket, dest))  
     # Perform copy of the S3 object  
     s3.copy_object(Bucket=bucket, Key=dest, CopySource=bucket + '/' + key)  
     # Delete the source S3 object  
     # Disable this line if a copy is sufficient  
     s3.delete_object(Bucket=bucket, Key=key)  
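
To try the handler from the Lambda console before wiring up the trigger, a test event with the same shape is enough. The sketch below builds only the fields the handler reads (bucket name and object key); real S3 notifications carry many more fields. The helper name sample_s3_event is hypothetical, and the bucket name and the test/ key are assumptions based on the rest of this post.

```python
import json


def sample_s3_event(bucket, key):
    """Build the minimal part of an S3 notification event that the handler reads."""
    return {
        "Records": [
            {"s3": {"bucket": {"name": bucket}, "object": {"key": key}}}
        ]
    }


# Assumed sample: a log file delivered under the 'test/' prefix.
event = sample_s3_event(
    "cloudfront-logs", "test/E2X8QTMLCYW88L.2021-02-24-05.67e5c8b7.gz"
)
print(json.dumps(event, indent=2))
```

The printed JSON can be pasted into a Lambda test event. Keep in mind that the handler really copies and deletes the named object, so point it at a file you can afford to move.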

You also need to create an execution role for the Lambda with the IAM policy below

 {  
   "Version": "2012-10-17",  
   "Statement": [  
     {  
       "Sid": "VisualEditor0",  
       "Effect": "Allow",  
       "Action": [  
         "s3:Get*",  
         "s3:List*",  
         "s3:PutObject",  
         "s3:PutObjectTagging",  
         "s3:DeleteObject"  
       ],  
       "Resource": [  
         "arn:aws:s3:::cloudfront-logs",  
         "arn:aws:s3:::cloudfront-logs/test/*"  
       ]  
     }  
   ]  
 }  

Go to the S3 bucket (cloudfront-logs) ---> create a new event notification ---> set the prefix (test) ---> select Lambda Function as the destination ---> enter the Lambda function ARN (arn:aws:lambda:us-west-2:XXXXXXXXXXX:function:cdn-log-restructured)
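
The same notification can also be set up from code. The sketch below is one possible way, not the method used in this post: lambda_notification builds the configuration that boto3's put_bucket_notification_configuration expects, and apply_notification sends it. Both helper names are hypothetical, and the s3:ObjectCreated:* event type is an assumption, since the console steps above do not name one.

```python
def lambda_notification(function_arn, prefix):
    """Build an S3 notification configuration that invokes a Lambda for new objects."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": function_arn,
                # Assumption: fire on every kind of object creation.
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
        ]
    }


def apply_notification(bucket, config):
    """Send the configuration to S3 (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=config
    )


config = lambda_notification(
    "arn:aws:lambda:us-west-2:XXXXXXXXXXX:function:cdn-log-restructured", "test"
)
# apply_notification("cloudfront-logs", config)  # uncomment to apply for real
```

This call replaces the bucket's whole notification configuration. Unlike the console, it does not grant S3 permission to invoke the function, so that permission has to be added to the Lambda separately.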

This creates a trigger on the S3 bucket, so that whenever CloudFront delivers a new log file to the bucket, the Lambda is invoked automatically and moves the file into the new folder structure, as in the flow below

 [Diagram]
 Cloudfront ---> (logs) ---> S3 Bucket: E2X8QTMLCYW88L.2021-02-24-05.67e5c8b7.gz
   ---> (S3 lambda trigger) ---> LAMBDA: cdn-log-restructured
   ---> S3 Bucket: structured/E2X8QTMLCYW88L/2021/02/24/05/E2X8QTMLCYW88L.2021-02-24-05.67e5c8b7.gz
