[Solved] Restructuring the CDN Logs
Problem: CloudFront logs are stored under a single flat prefix, with file names in the following format:

distributionid.year-month-date-hour.uniqueid.gz

To analyse these logs you need something like Athena, which can run queries directly over the S3 bucket that stores them. To do this cheaply, Athena needs partitioned data, which simply means storing the data in a structured layout (e.g. a folder structure). Partitioning lets you restrict Athena to the subset of data you actually want to analyse. Otherwise, by default, Athena tries to read all the data, and you pay for scanning GBs of logs you don't need.

If you have partitioned the logs as year/month/day, then you can register each folder as a partition, like this:

year=2021/month=02/day=25 -- s3://logs/2021/02/25

This allows you to use a plain WHERE clause on the partition columns to restrict Athena to the data you are interested in:

SELECT uri, count(1) FROM cloudfront_logs WHERE status = 404 AND (year || ...
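The restructuring described above can be sketched in Python. This is a minimal sketch and not a definitive implementation: the function names `partitioned_key` and `add_partition_sql`, the example distribution ID `E2EXAMPLE`, and the string-typed partition columns are all assumptions made for illustration. The code only computes the new key and the partition statement; copying the objects in S3 and submitting the statement to Athena are left out.

```python
import re

# Assumed file-name shape: <distribution-id>.<YYYY>-<MM>-<DD>-<HH>.<unique-id>.gz
# The separator after the distribution ID may be "." or "-", and the
# unique-id part is optional, so slightly different name shapes still parse.
LOG_NAME = re.compile(
    r"^(?P<dist>[A-Za-z0-9]+)[.-]"
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})-(?P<hour>\d{2})"
    r"(?:\.(?P<uid>[A-Za-z0-9]+))?\.gz$"
)


def partitioned_key(key: str, dest_prefix: str = "") -> str:
    """Map a flat log key to a year/month/day key (hypothetical layout)."""
    name = key.rpartition("/")[2]  # drop any existing prefix
    m = LOG_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognised log file name: {name!r}")
    parts = [dest_prefix.strip("/"), m["year"], m["month"], m["day"], name]
    return "/".join(p for p in parts if p)


def add_partition_sql(table: str, bucket: str, year: str, month: str, day: str) -> str:
    """Build the statement that registers one day's folder as a partition.

    Assumes the table was created with string partition columns
    named year, month and day.
    """
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (year='{year}', month='{month}', day='{day}') "
        f"LOCATION 's3://{bucket}/{year}/{month}/{day}/'"
    )


if __name__ == "__main__":
    print(partitioned_key("E2EXAMPLE.2021-02-25-13.abc123.gz"))
    print(add_partition_sql("cloudfront_logs", "logs", "2021", "02", "25"))
```

Keeping the original file name as the last path component means nothing about the log is lost in the move; only the prefix changes, so the same mapping can be run again over new files as they arrive.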