
Scalable ELK Architecture

Requirements

- Nginx logs
- Application logs
- System logs (to be decided)

This setup will unify the application logs and the nginx logs.

Events Producers

These are our standard instances, which produce the logs. Elastic Beats will be used as specialized shippers here; Filebeat can send logs directly to the Kafka cluster (a configuration sketch follows below). https://www.elastic.co/guide/en/beats/filebeat/current/kafka-output.html

Kafka Cluster

Like many other message brokers, Kafka deals with publisher-consumer and queue semantics by grouping data into topics. As an application, you write to a topic and consume from a topic. An important distinction, or shift in design, with Kafka is that the complexity moves from the producer to the consumers, and it makes heavy use of the file system cache. Kafka has a dependency on Apache ZooKeeper, so it will need access to a ZooKeeper cluster.

Logstash Indexers

This cluster will consume data, at its own throttled speed, while performing exp...
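A minimal sketch of the producer side, assuming Filebeat 6.x, placeholder log paths (/var/log/nginx, /var/log/app), placeholder broker addresses (kafka1:9092, kafka2:9092) and a hypothetical topic named "logs"; none of these names come from the actual setup:

# Hypothetical filebeat.yml; adjust paths, brokers and topic to the real cluster.
# (Newer Filebeat releases use "filebeat.inputs" instead of "filebeat.prospectors".)
cat > /etc/filebeat/filebeat.yml <<'EOF'
filebeat.prospectors:
  - type: log
    paths:
      - /var/log/nginx/*.log   # nginx access and error logs
      - /var/log/app/*.log     # application logs

output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092"]   # placeholder Kafka cluster endpoints
  topic: "logs"                           # topic the Logstash indexers will consume
  required_acks: 1                        # wait for the partition leader to ack
  compression: gzip
EOF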

Downscaling the ES cluster

Downscaling the Data Nodes in an Elasticsearch Cluster

We will use the following steps to remove the data nodes (a scripted sketch of the same steps follows below):

1. Exclude the Elasticsearch node from the cluster by running the following command:

curl -XPUT P.P.P.P:9200/_cluster/settings -d '{ "transient" : { "cluster.routing.allocation.exclude._ip" : "X.X.X.X" } }'; echo

Here P.P.P.P is the private IP of the master node and X.X.X.X is the private IP of the data node being removed. The command returns an acknowledgement of true if the node is accepted for removal, and shard relocation will start.

2. Check that the relocation is over and the node has no shards left on it, using the following command:

curl -XGET 'localhost:9200/_cat/allocation?v&pretty'

When the shards and disk.indices values for the node reach 0, all of its data has been transferred to the other nodes.

3. Stop the Elasticsearch service on the data node and terminate the instance.

Downscaling the master nodes

1. Identify the actual master of the cluster b...
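A minimal sketch of steps 1-3 as one script, assuming the cluster API is reachable on localhost:9200 from the box running it, that NODE_IP stands in for the data node's private IP, and that Elasticsearch runs as a standard system service on that node; these are assumptions, not part of the original procedure:

#!/bin/bash
# Drain and stop one data node; NODE_IP is a placeholder for the node being removed.
NODE_IP="X.X.X.X"

# Step 1: exclude the node so its shards relocate to the rest of the cluster
curl -s -H 'Content-Type: application/json' -XPUT localhost:9200/_cluster/settings -d "{
  \"transient\": { \"cluster.routing.allocation.exclude._ip\": \"$NODE_IP\" }
}"; echo

# Step 2: poll _cat/allocation until the node reports zero shards
while true; do
  shards=$(curl -s 'localhost:9200/_cat/allocation?h=shards,ip' | awk -v ip="$NODE_IP" '$2 == ip { print $1 }')
  [ "${shards:-0}" -eq 0 ] && break
  echo "Still $shards shards on $NODE_IP, waiting..."
  sleep 30
done

# Step 3: stop Elasticsearch on the drained node before terminating the instance
ssh "$NODE_IP" 'sudo service elasticsearch stop'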

Limiting Excessive Resource utilization by AWS S3 sync

If you take a backup using aws s3 sync over a large number of files, it can result in excessive CPU and network utilization. You can limit this through the AWS CLI's S3 transfer settings as follows:

$ aws configure set default.s3.max_concurrent_requests 5
$ aws configure set default.s3.multipart_threshold 6MB
$ aws configure set default.s3.multipart_chunksize 6MB

If you also want to limit the network bandwidth, you can use trickle on Linux to cap the upload and download speed:

trickle -u 10 -d 20 aws s3 sync source destination

This limits the upload speed to 10 KB/s and the download speed to 20 KB/s. A combined sketch follows below.
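Putting the two together, a minimal sketch of a throttled backup script, assuming /data as the source directory and s3://my-backup-bucket/data as a placeholder destination:

#!/bin/bash
set -e

# Keep the S3 transfer configuration conservative so sync does not hog the CPU
aws configure set default.s3.max_concurrent_requests 5
aws configure set default.s3.multipart_threshold 6MB
aws configure set default.s3.multipart_chunksize 6MB

# Cap network usage with trickle: 10 KB/s up, 20 KB/s down (tune to the instance's headroom)
trickle -u 10 -d 20 aws s3 sync /data s3://my-backup-bucket/data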