Wednesday, October 3, 2018

Important points for Elasticsearch Optimizations

Points to take care of before creating a cluster:

  • Volume of data
  • Nodes and capacity planning
  • Balancing, high availability, shard allocation
  • Understanding the queries that the cluster will serve

Config walk-through: 

cluster.name: Represents the name of the cluster. It must be the same on every node in the cluster.

node.name: Represents the name of a particular node in the cluster. It must be unique for every node, and it is good practice to use the hostname.

path.data: Location on disk where Elasticsearch stores the index data. If you plan to handle a huge amount of data in the cluster, it is good to point this to a separate EBS volume instead of the root volume.

path.logs: Location where Elasticsearch stores the server startup, indexing and other logs. It is also good to store these on a separate volume.

bootstrap.memory_lock: This is an important setting in the ES config file and needs to be set to "true". It locks the heap memory configured in JAVA_ARGS for Elasticsearch into RAM. If it is not set, the OS may swap ES memory out to disk, and garbage collections may then take more than a minute instead of milliseconds. This directly affects node status, and chances are high that nodes will drop out of the cluster.

network.host: This setting sets both network.bind_host and network.publish_host. Since we are configuring ES as a cluster, bind and publish shouldn't be localhost or a loopback address.

discovery.zen.ping.unicast.hosts: This setting needs to hold the resolvable hostnames of all the nodes in the ES cluster.

discovery.zen.ping.multicast.enabled: Never, ever enable multicast ping discovery. It creates unwanted ping checks for node discovery across the infrastructure (say, 5 nodes pinging all 100 servers in the infra, which is bad). Multicast discovery was also removed in Elasticsearch 5.x.

discovery.zen.minimum_master_nodes: Number of master-eligible nodes that need to be live to elect a master. The quorum is (N/2)+1, with N/2 rounded down, where N is the count of master-eligible nodes. For a cluster with three master-eligible nodes, the quorum is 2. This option is mandatory to avoid split brain.
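The quorum rule above can be sketched in a few lines of Python. The function name minimum_master_nodes is only illustrative and is not part of any Elasticsearch client; the arithmetic is the (N/2)+1 rule with integer division.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum for N master-eligible nodes: floor(N / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    # Print the quorum for a few common cluster sizes.
    for n in (1, 2, 3, 5):
        print(n, "master-eligible nodes -> quorum", minimum_master_nodes(n))
```

Note that two master-eligible nodes also need a quorum of 2, so losing either one stops elections. That is one reason an odd count such as three is the usual choice.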

index.number_of_shards: Sets the number of shards (splits) of an index (5 by default).

index.number_of_replicas: Sets the number of replicas (additional copies) of an index (1 by default).
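To see what these two settings mean for capacity planning, here is a small sketch that counts how many shard copies the cluster has to place for one index. The helper name total_shard_copies is hypothetical; the defaults are the ones quoted above.

```python
def total_shard_copies(primaries: int = 5, replicas: int = 1) -> int:
    """Each primary shard is stored once, plus once per replica."""
    if primaries < 1 or replicas < 0:
        raise ValueError("primaries must be >= 1 and replicas >= 0")
    return primaries * (1 + replicas)


if __name__ == "__main__":
    # With the defaults, one index occupies 5 primaries + 5 replica shards.
    print(total_shard_copies())
```

With the defaults, every index is spread over 10 shard copies, so the count grows quickly as indices are added.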

Cluster topology:
Cluster topology can be defined mainly with these two settings: node.data and node.master.

node.data   node.master   Role
false       true          Only serves as a master-eligible node; no data is saved there.
false       false         Works as a load balancer for queries and aggregations.
true        true          Master-eligible node that also saves data in the location "path.data".
true        false         Only serves as a data node.
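The four combinations can also be written as a lookup, which is handy when checking a set of node configs. This is only a sketch: the ROLES mapping and the node_role function are illustrative names, and the role labels are paraphrased from the table above.

```python
# Keys are (node.data, node.master) pairs.
ROLES = {
    (False, True): "dedicated master-eligible node",
    (False, False): "load balancer for queries and aggregations",
    (True, True): "master-eligible data node",
    (True, False): "data-only node",
}


def node_role(node_data: bool, node_master: bool) -> str:
    """Return the role implied by the node.data / node.master pair."""
    return ROLES[(node_data, node_master)]


if __name__ == "__main__":
    print(node_role(node_data=False, node_master=True))
```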

There is a difference between master and master-eligible nodes. Setting node.master only makes the node master-eligible. When the cluster starts, ES itself elects one of the master-eligible nodes as the master node. We can get the current master node from the ES API "/_cat/nodes?v". Any cluster-related anomalies are logged only in this master node's log.
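As a rough sketch, the text returned by "/_cat/nodes?v" can be scanned for the elected master, which is marked with "*" in the "master" column. The exact set of columns varies between Elasticsearch versions, so the code looks the columns up by header name. The SAMPLE text below is invented for illustration and is not real cluster output, and elected_master is a hypothetical helper name.

```python
from typing import Optional


def elected_master(cat_nodes_text: str) -> Optional[str]:
    """Return the name of the node whose 'master' column is '*'."""
    lines = [ln for ln in cat_nodes_text.splitlines() if ln.strip()]
    header = lines[0].split()
    master_col = header.index("master")
    name_col = header.index("name")
    for row in lines[1:]:
        # Limit the split so a name in the last column may contain spaces.
        cells = row.split(None, len(header) - 1)
        if cells[master_col] == "*":
            return cells[name_col]
    return None


# Invented sample, shaped like the verbose _cat/nodes table.
SAMPLE = """\
ip        node.role master name
10.0.0.11 d         -      es-data-1
10.0.0.12 m         *      es-master-1
10.0.0.13 m         -      es-master-2
"""

if __name__ == "__main__":
    print(elected_master(SAMPLE))
```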

Cluster optimization for stability and performance:

  • Enable memory locking (bootstrap.memory_lock) in elasticsearch.yml.
  • Set MAX_LOCKED_MEMORY=unlimited and ES_HEAP_SIZE (Xmx and Xms) to half of the memory on the server in /etc/default/elasticsearch.
  • Also configure MAX_OPEN_FILES with 16K so that it won't hit its limit in the long run.
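The values above can be derived from the server's RAM with a small helper. This is a sketch under a few assumptions: default_file_lines is a hypothetical name, the heap is written in megabytes, and "16K" is taken to mean 16384.

```python
from typing import List


def default_file_lines(total_ram_mb: int) -> List[str]:
    """Build the lines for /etc/default/elasticsearch from total RAM in MB."""
    if total_ram_mb < 2:
        raise ValueError("total_ram_mb must be at least 2")
    heap_mb = total_ram_mb // 2  # half of the server's memory goes to the heap
    return [
        f"ES_HEAP_SIZE={heap_mb}m",
        "MAX_LOCKED_MEMORY=unlimited",
        "MAX_OPEN_FILES=16384",  # assumption: "16K" read as 16384
    ]


if __name__ == "__main__":
    # Example for a server with 16 GB of RAM.
    print("\n".join(default_file_lines(16 * 1024)))
```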

