Sunday, August 26, 2018

Runbook to resolve some of the most common issues in Linux

Check inode usage on the affected file system with
df -ih

Check for the most recently modified files. Run this from inside the file system that shows high inode usage.
find . -type f -print0 | xargs -0 stat --format '%Y :%y %n' | sort -nr | cut -d: -f2- | head

Check which directories hold the most files (count of regular files directly inside each directory, GNU find)
find . -xdev -type f -printf '%h\n' | sort | uniq -c | sort -rn | head

Check for the directories containing most of the inodes.
for i in /*; do echo "$i"; find "$i" | wc -l; done
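
The loop above prints counts in directory order, which is hard to scan on a large tree. A minimal sketch that sorts the result, assuming GNU find; the function name inode_usage_by_dir is made up for illustration.

```shell
# Hypothetical helper: print "<entries> <dir>" for every immediate
# subdirectory of $1 (default /), highest count first.
# The body runs in a subshell so its variables do not leak.
inode_usage_by_dir() (
    top="${1:-/}"
    top="${top%/}"                      # "/" becomes "" so the glob is "/*/"
    for d in "$top"/*/; do
        [ -d "$d" ] || continue
        # -xdev keeps find on one file system, like the df -ih figures
        n=$(find "$d" -xdev 2>/dev/null | wc -l)
        # the count includes the directory itself
        printf '%s %s\n' "$((n))" "${d%/}"
    done | sort -rn
)
```

Usage: inode_usage_by_dir /var | head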

Check memory usage on the server.
free -m   # or free -g to report in gibibytes
If a large share of memory is held by the cache, clear it with the following command (as root) after checking with your vertical. Run sync first, because drop_caches only frees clean pages.
sync; echo 3 > /proc/sys/vm/drop_caches
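
To put a number on "more of cache memory" before dropping anything, the cache share can be computed from /proc/meminfo. A rough sketch, assuming the usual meminfo field names and treating Buffers + Cached + SReclaimable as the cache figure; cache_percent is a made-up name.

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# the share of total memory held by buffers/cache as a whole percentage.
cache_percent() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "Buffers:"      { cache += $2 }
        $1 == "Cached:"       { cache += $2 }   # exact match, so SwapCached is skipped
        $1 == "SReclaimable:" { cache += $2 }
        END { if (total > 0) printf "%d\n", cache * 100 / total }
    '
}
```

Usage: cache_percent < /proc/meminfo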

Check for the processes consuming most of the memory
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head

Check which processes have the most open files on the server (output: count, PID, command).
lsof | awk '{ print $2 " " $1; }' | sort -rn | uniq -c | sort -rn | head -20

Check whether the number of currently open files is approaching the system-wide limit. Compare the output of
lsof | wc -l
with
cat /proc/sys/fs/file-max
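
lsof can overcount, since it also lists entries such as memory-mapped files and working directories. The kernel keeps its own figures in /proc/sys/fs/file-nr (allocated handles, unused handles, maximum). A small sketch that turns them into a percentage; file_handle_percent is a made-up name.

```shell
# Hypothetical helper: read the three file-nr fields on stdin and print
# handles in use as a whole percentage of the system-wide maximum.
file_handle_percent() {
    awk '{ if ($3 > 0) printf "%d\n", ($1 - $2) * 100 / $3 }'
}
```

Usage: file_handle_percent < /proc/sys/fs/file-nr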

Check the hard limit and soft limit as well.
ulimit -Hn
ulimit -Sn

Check the I/O wait on server.
pidstat -d 2 5
iostat -txk 5

Check that the inodes are not exhausted on the server.
for i in /*; do echo "$i"; find "$i" | wc -l; done
df -ih

Check dmesg to see which processes are performing block reads/writes or dirtying inodes

Also check nofile limit in limits.conf, a process could be requesting more files than it is permitted to open.
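
limits.conf only applies to new sessions, so the limits a running process actually has are best read from /proc/PID/limits. A minimal sketch that pulls out the soft and hard nofile values; nofile_limits is a made-up name and 1234 below is a placeholder PID.

```shell
# Hypothetical helper: read /proc/<pid>/limits text on stdin and print
# "<soft> <hard>" for the "Max open files" row.
nofile_limits() {
    awk '/^Max open files/ { print $4, $5 }'
}
```

Usage: nofile_limits < /proc/1234/limits, then compare with ls /proc/1234/fd | wc -l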

Check the status of CPU and check the load average.
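
A load average only means something relative to the number of cores. A quick sketch that divides one by the other; load_per_core is a made-up name, and a sustained result above 1.00 is a common rule of thumb for a saturated CPU, not a hard threshold.

```shell
# Hypothetical helper: $1 = load average, $2 = number of CPU cores.
# Prints the load per core with two decimals.
load_per_core() {
    awk -v load="$1" -v cores="$2" \
        'BEGIN { if (cores > 0) printf "%.2f\n", load / cores }'
}
```

Usage on Linux: load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"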

Use pstree to look for any suspicious processes or unusually high number of a particular service. You can compare the process listing with a similarly loaded server to do a quick check.

Use netstat to look for any suspicious connections, or too many connections from one particular IP
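
One way to spot too many connections from a single IP is to count the foreign addresses in the netstat output. A sketch that assumes the usual netstat -ntu column layout (foreign address in column 5); conns_per_ip is a made-up name.

```shell
# Hypothetical helper: read `netstat -ntu` output on stdin and print the
# top N (default 10) remote addresses by connection count.
conns_per_ip() {
    awk '$1 ~ /^(tcp|udp)/ { sub(/:[0-9]+$/, "", $5); print $5 }' |
        sort | uniq -c | sort -rn | head -n "${1:-10}"
}
```

Usage: netstat -ntu | conns_per_ip 20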

Check for the processes consuming the most CPU.
ps -eo pcpu,pid,user,args --sort=-pcpu | head

Check the free space on the affected file system
df -h

For a detailed breakdown, run du inside the affected directory
du -sh * | sort -hr | head -n10

Check for large files that are open but are deleted from file system.
lsof -nP | grep '(deleted)'

If system logs or Nginx logs are taking up much of the space, force a log rotation
logrotate -f /etc/logrotate.d/nginx

Check the health of the Elasticsearch cluster with an API call.
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'

If the status is yellow:
All primary shards are allocated, but at least one replica is missing. No data is missing, so search results will still be complete. However, your high availability is compromised to some degree. If more shards disappear, you might lose data. Think of yellow as a warning that should prompt investigation.

If the status is red:
At least one primary shard (and all of its replicas) is missing. This means that you are missing data: searches will return partial results, and indexing into that shard will return an exception.
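
For a scripted check, the status field can be pulled out of the health response and mapped to a short message. A rough sketch using sed rather than a JSON parser, so it assumes the response has the usual "status" : "..." form; cluster_status and describe_status are made-up names.

```shell
# Hypothetical helper: read the _cluster/health JSON on stdin and print
# the value of its "status" field (green, yellow or red).
cluster_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n1
}

# Hypothetical helper: turn a status into a one-line summary.
describe_status() {
    case "$1" in
        green)  echo "ok: all primary and replica shards are allocated" ;;
        yellow) echo "warning: at least one replica shard is missing" ;;
        red)    echo "critical: at least one primary shard is missing" ;;
        *)      echo "unknown status: $1" ;;
    esac
}
```

Usage: describe_status "$(curl -s 'http://localhost:9200/_cluster/health' | cluster_status)"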

