12-06-2018
09:39 PM
Introduction

The performance of a Hadoop cluster can be impacted by the OS partitioning. This document describes best practices for sizing the "/var" folder/partition. Let's approach the problem through four questions:

1. What is "/var" used for?
2. How can the "/var" folder run out of disk space?
3. What common issues can be expected on a Hadoop cluster if "/var" is out of disk space?
4. How is "/var" currently set up in my cluster?

Question 1 - What is /var used for?

From the OS perspective, "/var" holds constantly changing, i.e. variable, files - hence the name "var". Examples include log files, mail, transient files, printer spools, temporary files and cached data. For example, "/var/tmp" holds temporary files that persist between system reboots. On any node (Hadoop or non-Hadoop), the /var directory holds content for a number of applications. It is also used to temporarily store downloaded update packages: the PackageKit update software downloads updated packages to /var/cache/yum/ by default, so the /var partition should be large enough to hold package updates. Another example of an application that uses /var is MySQL, which by default uses "/var/lib/mysql" as its data directory.

Question 2 - How can the /var folder run out of disk space?

/var is much more susceptible to filling up, by accident or by attack, than most other locations. Directories commonly affected are "/var/log", "/var/tmp" and "/var/crash". If there is a serious OS issue, logging can increase tremendously; if the partition is sized too low, e.g. 10 GB, this excessive logging can consume all the disk space available to /var.

Question 3 - Common issues to expect on a Hadoop cluster if "/var" is out of disk space

/var is easily filled by a (possibly misbehaving) application, and if it is not separate from /, filling / can cause a kernel panic. The "/var" folder contains some very important file/folder locations that are used by default by the kernel and OS applications. For example:

- "/var/run" is used by all running processes to keep their PIDs and system information. If "/var" is full due to a low disk space configuration, applications will fail to run.
- "/var/lock" contains the locks that running applications hold on files/devices. If the disk space runs out, locks cannot be taken and existing/new applications will fail.
- "/var/lib" holds the dynamic data, libraries and files for applications. If there is no space left on the device, applications will fail to work.

"/var" is very important from a Hadoop perspective to keep all the services running. Running out of disk space on "/var" can cause Hadoop and dependent services to fail on that node.

Question 4 - How is "/var" set up in my cluster?

- Are the Hadoop logs separated from the "/var" folder location?
- Are huge logs, or a huge number of OS logs, still located under "/var", for example "/var/log/messages" and "/var/crash"?
- If kdump is configured to capture crash dumps, the risk increases, since these dumps are usually huge - sometimes 100 GB or more. The default kdump configuration uses the directory location "/var/crash". These days, physical memory can easily be 500 GB or 1 TB, which produces correspondingly huge kdump files (note: kdump files can be compressed). The size of "/var" therefore plays an important role: if "/var" is only 10 GB or 50 GB, a crash dump after an OS crash (kernel panic etc.) will never be captured completely, and without a complete crash dump there can never be a complete analysis of the cause of the kernel crash.

Answer - Recommendations on the optimum setup of "/var"

- Increase the size of "/var" to at least 50 GB on all nodes, and keep the size uniform across the clusters.
- Change the log location for kdump. The existing location is "/var/crash". Kdump can be configured to write to any other local disk with a size of around 300 - 500 GB or, as a best measure, to dump over the network to a remote disk.
- /var should by default be separated from the root partition. Depending on the requirements, "/var/log" and "/var/log/audit" can also be created as separate partitions.
- /var should be mounted on an LVM disk to allow the size to be increased easily if required.
- All Hadoop service logs should be separated from /var. The Hadoop logs should ideally be placed on a separate disk, used only for logs (from Hadoop and dependent applications like MySQL) and for nothing else. This log location should never be shared with the core Hadoop service directory locations (HDFS, YARN, ZooKeeper). One way to achieve this is to create a symlink from "/var/<hadoop_logs>" to separate LVM disks.
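To answer Question 4 on a live node, the current layout can be inspected with standard coreutils/util-linux commands. A minimal sketch (the paths are standard; the output depends on your partitioning):

```shell
# Which mount point actually holds /var? Prints "/var" if it is a
# separate partition, "/" if it still shares the root filesystem.
df -P /var | awk 'NR==2 {print $6}'

# Size and current usage of the filesystem that holds /var.
df -h /var

# Largest consumers directly under /var; -x stays on one filesystem,
# so a separately mounted /var/log is not double-counted.
du -xsh /var/* 2>/dev/null | sort -h | tail -5
```

If the first command prints "/", then /var is not separated from the root partition and a misbehaving application can fill / as described in Question 3.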
09-21-2017
03:36 PM
The HiveServer2 and HiveMetastore can be configured to capture GC logs with a timestamp in the file name. This is useful in a production cluster, where a timestamp on the log file adds clarity and also avoids overwriting. Navigate as below in Ambari: Ambari UI > Hive > Configs > Advanced hive-env > hive-env template. Add the following: if [ "$SERVICE" = "metastore" ]; then
export HADOOP_HEAPSIZE={{hive_metastore_heapsize}} # Setting for HiveMetastore
else
export HADOOP_HEAPSIZE={{hive_heapsize}} # Setting for HiveServer2 and Client
fi
export HADOOP_CLIENT_OPTS="-Xmx${HADOOP_HEAPSIZE}m -Xloggc:/var/log/hive/gc.log-$SERVICE-`date +'%Y%m%d%H%M'` -XX:ErrorFile=/var/log/hive/hive-metastore-error.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps $HADOOP_CLIENT_OPTS"
if [ "$SERVICE" = "hiveserver2" ]; then
export HADOOP_CLIENT_OPTS="-Xmx${HADOOP_HEAPSIZE}m -Xloggc:/var/log/hive/gc.log-$SERVICE-`date +'%Y%m%d%H%M'` -XX:ErrorFile=/var/log/hive/hive-server2-error.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps $HADOOP_CLIENT_OPTS"
fi
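The backtick-quoted `date +'%Y%m%d%H%M'` expansion above is what makes the GC log names unique per restart: each time hive-env is evaluated, the suffix resolves to the current year, month, day, hour and minute. A quick illustration of the resulting file name (the service name here is an example):

```shell
# Resolve the same suffix the hive-env template uses: a 12-digit
# year-month-day-hour-minute stamp.
suffix=$(date +'%Y%m%d%H%M')
echo "gc.log-hiveserver2-${suffix}"
```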
09-21-2017
03:31 PM
By default, HiveServer2 and HiveMetastore are not configured to produce a heap dump on OutOfMemoryError. Production clusters do hit OOMs, and since the heap dump on OOM is not configured, root cause analysis of the issue is obstructed. Navigate as below in Ambari: Ambari UI > Hive > Configs > Advanced hive-env > hive-env template. Add the following:
if [ "$SERVICE" = "metastore" ]; then
export HADOOP_HEAPSIZE={{hive_metastore_heapsize}} # Setting for HiveMetastore
else
export HADOOP_HEAPSIZE={{hive_heapsize}} # Setting for HiveServer2 and Client
fi
export HADOOP_CLIENT_OPTS="-Xmx${HADOOP_HEAPSIZE}m -Xloggc:/var/log/hive/gc.log-$SERVICE-`date +'%Y%m%d%H%M'` -XX:ErrorFile=/var/log/hive/hive-metastore-error.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hive/ $HADOOP_CLIENT_OPTS"
if [ "$SERVICE" = "hiveserver2" ]; then
export HADOOP_CLIENT_OPTS="-Xmx${HADOOP_HEAPSIZE}m -Xloggc:/var/log/hive/gc.log-$SERVICE-`date +'%Y%m%d%H%M'` -XX:ErrorFile=/var/log/hive/hive-server2-error.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hive/ $HADOOP_CLIENT_OPTS"
fi
08-12-2017
02:00 AM
@abilgi I tried the above on two different Ambari versions (2.4.x and 2.5.x), with both Kerberized and non-Kerberized environments. It does not work: the step to register the remote cluster fails. I see the following in the logs:

12 Aug 2017 01:59:09,098 ERROR [ambari-client-thread-33] BaseManagementHandler:67 - Bad request received: Failed to create new Remote Cluster HDP02. User must be Ambari or Cluster Adminstrator.
2017-08-12T01:52:39.377Z, User(admin), RemoteIp(10.42.80.140), RequestType(POST), url(http://172.26.114.132:8080/api/v1/remoteclusters/HDP02), ResultStatus(400 Bad Request), Reason(Failed to create new Remote Cluster HDP02. User must be Ambari or Cluster Adminstrator.)

**Note: the user I used is "admin" and is a cluster administrator. Am I missing something? Also, what is the API way of getting this done? Is there any API way of registering a Remote Cluster?
06-28-2016
01:43 PM
3 Kudos
To remove an already-installed Grafana, do the following:

1. Stop the AMS service from the Ambari UI.

2. Execute the following curl API commands to delete Grafana:

# curl -u admin:admin -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Stop METRICS_GRAFANA via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' http://<ambari-server-hostname>:8080/api/v1/clusters/<CLUSTERNAME>/services/AMBARI_METRICS/components/METRICS_GRAFANA
# curl -u admin:admin -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo": {"context" :"Stop AMBARI_METRICS via REST"}, "Body": {"ServiceInfo": {"state": "INSTALLED"}}}' http://<ambari-server-hostname>:8080/api/v1/clusters/<CLUSTERNAME>/services/AMBARI_METRICS/
# curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://<ambari-server-hostname>:8080/api/v1/clusters/<CLUSTERNAME>/hosts/<hostname_of_Grafana_host>/host_components/METRICS_GRAFANA
# curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://<ambari-server-hostname>:8080/api/v1/clusters/<CLUSTERNAME>/services/AMBARI_METRICS/components/METRICS_GRAFANA
# curl -u admin:admin -X GET http://<ambari-server-hostname>:8080/api/v1/clusters/<CLUSTERNAME>/services/AMBARI_METRICS/components/METRICS_GRAFANA

3. Start the AMS service from the Ambari UI.

Here, replace the following:
<ambari-server-hostname> = fully qualified domain name of the node running Ambari Server
<CLUSTERNAME> = name of the cluster
<hostname_of_Grafana_host> = fully qualified domain name of the node where Grafana has to be deleted
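Since the same placeholders appear in every command, a small wrapper can substitute them once and print the resulting commands for review before anything is executed. This is only a sketch: the host and cluster values below are illustrative, and the printed commands still have to be run by hand (or piped to sh) once they look right.

```shell
#!/bin/sh
# Illustrative values -- replace with your own before use.
AMBARI_HOST="ambari.example.com"
CLUSTER="MyCluster"
GRAFANA_HOST="grafana.example.com"

# Common prefix of every Ambari REST call used in the steps above.
BASE="http://${AMBARI_HOST}:8080/api/v1/clusters/${CLUSTER}"

# Print (not run) the DELETE calls so they can be reviewed first.
echo "curl -u admin:admin -H 'X-Requested-By: ambari' -X DELETE ${BASE}/hosts/${GRAFANA_HOST}/host_components/METRICS_GRAFANA"
echo "curl -u admin:admin -H 'X-Requested-By: ambari' -X DELETE ${BASE}/services/AMBARI_METRICS/components/METRICS_GRAFANA"
```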