Member since: 03-14-2016
Posts: 4721
Kudos Received: 1111
Solutions: 874

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2018 | 04-27-2020 03:48 AM |
| | 3991 | 04-26-2020 06:18 PM |
| | 3228 | 04-26-2020 06:05 PM |
| | 2580 | 04-13-2020 08:53 PM |
| | 3836 | 03-31-2020 02:10 AM |
12-21-2016
07:15 AM
1 Kudo
Some time back I reported this issue: https://issues.apache.org/jira/browse/AMBARI-18932, along with the workaround.
12-21-2016
06:58 AM
@chennuri gouri shankar The following link has the details: https://cwiki.apache.org/confluence/display/RANGER/REST+APIs+for+Policy+Management . Example: curl -u $RangerAdmin:$Password -X GET http://$RANGERHOST:6080/service/public/api/policy/
The above will list the policies; you can then navigate to the desired policy and make changes to it, create a new one, or delete it.
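For example, a minimal sketch for working with a single policy, assuming an illustrative policy ID of 4 and the edited definition saved to a local policy.json file:
# Fetch one policy by its ID:
curl -u $RangerAdmin:$Password -X GET http://$RANGERHOST:6080/service/public/api/policy/4
# Update it by PUTting the modified JSON back:
curl -u $RangerAdmin:$Password -X PUT -H "Content-Type: application/json" -d @policy.json http://$RANGERHOST:6080/service/public/api/policy/4
# Delete it:
curl -u $RangerAdmin:$Password -X DELETE http://$RANGERHOST:6080/service/public/api/policy/4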
12-21-2016
06:53 AM
priyanshu bindal Can you try adding the ZKFC on that host using the Ambari APIs? Example: curl --user admin:admin -i -X POST http://erie1.example.com:8080/api/v1/clusters/ErieCluster/hosts/erie2.example.com/host_components/ZKFC - Here:
erie1.example.com = the Ambari host name
ErieCluster = the cluster name
erie2.example.com = the host on which we want to install the ZKFC . - You can find more details on how to add a host component using the Ambari APIs in the following link:
https://cwiki.apache.org/confluence/display/AMBARI/Add+a+host+and+deploy+components+using+APIs .
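For reference, a sketch of the follow-up calls after the POST, based on the host-component API described in the link above (same placeholder host/cluster names):
# Install the newly added ZKFC component (sets its desired state to INSTALLED):
curl --user admin:admin -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Install ZKFC"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' http://erie1.example.com:8080/api/v1/clusters/ErieCluster/hosts/erie2.example.com/host_components/ZKFC
# Once installed, start it (sets its desired state to STARTED):
curl --user admin:admin -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Start ZKFC"},"Body":{"HostRoles":{"state":"STARTED"}}}' http://erie1.example.com:8080/api/v1/clusters/ErieCluster/hosts/erie2.example.com/host_components/ZKFC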
12-21-2016
03:02 AM
David DN - Are you using the default FileView, or have you created a new instance of the FileView that you are testing? - Can you please share the view configuration (just to see if there is anything special in it)? - Also, can you please share the output of the following command; here the URL should point to the View instance that you are trying to hit: curl --user admin:admin -i -H 'X-Requested-By: ambari' -X GET http://$AMBARI_HOSTNAME:8080/api/v1/views/FILES/versions/1.0.0/instances/FILES_NEW_INSTANCE - Which version of Ambari Views / Ambari server are you using? .
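For the last question, a couple of quick checks that can be run on the Ambari server host (paths below are the defaults and may differ in your setup):
# Print the installed Ambari server version:
ambari-server --version
# List the view bundles deployed on the server (the Files view jar should be present here):
ls /var/lib/ambari-server/resources/views/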
12-20-2016
05:30 PM
@ARUN Basically there are 2 errors: 1). Address already in use (port conflict): Caused by: java.net.BindException: Problem binding to [0.0.0.0:60200] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
For that, please check which process is consuming that port; if there is a port conflict, either change the port or kill the other process that is consuming it (see the sketch at the end of this answer). . 2). The second error looks like data corruption. I would suggest clearing the old AMS data.
https://cwiki.apache.org/confluence/display/AMBARI/Cleaning+up+Ambari+Metrics+System+Data Caused by: java.io.InterruptedIOException: Interrupted calling coprocessor service org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService for row \x00\x00METRIC_RECORD
at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1769)
at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1719)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1022) - Shut down AMS and then clear out the "/var/lib/ambari-metrics-collector" dir for a fresh restart: - From Ambari -> Ambari Metrics -> Config -> Advanced ams-hbase-site, get the "hbase.rootdir" and "hbase-tmp" directories
- Delete or move the hbase-tmp and hbase.rootdir directories to an archive folder - Then restart AMS.
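Regarding the first error, a minimal sketch for identifying the process that holds the conflicting port (assumes netstat/lsof are installed; 60200 is the port from the exception above):
# Show the PID/program listening on port 60200:
netstat -tnlp | grep 60200
# or, if lsof is available:
lsof -i :60200
# Then either change the AMS port in the configuration or stop the conflicting process, e.g.: kill <PID>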
12-20-2016
06:10 AM
1 Kudo
@Sujatha Veeswar The error that you are getting indicates that you have some DB inconsistency. Your issue looks somewhat related to: https://issues.apache.org/jira/browse/AMBARI-18822, which should be addressed in Ambari 2.5. As a temporary remedy you can try starting the Ambari server as follows: ambari-server start --skip-database-check The DB still needs to be checked and the inconsistency fixed. .
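Before attempting any fix, it is worth taking a backup of the Ambari database first; a minimal sketch, assuming the common PostgreSQL setup with the database and user both named "ambari" (adjust to your environment):
# Dump the Ambari database before touching it:
pg_dump -U ambari ambari > /tmp/ambari_db_backup_$(date +%F).sql
# Then start the server while skipping the consistency check, as mentioned above:
ambari-server start --skip-database-check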
12-19-2016
02:50 PM
1 Kudo
Mahen Jay You need at least two NameNodes in order to achieve NameNode HA. In the screenshot I see that you have only one NameNode. You mentioned that you were not able to add the Secondary NameNode earlier. Can you please share what issue you were facing while adding the Secondary NameNode?
12-17-2016
01:11 PM
2 Kudos
Many times we find that operations (like start/stop/restart, etc.) fail from the Ambari UI. In such cases, if we want to troubleshoot what the Ambari UI did to perform that operation, or how the commands were executed, we can manually run those same operations from the individual host with the help of the "/var/lib/ambari-agent/data/command-xxx.json" file. - When we perform any operation from the Ambari UI (like starting/stopping a DataNode), we will notice that Ambari shows the operation progress in the UI. There we can basically see the following two files. Example: stderr: /var/lib/ambari-agent/data/errors-952.txt
stdout: /var/lib/ambari-agent/data/output-952.txt - Apart from the above files, there is one more important file which the ambari-agent uses to execute the instructions/commands sent by the Ambari Server. We can find that specific file on the agent as "/var/lib/ambari-agent/data/command-xxx.json". /var/lib/ambari-agent/data/command-952.json
- Here the "command-xxx.json" file has the same command ID (xxx) as the "errors-xxx.txt" & "output-xxx.txt" files (command-952.json, errors-952.txt, output-952.txt).
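For instance, to confirm which command file corresponds to a given operation, the JSON can be pretty-printed with the standard json.tool module (a quick sketch; task 952 is the example used here):
# Pretty-print the command file and pull out the fields that identify the operation:
python -m json.tool /var/lib/ambari-agent/data/command-952.json | grep -E '"role"|"commandType"|"taskId"'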
- The "command-xxx.json" file contains a lot of information, especially the "localComponents", "configuration_attributes", "configurationTags" and the command type. In this file we can find a data snippet something like the following: },
"public_hostname": "c6402.ambari.apache.org",
"commandId": "53-0",
"hostname": "c6402.ambari.apache.org",
"kerberosCommandParams": [],
"serviceName": "HDFS",
"role": "DATANODE",
"forceRefreshConfigTagsBeforeExecution": false,
"requestId": 53,
"agentConfigParams": {
"agent": {
"parallel_execution": 0
}
},
"clusterName": "ClusterDemo",
"commandType": "EXECUTION_COMMAND",
"taskId": 952,
"roleParams": {
"component_category": "SLAVE"
},
"conf . . How to execute the same command from the host ("c6402.ambari.apache.org") where the operation was actually performed? . Step-1).
======= Log in to the host on which the command was executed. Here it is "c6402.ambari.apache.org", which we can see in the Ambari UI operations history while stopping the DataNode. ssh root@c6402.ambari.apache.org Step-2).
======= Since the operation we were performing was a DataNode start, we will be executing the "datanode.py" script, as follows: [root@c6402 ambari-agent]# PATH=$PATH:/var/lib/ambari-agent/
[root@c6402 ambari-agent]# python2.6 /var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py START /var/lib/ambari-agent/data/command-952.json /var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package /tmp/Jay/tmp.txt ERROR /tmp/Jay/
.
** NOTICE: ** Here we have modified the PATH variable temporarily. We need to set the PATH to make sure that "ambari-sudo.sh" is available when we execute the commands, because the ambari-agent runs these commands with the help of the "ambari-sudo.sh" script, so that script must be present in the PATH. If it is not, we might see the following kind of error while trying to execute the commands: Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 174, in <module>
DataNode().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 58, in start
import params
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/params.py", line 25, in <module>
from params_linux import *
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/params_linux.py", line 20, in <module>
import status_params
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/status_params.py", line 53, in <module>
hadoop_conf_dir = conf_select.get_hadoop_conf_dir()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/conf_select.py", line 477, in get_hadoop_conf_dir
select(stack_name, "hadoop", version)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/conf_select.py", line 315, in select
shell.checked_call(_get_cmd("set-conf-dir", package, version), logoutput=False, quiet=False, sudo=True)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 71, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 93, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 141, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 294, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'ambari-python-wrap /usr/bin/conf-select set-conf-dir --package hadoop --stack-version 2.5.0.0-1245 --conf-version 0' returned 127. /bin/bash: ambari-sudo.sh: command not found . - Here the "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py" script will have the following arguments: Script expects at least 6 arguments
Usage: datanode.py <COMMAND> <JSON_CONFIG> <BASEDIR> <STROUTPUT> <LOGGING_LEVEL> <TMP_DIR>
<COMMAND> command type (INSTALL/CONFIGURE/START/STOP/SERVICE_CHECK...)
<JSON_CONFIG> path to command json file. Ex: /var/lib/ambari-agent/data/command-2.json
<BASEDIR> path to service metadata dir. Ex: /var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package
<STROUTPUT> path to file with structured command output (file will be created). Ex:/tmp/my.txt
<LOGGING_LEVEL> log level for stdout. Ex:DEBUG,INFO
<TMP_DIR> temporary directory for executable scripts. Ex: /var/lib/ambari-agent/tmp . - Once we have executed the above command, we can see that the DataNode starts exactly the same way as when we start it from the Ambari UI. This also helps us in troubleshooting cases where the ambari-server & agent are not communicating well, and in isolating the issue. Example: (OUTPUT)
================= [root@c6402 Jay]# python2.6 /var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py START /var/lib/ambari-agent/data/command-952.json /var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package /tmp/Jay/tmp.txt DEBUG /tmp/Jay/
2016-12-17 08:35:27,489 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.5.0.0-1245
2016-12-17 08:35:27,489 - Checking if need to create versioned conf dir /etc/hadoop/2.5.0.0-1245/0
2016-12-17 08:35:27,489 - call[('ambari-python-wrap', '/usr/bin/conf-select', 'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2016-12-17 08:35:27,506 - call returned (1, '/etc/hadoop/2.5.0.0-1245/0 exist already', '')
2016-12-17 08:35:27,507 - checked_call[('ambari-python-wrap', '/usr/bin/conf-select', 'set-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False}
2016-12-17 08:35:27,523 - checked_call returned (0, '')
2016-12-17 08:35:27,523 - Ensuring that hadoop has the correct symlink structure
2016-12-17 08:35:27,524 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-12-17 08:35:27,529 - Stack Feature Version Info: stack_version=2.5, version=2.5.0.0-1245, current_cluster_version=2.5.0.0-1245 -> 2.5.0.0-1245
2016-12-17 08:35:27,530 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.5.0.0-1245
2016-12-17 08:35:27,531 - Checking if need to create versioned conf dir /etc/hadoop/2.5.0.0-1245/0
2016-12-17 08:35:27,531 - call[('ambari-python-wrap', '/usr/bin/conf-select', 'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2016-12-17 08:35:27,548 - call returned (1, '/etc/hadoop/2.5.0.0-1245/0 exist already', '')
2016-12-17 08:35:27,549 - checked_call[('ambari-python-wrap', '/usr/bin/conf-select', 'set-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False}
2016-12-17 08:35:27,568 - checked_call returned (0, '')
2016-12-17 08:35:27,568 - Ensuring that hadoop has the correct symlink structure
2016-12-17 08:35:27,568 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-12-17 08:35:27,574 - checked_call['rpm -q --queryformat '%{version}-%{release}' hdp-select | sed -e 's/\.el[0-9]//g''] {'stderr': -1}
2.5.0.0-12452016-12-17 08:35:27,588 - checked_call returned (0, '2.5.0.0-1245', '')
2016-12-17 08:35:27,591 - Directory['/etc/security/limits.d'] {'owner': 'root', 'create_parents': True, 'group': 'root'}
2016-12-17 08:35:27,597 - File['/etc/security/limits.d/hdfs.conf'] {'content': Template('hdfs.conf.j2'), 'owner': 'root', 'group': 'root', 'mode': 0644}
2016-12-17 08:35:27,599 - XmlConfig['hadoop-policy.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-12-17 08:35:27,606 - Generating config: /usr/hdp/current/hadoop-client/conf/hadoop-policy.xml
2016-12-17 08:35:27,607 - File['/usr/hdp/current/hadoop-client/conf/hadoop-policy.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,615 - XmlConfig['ssl-client.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-12-17 08:35:27,622 - Generating config: /usr/hdp/current/hadoop-client/conf/ssl-client.xml
2016-12-17 08:35:27,622 - File['/usr/hdp/current/hadoop-client/conf/ssl-client.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,631 - Directory['/usr/hdp/current/hadoop-client/conf/secure'] {'owner': 'root', 'create_parents': True, 'group': 'hadoop', 'cd_access': 'a'}
2016-12-17 08:35:27,632 - XmlConfig['ssl-client.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf/secure', 'configuration_attributes': {}, 'configurations': ...}
2016-12-17 08:35:27,639 - Generating config: /usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml
2016-12-17 08:35:27,639 - File['/usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,644 - XmlConfig['ssl-server.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-12-17 08:35:27,651 - Generating config: /usr/hdp/current/hadoop-client/conf/ssl-server.xml
2016-12-17 08:35:27,652 - File['/usr/hdp/current/hadoop-client/conf/ssl-server.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,658 - XmlConfig['hdfs-site.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {'final': {'dfs.datanode.failed.volumes.tolerated': 'true', 'dfs.datanode.data.dir': 'true', 'dfs.namenode.name.dir': 'true', 'dfs.support.append': 'true', 'dfs.webhdfs.enabled': 'true'}}, 'configurations': ...}
2016-12-17 08:35:27,665 - Generating config: /usr/hdp/current/hadoop-client/conf/hdfs-site.xml
2016-12-17 08:35:27,666 - File['/usr/hdp/current/hadoop-client/conf/hdfs-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,715 - XmlConfig['core-site.xml'] {'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'mode': 0644, 'configuration_attributes': {'final': {'fs.defaultFS': 'true'}}, 'owner': 'hdfs', 'configurations': ...}
2016-12-17 08:35:27,721 - Generating config: /usr/hdp/current/hadoop-client/conf/core-site.xml
2016-12-17 08:35:27,721 - File['/usr/hdp/current/hadoop-client/conf/core-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0644, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,738 - File['/usr/hdp/current/hadoop-client/conf/slaves'] {'content': Template('slaves.j2'), 'owner': 'hdfs'}
2016-12-17 08:35:27,739 - Directory['/var/lib/hadoop-hdfs'] {'owner': 'hdfs', 'create_parents': True, 'group': 'hadoop', 'mode': 0751}
2016-12-17 08:35:27,740 - Directory['/var/lib/ambari-agent/data/datanode'] {'create_parents': True, 'mode': 0755}
2016-12-17 08:35:27,744 - Host contains mounts: ['/', '/proc', '/sys', '/dev/pts', '/dev/shm', '/boot', '/proc/sys/fs/binfmt_misc', '/var/lib/nfs/rpc_pipefs'].
2016-12-17 08:35:27,744 - Mount point for directory /hadoop/hdfs/data is /
2016-12-17 08:35:27,744 - Mount point for directory /hadoop/hdfs/data is /
2016-12-17 08:35:27,744 - Last mount for /hadoop/hdfs/data in the history file is /
2016-12-17 08:35:27,744 - Will manage /hadoop/hdfs/data since it's on the same mount point: /
2016-12-17 08:35:27,745 - Forcefully ensuring existence and permissions of the directory: /hadoop/hdfs/data
2016-12-17 08:35:27,745 - Directory['/hadoop/hdfs/data'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'ignore_failures': True, 'mode': 0755, 'owner': 'hdfs'}
2016-12-17 08:35:27,749 - Host contains mounts: ['/', '/proc', '/sys', '/dev/pts', '/dev/shm', '/boot', '/proc/sys/fs/binfmt_misc', '/var/lib/nfs/rpc_pipefs'].
2016-12-17 08:35:27,749 - Mount point for directory /hadoop/hdfs/data is /
2016-12-17 08:35:27,749 - File['/var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist'] {'content': '\n# This file keeps track of the last known mount-point for each dir.\n# It is safe to delete, since it will get regenerated the next time that the component of the service starts.\n# However, it is not advised to delete this file since Ambari may\n# re-create a dir that used to be mounted on a drive but is now mounted on the root.\n# Comments begin with a hash (#) symbol\n# dir,mount_point\n/hadoop/hdfs/data,/\n', 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2016-12-17 08:35:27,750 - Directory['/var/run/hadoop'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 0755}
2016-12-17 08:35:27,751 - Directory['/var/run/hadoop/hdfs'] {'owner': 'hdfs', 'group': 'hadoop', 'create_parents': True}
2016-12-17 08:35:27,752 - Directory['/var/log/hadoop/hdfs'] {'owner': 'hdfs', 'group': 'hadoop', 'create_parents': True}
2016-12-17 08:35:27,752 - File['/var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid'] {'action': ['delete'], 'not_if': 'ambari-sudo.sh -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid && ambari-sudo.sh -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid'}
2016-12-17 08:35:27,760 - Skipping File['/var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid'] due to not_if
2016-12-17 08:35:27,761 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode''] {'environment': {'HADOOP_LIBEXEC_DIR': '/usr/hdp/current/hadoop-client/libexec'}, 'not_if': 'ambari-sudo.sh -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid && ambari-sudo.sh -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid'}
2016-12-17 08:35:27,767 - Skipping Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode''] due to not_if
2016-12-17 08:35:27,796 - Command: /usr/bin/hdp-select status hadoop-hdfs-datanode > /tmp/tmp6ogtMa
Output: hadoop-hdfs-datanode - 2.5.0.0-1245 . .
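To double-check that the DataNode actually came up after the manual run, the pid file referenced in the not_if checks above can be used (a quick sketch):
# The start command writes/uses this pid file:
cat /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid
# Confirm that the process is alive:
ps -p $(cat /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid)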
12-17-2016
01:11 PM
3 Kudos
[Related Article On Ambari Server Tuning : https://community.hortonworks.com/articles/131670/ambari-server-performance-tuning-troubleshooting-c.html ] - The jcmd utility comes with the JDK and is present inside "$JAVA_HOME/bin". It is used to send diagnostic command requests to the JVM; these requests are useful for controlling Java Flight Recordings and for troubleshooting and diagnosing the JVM and Java applications. The following are the conditions for using this utility:
- 1. It must be used on the same machine where the JVM is running. - 2. Only a user who owns the JVM process can connect to it using this utility.
This utility can help us get many details about the JVM process. Some of the most useful ones are the following:
Syntax:
jcmd $PID $ARGUMENT
Example1: Classes taking the most memory are listed at the top, and classes are listed in a descending order. /usr/jdk64/jdk1.8.0_60/bin/jcmd $PID GC.class_histogram > /tmp/22421_ClassHistogram.txt
Example2: Generate Heap Dump /usr/jdk64/jdk1.8.0_60/bin/jcmd $PID GC.heap_dump /tmp/test123.hprof
Example3: Explicitly request JVM to trigger a Garbage Collection Cycle. /usr/jdk64/jdk1.8.0_60/bin/jcmd $PID GC.run
Example4: Generate Thread dump. /usr/jdk64/jdk1.8.0_60/bin/jcmd $PID Thread.print
Example5: List JVM properties. /usr/jdk64/jdk1.8.0_60/bin/jcmd $PID VM.system_properties
Example6: The command-line options along with the CLASSPATH setting. /usr/jdk64/jdk1.8.0_60/bin/jcmd $PID VM.command_line
**NOTE:** To use a few specific features offered by the "jcmd" tool, the "-XX:+UnlockDiagnosticVMOptions" JVM option needs to be enabled. .
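Putting the syntax together, a quick sketch of a typical run (the JDK path and PID are illustrative and must be adjusted; run it as the user who owns the JVM):
# List the Java processes owned by the hdfs user and note the NameNode PID:
su -l hdfs -c "/usr/jdk64/jdk1.8.0_60/bin/jcmd -l"
# Assuming the NameNode PID is 5752, collect a class histogram into /tmp:
su -l hdfs -c "/usr/jdk64/jdk1.8.0_60/bin/jcmd 5752 GC.class_histogram" > /tmp/5752_ClassHistogram.txt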
When to Collect Thread Dumps?
--------------------------------------------------------- Now we will look at a very common scenario: we find that the JVM process is taking a lot of time to process requests. Many times we see that the JVM process is stuck, slow, or completely hung. In such scenarios, in order to investigate the root cause of the slowness, we need to collect thread dumps of the JVM process, which will tell us about the various activities those threads are actually performing. Sometimes some threads are involved in very CPU-intensive operations, which might also cause slowness in getting a response, so we should collect the thread dump as well as the CPU data using the "top" command. A few things to consider while collecting thread dumps: - 1. Collect the thread dump when we see the issue (slowness, stuck/hung scenario, etc.).
. - 2. A single thread dump alone is usually not very useful, so whenever we collect thread dumps we should collect at least 5-6 of them
at some interval, for example 5-6 thread dumps at a 10-second interval, giving around 5-6 thread dumps in 1 minute. - 3. If we are also investigating whether a few threads might be consuming high CPU cycles, then in order to find the APIs that are actually consuming the CPU we must collect the thread dump as well as the "top" command output at almost the same time. . - To make this easy we can use a simple script, "threaddump_cpu_with_cmd.sh", for our troubleshooting & JVM data collection. The following script can be downloaded from: https://github.com/jaysensharma/MiddlewareMagicDemos/tree/master/HDP_Ambari/JVM #!/bin/sh
# Takes the JavaApp PID as an argument.
# Make sure you set JAVA_HOME
# Create thread dumps a specified number of times (i.e. LOOP) and INTERVAL.
# Thread dumps will be collected in the file "jcmd_threaddump.out", under the directory specified by WHERE_TO_GENERATE_OUTPUT_FILES below.
# Usage:
# su - $user_Who_Owns_The_JavaProcess
# ./threaddump_cpu_with_cmd.sh <JAVA_APP_PID>
#
#
# Example:
# NameNode PID is "5752" and it is started by user "hdfs" then run this utility as following:
#
# su -l hdfs -c "/tmp/threaddump_cpu_with_cmd.sh 5752"
################################################################################################
# Number of times to collect data. Means total number of thread dumps.
LOOP=10
# Interval in seconds between data points.
INTERVAL=10
# Where to generate the threaddump & top output files.
WHERE_TO_GENERATE_OUTPUT_FILES="/tmp"
# Setting the Java Home, by giving the path where your JDK is kept
# USERS MUST SET THE JAVA_HOME before running this script, as follows:
JAVA_HOME=/usr/jdk64/jdk1.8.0_60
echo "Writing CPU data log files to Directory: $WHERE_TO_GENERATE_OUTPUT_FILES"
for ((i=1; i <= $LOOP; i++))
do
#$JAVA_HOME/bin/jstack -l $1 >> jstack_threaddump.out
$JAVA_HOME/bin/jcmd $1 Thread.print >> $WHERE_TO_GENERATE_OUTPUT_FILES/jcmd_threaddump.out
_now=$(date)
echo "${_now}" >> $WHERE_TO_GENERATE_OUTPUT_FILES/top_highcpu.out
top -b -n 1 -H -p $1 >> $WHERE_TO_GENERATE_OUTPUT_FILES/top_highcpu.out
echo "Collected 'top' output and Thread Dump #" $i
if [ $i -lt $LOOP ]; then
echo "Sleeping for $INTERVAL seconds."
sleep $INTERVAL
fi
done
- Get the file "jcmd_threaddump.out" and "top_highcpu.out" for analysis. .
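Before diving into analysis, one common way to correlate the two files is via the thread id: HotSpot prints native thread ids as hexadecimal "nid" values, while top reports them in decimal, so the decimal id has to be converted first (a sketch with an illustrative thread id of 5831):
# Convert the thread id reported by top (decimal) into the hex form used in the thread dump:
printf 'nid=0x%x\n' 5831     # prints nid=0x16c7
# Then locate that thread's stack in the collected dumps:
grep -A 20 "nid=0x16c7" /tmp/jcmd_threaddump.out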
How to analyze the Thread dump Data?
--------------------------------------------------------- You may have a look at one of my old blog articles, which explains "High CPU Utilization Finding Cause?":
http://middlewaremagic.com/weblogic/?p=4884 . . Common Errors with the "jcmd" utility.
---------------------------------------------------------
While running jcmd we might see the error mentioned below. Here "5752" is the NameNode PID. [root@c6401 keys]# /usr/jdk64/jdk1.8.0_60/bin/jcmd 5752 help
5752:
com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file: target process not responding or HotSpot VM not loaded
at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:106)
at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:63)
at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:208)
at sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:147)
at sun.tools.jcmd.JCmd.main(JCmd.java:131) This error occurs because the jcmd utility only allows connecting to JVM processes that we own. In this case the "NameNode" process is owned by the "hdfs" user, whereas in the above command we are trying to connect to the NameNode process via the "jcmd" utility as the "root" user. The root user does not own the process, hence the error. - "hdfs" user owned processes: # su -l hdfs -c "/usr/jdk64/jdk1.8.0_60/bin/jcmd -l"
5752 org.apache.hadoop.hdfs.server.namenode.NameNode
5546 org.apache.hadoop.hdfs.tools.DFSZKFailoverController
5340 org.apache.hadoop.hdfs.server.datanode.DataNode
4991 org.apache.hadoop.hdfs.qjournal.server.JournalNode . - "root" user owned processes: [root@c6401 keys]# /usr/jdk64/jdk1.8.0_60/bin/jcmd -l
1893 com.hortonworks.support.tools.server.SupportToolServer
6470 com.hortonworks.smartsense.activity.ActivityAnalyzerFacade
16774 org.apache.ambari.server.controller.AmbariServer
29100 sun.tools.jcmd.JCmd -l
6687 org.apache.zeppelin.server.ZeppelinServer More information about this utility can be found at: https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr006.html . .
08-05-2016
11:24 AM
12 Kudos
As we know, logs are important; however, many times we see that the logs consume a lot of disk space. If we want to use the log4j-provided approach to control the logging behavior in a more efficient way, we can take advantage of the "apache-log4j-extras" package. More information about the extras package can be found at: https://logging.apache.org/log4j/extras/ In this article we will see how to use the log compression (gzip) feature of log4j to compress the NameNode logs on a daily basis automatically. To achieve this we need the following steps: Step-1). As the extras features are not shipped with the default log4j implementation, users will need to download the "Apache Extras™ for Apache log4j" package (such as apache-log4j-extras): https://logging.apache.org/log4j/extras/download.html Example: download the jar "apache-log4j-extras-1.2.17.jar" and place it inside the Hadoop library location. /usr/hdp/2.4.2.0-258/hadoop/lib/apache-log4j-extras-1.2.17.jar Step-2). Create a log4j appender like "ZIPRFA" using the class "org.apache.log4j.rolling.RollingFileAppender", where we define the "rollingPolicy". For more information about the various rolling policies, users can refer to: https://logging.apache.org/log4j/extras/apidocs/org/apache/log4j/rolling/ - Log in to Ambari and then, in the HDFS advanced configuration, add the following appender to "Advanced hdfs-log4j", somewhere at the end. #### New Appender to Zip the Log Files Based on Daily Rotation
log4j.appender.ZIPRFA=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.ZIPRFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.ZIPRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.ZIPRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.ZIPRFA.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.ZIPRFA.rollingPolicy.ActiveFileName=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.ZIPRFA.rollingPolicy.FileNamePattern=${hadoop.log.dir}/${hadoop.log.file}-.%d{yyyyMMdd}.log.gz Step-3). Also we will need to make sure that the NameNode should use the above mentioned Appender then we will need to add the "HADOOP_NAMENODE_OPTS" to include the "-Dhadoop.root.logger=INFO,ZIPRFA" something like following: export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node\" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 ${HADOOP_NAMENODE_OPTS} -Dhadoop.root.logger=INFO,ZIPRFA" Step-4). Now Restart the NameNode and double check sure that the "-Dhadoop.root.logger=INFO,ZIPRFA" property is added properly somewhere at the end. We can confirm the same using the "ps -ef | grep NameNode" output hdfs 27497 1 3 07:07 ? 00:01:27 /usr/jdk64/jdk1.8.0_60/bin/java -Dproc_namenode -Xmx1024m -Dhdp.version=2.4.2.0-258 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.4.2.0-258/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.4.2.0-258/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.2.0-258/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.4.2.0-258 -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop-hdfs-namenode-jss1.openstacklocal.log -Dhadoop.home.dir=/usr/hdp/2.4.2.0-258/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=:/usr/hdp/2.4.2.0-258/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.2.0-258/hadoop/lib/native:/usr/hdp/2.4.2.0-258/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.2.0-258/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608060706 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608060706 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608060706 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" 
-Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -Dhadoop.root.logger=INFO,ZIPRFA -Dhadoop.root.logger=INFO,ZIPRFA -Dhadoop.root.logger=INFO,ZIPRFA -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode Step-5). Now, as soon as the date changes, we should be able to see that the old NameNode log file got zipped, as follows: [root@jayhost hdfs]# cd /var/log/hadoop/hdfs
[root@jayhost hdfs]# ls -lart *.gz
-rw-r--r--. 1 hdfs hadoop 32453 Aug 5 06:32 hadoop-hdfs-namenode-jayhost.openstacklocal.log-.20160804.log.gz