Member since: 03-14-2016
Posts: 4721
Kudos Received: 1111
Solutions: 874

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2018 | 04-27-2020 03:48 AM |
| | 3991 | 04-26-2020 06:18 PM |
| | 3228 | 04-26-2020 06:05 PM |
| | 2580 | 04-13-2020 08:53 PM |
| | 3836 | 03-31-2020 02:10 AM |
12-21-2016
07:15 AM
1 Kudo
Some time back I reported this issue: https://issues.apache.org/jira/browse/AMBARI-18932, along with the workaround.
12-21-2016
06:58 AM
@chennuri gouri shankar The following link has the details: https://cwiki.apache.org/confluence/display/RANGER/REST+APIs+for+Policy+Management . Example: curl -u $RangerAdmin:$Password -X GET http://$RANGERHOST:6080/service/public/api/policy/
The above will list the policies; you can then navigate to the desired policy and make changes to it, create a new one, or delete it.
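For example, a minimal sketch for working with a single policy, assuming an illustrative policy ID of 4 and the edited definition saved to a local policy.json file:
# Fetch one policy by its ID:
curl -u $RangerAdmin:$Password -X GET http://$RANGERHOST:6080/service/public/api/policy/4
# Update it by PUTting the modified JSON back:
curl -u $RangerAdmin:$Password -X PUT -H "Content-Type: application/json" -d @policy.json http://$RANGERHOST:6080/service/public/api/policy/4
# Delete it:
curl -u $RangerAdmin:$Password -X DELETE http://$RANGERHOST:6080/service/public/api/policy/4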
12-21-2016
06:53 AM
priyanshu bindal Can you try adding the ZKFC on that host using the Ambari APIs? Example: curl --user admin:admin -i -X POST http://erie1.example.com:8080/api/v1/clusters/ErieCluster/hosts/erie2.example.com/host_components/ZKFC - Here:
erie1.example.com = the Ambari host name
ErieCluster = the cluster name
erie2.example.com = the host on which we want to install the ZKFC . - You can find more details on how to add a host component using the Ambari APIs in the following link:
https://cwiki.apache.org/confluence/display/AMBARI/Add+a+host+and+deploy+components+using+APIs .
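For reference, a sketch of the follow-up calls after the POST, based on the host-component API described in the link above (same placeholder host/cluster names):
# Install the newly added ZKFC component (sets its desired state to INSTALLED):
curl --user admin:admin -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Install ZKFC"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' http://erie1.example.com:8080/api/v1/clusters/ErieCluster/hosts/erie2.example.com/host_components/ZKFC
# Once installed, start it (sets its desired state to STARTED):
curl --user admin:admin -i -H 'X-Requested-By: ambari' -X PUT -d '{"RequestInfo":{"context":"Start ZKFC"},"Body":{"HostRoles":{"state":"STARTED"}}}' http://erie1.example.com:8080/api/v1/clusters/ErieCluster/hosts/erie2.example.com/host_components/ZKFC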
12-21-2016
03:02 AM
David DN - Are you using the default FileView, or have you created a new instance of the FileView that you are testing? - Can you please share the view configuration (just to see if there is anything special in it)? - Also, can you please share the output of the following command; here the URL should point to the View instance that you are trying to hit: curl --user admin:admin -i -H 'X-Requested-By: ambari' -X GET http://$AMBARI_HOSTNAME:8080/api/v1/views/FILES/versions/1.0.0/instances/FILES_NEW_INSTANCE - Which version of Ambari Views / Ambari server are you using? .
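For the last question, a couple of quick checks that can be run on the Ambari server host (paths below are the defaults and may differ in your setup):
# Print the installed Ambari server version:
ambari-server --version
# List the view bundles deployed on the server (the Files view jar should be present here):
ls /var/lib/ambari-server/resources/views/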
12-20-2016
05:30 PM
@ARUN Basically there are 2 errors: 1). Address already in use (port conflict): Caused by: java.net.BindException: Problem binding to [0.0.0.0:60200] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
For that, please check which process is consuming that port; if there is a port conflict, either change the port or kill the other process that is consuming it (see the sketch at the end of this answer). . 2). The second error looks like data corruption. I would suggest clearing the old AMS data.
https://cwiki.apache.org/confluence/display/AMBARI/Cleaning+up+Ambari+Metrics+System+Data Caused by: java.io.InterruptedIOException: Interrupted calling coprocessor service org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService for row \x00\x00METRIC_RECORD
at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1769)
at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1719)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1022) - Shut down AMS and then clear out the "/var/lib/ambari-metrics-collector" dir for a fresh restart: - From Ambari -> Ambari Metrics -> Config -> Advanced ams-hbase-site, get the "hbase.rootdir" and "hbase-tmp" directories
- Delete or move the hbase-tmp and hbase.rootdir directories to an archive folder - Then restart AMS.
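Regarding the first error, a minimal sketch for identifying the process that holds the conflicting port (assumes netstat/lsof are installed; 60200 is the port from the exception above):
# Show the PID/program listening on port 60200:
netstat -tnlp | grep 60200
# or, if lsof is available:
lsof -i :60200
# Then either change the AMS port in the configuration or stop the conflicting process, e.g.: kill <PID>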
12-20-2016
06:10 AM
1 Kudo
@Sujatha Veeswar The error that you are getting indicates that you have some DB inconsistency. Your issue looks somewhat related to: https://issues.apache.org/jira/browse/AMBARI-18822, which should be addressed in Ambari 2.5. As a temporary remedy you can try starting the Ambari server as follows: ambari-server start --skip-database-check The DB still needs to be checked and the inconsistency fixed. .
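Before attempting any fix, it is worth taking a backup of the Ambari database first; a minimal sketch, assuming the common PostgreSQL setup with the database and user both named "ambari" (adjust to your environment):
# Dump the Ambari database before touching it:
pg_dump -U ambari ambari > /tmp/ambari_db_backup_$(date +%F).sql
# Then start the server while skipping the consistency check, as mentioned above:
ambari-server start --skip-database-check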
12-19-2016
02:50 PM
1 Kudo
Mahen Jay You need at least two NameNodes in order to achieve NameNode HA. In the screenshot I see that you have only one NameNode. You mentioned that you were not able to add the Secondary NameNode earlier. Can you please share what issue you were facing while adding the Secondary NameNode?
12-17-2016
01:11 PM
2 Kudos
Many times we find that operations (like start/stop/restart, etc.) fail from the Ambari UI. In such cases, if we want to troubleshoot what the Ambari UI did to perform that operation, or how the commands were executed, we can manually run those same operations from the individual host with the help of the "/var/lib/ambari-agent/data/command-xxx.json" file. - When we perform any operation from the Ambari UI (like starting/stopping a DataNode), we will notice that Ambari shows the operation progress in the UI. There we can basically see the following two files. Example: stderr: /var/lib/ambari-agent/data/errors-952.txt
stdout: /var/lib/ambari-agent/data/output-952.txt - Apart from the above files, there is one more important file which the ambari-agent uses to execute the instructions/commands sent by the Ambari Server. We can find that specific file on the agent as "/var/lib/ambari-agent/data/command-xxx.json". /var/lib/ambari-agent/data/command-952.json
- Here the "command-xxx.json" file has the same command ID (xxx) as the "errors-xxx.txt" & "output-xxx.txt" files (command-952.json, errors-952.txt, output-952.txt).
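For instance, to confirm which command file corresponds to a given operation, the JSON can be pretty-printed with the standard json.tool module (a quick sketch; task 952 is the example used here):
# Pretty-print the command file and pull out the fields that identify the operation:
python -m json.tool /var/lib/ambari-agent/data/command-952.json | grep -E '"role"|"commandType"|"taskId"'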
- The "command-xxx.json" file contains a lot of information, especially the "localComponents", "configuration_attributes", "configurationTags" and the command type. In this file we can find a data snippet something like the following: },
"public_hostname": "c6402.ambari.apache.org",
"commandId": "53-0",
"hostname": "c6402.ambari.apache.org",
"kerberosCommandParams": [],
"serviceName": "HDFS",
"role": "DATANODE",
"forceRefreshConfigTagsBeforeExecution": false,
"requestId": 53,
"agentConfigParams": {
"agent": {
"parallel_execution": 0
}
},
"clusterName": "ClusterDemo",
"commandType": "EXECUTION_COMMAND",
"taskId": 952,
"roleParams": {
"component_category": "SLAVE"
},
"conf . . How to execute the same command from the host ("c6402.ambari.apache.org") where the operation was actually performed? . Step-1).
======= Log in to the host on which the command was executed. Here it is "c6402.ambari.apache.org", which we can see in the Ambari UI operations history while stopping the DataNode. ssh root@c6402.ambari.apache.org Step-2).
======= Since the operation we were performing was a DataNode start, we will be executing the "datanode.py" script, as follows: [root@c6402 ambari-agent]# PATH=$PATH:/var/lib/ambari-agent/
[root@c6402 ambari-agent]# python2.6 /var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py START /var/lib/ambari-agent/data/command-952.json /var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package /tmp/Jay/tmp.txt ERROR /tmp/Jay/
.
** NOTICE: ** Here we have modified the PATH variable temporarily. We need to set the PATH to make sure that "ambari-sudo.sh" is available when we execute the commands, because the ambari-agent runs these commands with the help of the "ambari-sudo.sh" script, so that script must be present in the PATH. If it is not, we might see the following kind of error while trying to execute the commands: Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 174, in <module>
DataNode().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 58, in start
import params
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/params.py", line 25, in <module>
from params_linux import *
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/params_linux.py", line 20, in <module>
import status_params
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/status_params.py", line 53, in <module>
hadoop_conf_dir = conf_select.get_hadoop_conf_dir()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/conf_select.py", line 477, in get_hadoop_conf_dir
select(stack_name, "hadoop", version)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/conf_select.py", line 315, in select
shell.checked_call(_get_cmd("set-conf-dir", package, version), logoutput=False, quiet=False, sudo=True)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 71, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 93, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 141, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 294, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'ambari-python-wrap /usr/bin/conf-select set-conf-dir --package hadoop --stack-version 2.5.0.0-1245 --conf-version 0' returned 127. /bin/bash: ambari-sudo.sh: command not found . - Here the "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py" script will have the following arguments: Script expects at least 6 arguments
Usage: datanode.py <COMMAND> <JSON_CONFIG> <BASEDIR> <STROUTPUT> <LOGGING_LEVEL> <TMP_DIR>
<COMMAND> command type (INSTALL/CONFIGURE/START/STOP/SERVICE_CHECK...)
<JSON_CONFIG> path to command json file. Ex: /var/lib/ambari-agent/data/command-2.json
<BASEDIR> path to service metadata dir. Ex: /var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package
<STROUTPUT> path to file with structured command output (file will be created). Ex:/tmp/my.txt
<LOGGING_LEVEL> log level for stdout. Ex:DEBUG,INFO
<TMP_DIR> temporary directory for executable scripts. Ex: /var/lib/ambari-agent/tmp . - Once we have executed the above command, we can see that the DataNode starts exactly the same way as when we start it from the Ambari UI. This also helps us in troubleshooting cases where the ambari-server & agent are not communicating well, and in isolating the issue. Example: (OUTPUT)
================= [root@c6402 Jay]# python2.6 /var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py START /var/lib/ambari-agent/data/command-952.json /var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package /tmp/Jay/tmp.txt DEBUG /tmp/Jay/
2016-12-17 08:35:27,489 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.5.0.0-1245
2016-12-17 08:35:27,489 - Checking if need to create versioned conf dir /etc/hadoop/2.5.0.0-1245/0
2016-12-17 08:35:27,489 - call[('ambari-python-wrap', '/usr/bin/conf-select', 'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2016-12-17 08:35:27,506 - call returned (1, '/etc/hadoop/2.5.0.0-1245/0 exist already', '')
2016-12-17 08:35:27,507 - checked_call[('ambari-python-wrap', '/usr/bin/conf-select', 'set-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False}
2016-12-17 08:35:27,523 - checked_call returned (0, '')
2016-12-17 08:35:27,523 - Ensuring that hadoop has the correct symlink structure
2016-12-17 08:35:27,524 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-12-17 08:35:27,529 - Stack Feature Version Info: stack_version=2.5, version=2.5.0.0-1245, current_cluster_version=2.5.0.0-1245 -> 2.5.0.0-1245
2016-12-17 08:35:27,530 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.5.0.0-1245
2016-12-17 08:35:27,531 - Checking if need to create versioned conf dir /etc/hadoop/2.5.0.0-1245/0
2016-12-17 08:35:27,531 - call[('ambari-python-wrap', '/usr/bin/conf-select', 'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2016-12-17 08:35:27,548 - call returned (1, '/etc/hadoop/2.5.0.0-1245/0 exist already', '')
2016-12-17 08:35:27,549 - checked_call[('ambari-python-wrap', '/usr/bin/conf-select', 'set-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False}
2016-12-17 08:35:27,568 - checked_call returned (0, '')
2016-12-17 08:35:27,568 - Ensuring that hadoop has the correct symlink structure
2016-12-17 08:35:27,568 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-12-17 08:35:27,574 - checked_call['rpm -q --queryformat '%{version}-%{release}' hdp-select | sed -e 's/\.el[0-9]//g''] {'stderr': -1}
2.5.0.0-12452016-12-17 08:35:27,588 - checked_call returned (0, '2.5.0.0-1245', '')
2016-12-17 08:35:27,591 - Directory['/etc/security/limits.d'] {'owner': 'root', 'create_parents': True, 'group': 'root'}
2016-12-17 08:35:27,597 - File['/etc/security/limits.d/hdfs.conf'] {'content': Template('hdfs.conf.j2'), 'owner': 'root', 'group': 'root', 'mode': 0644}
2016-12-17 08:35:27,599 - XmlConfig['hadoop-policy.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-12-17 08:35:27,606 - Generating config: /usr/hdp/current/hadoop-client/conf/hadoop-policy.xml
2016-12-17 08:35:27,607 - File['/usr/hdp/current/hadoop-client/conf/hadoop-policy.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,615 - XmlConfig['ssl-client.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-12-17 08:35:27,622 - Generating config: /usr/hdp/current/hadoop-client/conf/ssl-client.xml
2016-12-17 08:35:27,622 - File['/usr/hdp/current/hadoop-client/conf/ssl-client.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,631 - Directory['/usr/hdp/current/hadoop-client/conf/secure'] {'owner': 'root', 'create_parents': True, 'group': 'hadoop', 'cd_access': 'a'}
2016-12-17 08:35:27,632 - XmlConfig['ssl-client.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf/secure', 'configuration_attributes': {}, 'configurations': ...}
2016-12-17 08:35:27,639 - Generating config: /usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml
2016-12-17 08:35:27,639 - File['/usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,644 - XmlConfig['ssl-server.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-12-17 08:35:27,651 - Generating config: /usr/hdp/current/hadoop-client/conf/ssl-server.xml
2016-12-17 08:35:27,652 - File['/usr/hdp/current/hadoop-client/conf/ssl-server.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,658 - XmlConfig['hdfs-site.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {'final': {'dfs.datanode.failed.volumes.tolerated': 'true', 'dfs.datanode.data.dir': 'true', 'dfs.namenode.name.dir': 'true', 'dfs.support.append': 'true', 'dfs.webhdfs.enabled': 'true'}}, 'configurations': ...}
2016-12-17 08:35:27,665 - Generating config: /usr/hdp/current/hadoop-client/conf/hdfs-site.xml
2016-12-17 08:35:27,666 - File['/usr/hdp/current/hadoop-client/conf/hdfs-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,715 - XmlConfig['core-site.xml'] {'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'mode': 0644, 'configuration_attributes': {'final': {'fs.defaultFS': 'true'}}, 'owner': 'hdfs', 'configurations': ...}
2016-12-17 08:35:27,721 - Generating config: /usr/hdp/current/hadoop-client/conf/core-site.xml
2016-12-17 08:35:27,721 - File['/usr/hdp/current/hadoop-client/conf/core-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0644, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,738 - File['/usr/hdp/current/hadoop-client/conf/slaves'] {'content': Template('slaves.j2'), 'owner': 'hdfs'}
2016-12-17 08:35:27,739 - Directory['/var/lib/hadoop-hdfs'] {'owner': 'hdfs', 'create_parents': True, 'group': 'hadoop', 'mode': 0751}
2016-12-17 08:35:27,740 - Directory['/var/lib/ambari-agent/data/datanode'] {'create_parents': True, 'mode': 0755}
2016-12-17 08:35:27,744 - Host contains mounts: ['/', '/proc', '/sys', '/dev/pts', '/dev/shm', '/boot', '/proc/sys/fs/binfmt_misc', '/var/lib/nfs/rpc_pipefs'].
2016-12-17 08:35:27,744 - Mount point for directory /hadoop/hdfs/data is /
2016-12-17 08:35:27,744 - Mount point for directory /hadoop/hdfs/data is /
2016-12-17 08:35:27,744 - Last mount for /hadoop/hdfs/data in the history file is /
2016-12-17 08:35:27,744 - Will manage /hadoop/hdfs/data since it's on the same mount point: /
2016-12-17 08:35:27,745 - Forcefully ensuring existence and permissions of the directory: /hadoop/hdfs/data
2016-12-17 08:35:27,745 - Directory['/hadoop/hdfs/data'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'ignore_failures': True, 'mode': 0755, 'owner': 'hdfs'}
2016-12-17 08:35:27,749 - Host contains mounts: ['/', '/proc', '/sys', '/dev/pts', '/dev/shm', '/boot', '/proc/sys/fs/binfmt_misc', '/var/lib/nfs/rpc_pipefs'].
2016-12-17 08:35:27,749 - Mount point for directory /hadoop/hdfs/data is /
2016-12-17 08:35:27,749 - File['/var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist'] {'content': '\n# This file keeps track of the last known mount-point for each dir.\n# It is safe to delete, since it will get regenerated the next time that the component of the service starts.\n# However, it is not advised to delete this file since Ambari may\n# re-create a dir that used to be mounted on a drive but is now mounted on the root.\n# Comments begin with a hash (#) symbol\n# dir,mount_point\n/hadoop/hdfs/data,/\n', 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2016-12-17 08:35:27,750 - Directory['/var/run/hadoop'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 0755}
2016-12-17 08:35:27,751 - Directory['/var/run/hadoop/hdfs'] {'owner': 'hdfs', 'group': 'hadoop', 'create_parents': True}
2016-12-17 08:35:27,752 - Directory['/var/log/hadoop/hdfs'] {'owner': 'hdfs', 'group': 'hadoop', 'create_parents': True}
2016-12-17 08:35:27,752 - File['/var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid'] {'action': ['delete'], 'not_if': 'ambari-sudo.sh -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid && ambari-sudo.sh -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid'}
2016-12-17 08:35:27,760 - Skipping File['/var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid'] due to not_if
2016-12-17 08:35:27,761 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode''] {'environment': {'HADOOP_LIBEXEC_DIR': '/usr/hdp/current/hadoop-client/libexec'}, 'not_if': 'ambari-sudo.sh -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid && ambari-sudo.sh -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid'}
2016-12-17 08:35:27,767 - Skipping Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ; /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode''] due to not_if
2016-12-17 08:35:27,796 - Command: /usr/bin/hdp-select status hadoop-hdfs-datanode > /tmp/tmp6ogtMa
Output: hadoop-hdfs-datanode - 2.5.0.0-1245 . .
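To double-check that the DataNode actually came up after the manual run, the pid file referenced in the not_if checks above can be used (a quick sketch):
# The start command writes/uses this pid file:
cat /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid
# Confirm that the process is alive:
ps -p $(cat /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid)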
12-17-2016
01:11 PM
3 Kudos
[Related Article On Ambari Server Tuning : https://community.hortonworks.com/articles/131670/ambari-server-performance-tuning-troubleshooting-c.html ] - The jcmd utility comes with the JDK and is present inside "$JAVA_HOME/bin". It is used to send diagnostic command requests to the JVM; these requests are useful for controlling Java Flight Recordings and for troubleshooting and diagnosing the JVM and Java applications. The following are the conditions for using this utility:
- 1. It must be used on the same machine where the JVM is running. - 2. Only a user who owns the JVM process can connect to it using this utility.
This utility can help us get many details about the JVM process. Some of the most useful ones are the following:
Syntax:
jcmd $PID $ARGUMENT
Example1: Classes taking the most memory are listed at the top, and classes are listed in a descending order. /usr/jdk64/jdk1.8.0_60/bin/jcmd $PID GC.class_histogram > /tmp/22421_ClassHistogram.txt
Example2: Generate Heap Dump /usr/jdk64/jdk1.8.0_60/bin/jcmd $PID GC.heap_dump /tmp/test123.hprof
Example3: Explicitly request JVM to trigger a Garbage Collection Cycle. /usr/jdk64/jdk1.8.0_60/bin/jcmd $PID GC.run
Example4: Generate Thread dump. /usr/jdk64/jdk1.8.0_60/bin/jcmd $PID Thread.print
Example5: List JVM properties. /usr/jdk64/jdk1.8.0_60/bin/jcmd $PID VM.system_properties
Example6: The command-line options along with the CLASSPATH setting. /usr/jdk64/jdk1.8.0_60/bin/jcmd $PID VM.command_line
**NOTE:** To use a few specific features offered by the "jcmd" tool, the "-XX:+UnlockDiagnosticVMOptions" JVM option needs to be enabled. .
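Putting the syntax together, a quick sketch of a typical run (the JDK path and PID are illustrative and must be adjusted; run it as the user who owns the JVM):
# List the Java processes owned by the hdfs user and note the NameNode PID:
su -l hdfs -c "/usr/jdk64/jdk1.8.0_60/bin/jcmd -l"
# Assuming the NameNode PID is 5752, collect a class histogram into /tmp:
su -l hdfs -c "/usr/jdk64/jdk1.8.0_60/bin/jcmd 5752 GC.class_histogram" > /tmp/5752_ClassHistogram.txt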
When to Collect Thread Dumps?
--------------------------------------------------------- Now we will look at a very common scenario: we find that the JVM process is taking a lot of time to process requests. Many times we see that the JVM process is stuck, slow, or completely hung. In such scenarios, in order to investigate the root cause of the slowness, we need to collect thread dumps of the JVM process, which will tell us about the various activities those threads are actually performing. Sometimes some threads are involved in very CPU-intensive operations, which might also cause slowness in getting a response, so we should collect the thread dump as well as the CPU data using the "top" command. A few things to consider while collecting thread dumps: - 1. Collect the thread dump when we see the issue (slowness, stuck/hung scenario, etc.).
. - 2. A single thread dump alone is usually not very useful, so whenever we collect thread dumps we should collect at least 5-6 of them
at some interval, for example 5-6 thread dumps at a 10-second interval, giving around 5-6 thread dumps in 1 minute. - 3. If we are also investigating whether a few threads might be consuming high CPU cycles, then in order to find the APIs that are actually consuming the CPU we must collect the thread dump as well as the "top" command output at almost the same time. . - To make this easy we can use a simple script, "threaddump_cpu_with_cmd.sh", for our troubleshooting & JVM data collection. The following script can be downloaded from: https://github.com/jaysensharma/MiddlewareMagicDemos/tree/master/HDP_Ambari/JVM #!/bin/sh
# Takes the JavaApp PID as an argument.
# Make sure you set JAVA_HOME
# Create thread dumps a specified number of times (i.e. LOOP) and INTERVAL.
# Thread dumps will be collected in the file "jcmd_threaddump.out", under the directory specified by WHERE_TO_GENERATE_OUTPUT_FILES below.
# Usage:
# su - $user_Who_Owns_The_JavaProcess
# ./threaddump_cpu_with_cmd.sh <JAVA_APP_PID>
#
#
# Example:
# NameNode PID is "5752" and it is started by user "hdfs" then run this utility as following:
#
# su -l hdfs -c "/tmp/threaddump_cpu_with_cmd.sh 5752"
################################################################################################
# Number of times to collect data. Means total number of thread dumps.
LOOP=10
# Interval in seconds between data points.
INTERVAL=10
# Where to generate the threaddump & top output files.
WHERE_TO_GENERATE_OUTPUT_FILES="/tmp"
# Setting the Java Home, by giving the path where your JDK is kept
# USERS MUST SET THE JAVA_HOME before running this script, as follows:
JAVA_HOME=/usr/jdk64/jdk1.8.0_60
echo "Writing CPU data log files to Directory: $WHERE_TO_GENERATE_OUTPUT_FILES"
for ((i=1; i <= $LOOP; i++))
do
#$JAVA_HOME/bin/jstack -l $1 >> jstack_threaddump.out
$JAVA_HOME/bin/jcmd $1 Thread.print >> $WHERE_TO_GENERATE_OUTPUT_FILES/jcmd_threaddump.out
_now=$(date)
echo "${_now}" >> $WHERE_TO_GENERATE_OUTPUT_FILES/top_highcpu.out
top -b -n 1 -H -p $1 >> $WHERE_TO_GENERATE_OUTPUT_FILES/top_highcpu.out
echo "Collected 'top' output and Thread Dump #" $i
if [ $i -lt $LOOP ]; then
echo "Sleeping for $INTERVAL seconds."
sleep $INTERVAL
fi
done
- Get the file "jcmd_threaddump.out" and "top_highcpu.out" for analysis. .
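Before diving into analysis, one common way to correlate the two files is via the thread id: HotSpot prints native thread ids as hexadecimal "nid" values, while top reports them in decimal, so the decimal id has to be converted first (a sketch with an illustrative thread id of 5831):
# Convert the thread id reported by top (decimal) into the hex form used in the thread dump:
printf 'nid=0x%x\n' 5831     # prints nid=0x16c7
# Then locate that thread's stack in the collected dumps:
grep -A 20 "nid=0x16c7" /tmp/jcmd_threaddump.out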
How to analyze the Thread dump Data?
--------------------------------------------------------- You may have a look at one of my old blog articles, which explains "High CPU Utilization Finding Cause?":
http://middlewaremagic.com/weblogic/?p=4884 . . Common Errors with the "jcmd" utility.
---------------------------------------------------------
While running jcmd we might see the error mentioned below. Here "5752" is the NameNode PID. [root@c6401 keys]# /usr/jdk64/jdk1.8.0_60/bin/jcmd 5752 help
5752:
com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file: target process not responding or HotSpot VM not loaded
at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:106)
at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:63)
at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:208)
at sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:147)
at sun.tools.jcmd.JCmd.main(JCmd.java:131) This error occurs because the jcmd utility only allows connecting to JVM processes that we own. In this case the "NameNode" process is owned by the "hdfs" user, whereas in the above command we are trying to connect to the NameNode process via the "jcmd" utility as the "root" user. The root user does not own the process, hence the error. - "hdfs" user owned processes: # su -l hdfs -c "/usr/jdk64/jdk1.8.0_60/bin/jcmd -l"
5752 org.apache.hadoop.hdfs.server.namenode.NameNode
5546 org.apache.hadoop.hdfs.tools.DFSZKFailoverController
5340 org.apache.hadoop.hdfs.server.datanode.DataNode
4991 org.apache.hadoop.hdfs.qjournal.server.JournalNode . - "root" user owned processes: [root@c6401 keys]# /usr/jdk64/jdk1.8.0_60/bin/jcmd -l
1893 com.hortonworks.support.tools.server.SupportToolServer
6470 com.hortonworks.smartsense.activity.ActivityAnalyzerFacade
16774 org.apache.ambari.server.controller.AmbariServer
29100 sun.tools.jcmd.JCmd -l
6687 org.apache.zeppelin.server.ZeppelinServer More information about this utility can be found at: https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr006.html . .
08-05-2016
11:24 AM
12 Kudos
As we know, logs are important; however, many times we see that the logs consume a lot of disk space. If we want to use the log4j-provided approach to control the logging behavior in a more efficient way, we can take advantage of the "apache-log4j-extras" package. More information about the extras package can be found at: https://logging.apache.org/log4j/extras/ In this article we will see how to use the log compression (gzip) feature of log4j to compress the NameNode logs on a daily basis automatically. To achieve this we need the following steps: Step-1). As the extras features are not shipped with the default log4j implementation, users will need to download the "Apache Extras™ for Apache log4j" package (such as apache-log4j-extras): https://logging.apache.org/log4j/extras/download.html Example: download the jar "apache-log4j-extras-1.2.17.jar" and place it inside the Hadoop library location. /usr/hdp/2.4.2.0-258/hadoop/lib/apache-log4j-extras-1.2.17.jar Step-2). Create a log4j appender like "ZIPRFA" using the class "org.apache.log4j.rolling.RollingFileAppender", where we define the "rollingPolicy". For more information about the various rolling policies, users can refer to: https://logging.apache.org/log4j/extras/apidocs/org/apache/log4j/rolling/ - Log in to Ambari and then, in the HDFS advanced configuration, add the following appender to "Advanced hdfs-log4j", somewhere at the end. #### New Appender to Zip the Log Files Based on Daily Rotation
log4j.appender.ZIPRFA=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.ZIPRFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.ZIPRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.ZIPRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.ZIPRFA.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.ZIPRFA.rollingPolicy.ActiveFileName=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.ZIPRFA.rollingPolicy.FileNamePattern=${hadoop.log.dir}/${hadoop.log.file}-.%d{yyyyMMdd}.log.gz Step-3). Also we will need to make sure that the NameNode should use the above mentioned Appender then we will need to add the "HADOOP_NAMENODE_OPTS" to include the "-Dhadoop.root.logger=INFO,ZIPRFA" something like following: export HADOOP_NAMENODE_OPTS="${SHARED_HADOOP_NAMENODE_OPTS} -XX:OnOutOfMemoryError=\"/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node\" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 ${HADOOP_NAMENODE_OPTS} -Dhadoop.root.logger=INFO,ZIPRFA" Step-4). Now Restart the NameNode and double check sure that the "-Dhadoop.root.logger=INFO,ZIPRFA" property is added properly somewhere at the end. We can confirm the same using the "ps -ef | grep NameNode" output hdfs 27497 1 3 07:07 ? 00:01:27 /usr/jdk64/jdk1.8.0_60/bin/java -Dproc_namenode -Xmx1024m -Dhdp.version=2.4.2.0-258 -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhdp.version= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.4.2.0-258/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/2.4.2.0-258/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.2.0-258/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhdp.version=2.4.2.0-258 -Dhadoop.log.dir=/var/log/hadoop/hdfs -Dhadoop.log.file=hadoop-hdfs-namenode-jss1.openstacklocal.log -Dhadoop.home.dir=/usr/hdp/2.4.2.0-258/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=:/usr/hdp/2.4.2.0-258/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.2.0-258/hadoop/lib/native:/usr/hdp/2.4.2.0-258/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.4.2.0-258/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608060706 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608060706 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" -Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:NewSize=200m -XX:MaxNewSize=200m -Xloggc:/var/log/hadoop/hdfs/gc.log-201608060706 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -Xms1024m -Xmx1024m -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT -XX:OnOutOfMemoryError="/usr/hdp/current/hadoop-hdfs-namenode/bin/kill-name-node" 
-Dorg.mortbay.jetty.Request.maxFormContentSize=-1 -Dhadoop.root.logger=INFO,ZIPRFA -Dhadoop.root.logger=INFO,ZIPRFA -Dhadoop.root.logger=INFO,ZIPRFA -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode Step-5). Now, as soon as the date changes, we should be able to see that the old NameNode log file got zipped, as follows: [root@jayhost hdfs]# cd /var/log/hadoop/hdfs
[root@jayhost hdfs]# ls -lart *.gz
-rw-r--r--. 1 hdfs hadoop 32453 Aug 5 06:32 hadoop-hdfs-namenode-jayhost.openstacklocal.log-.20160804.log.gz