
Operations such as start, stop, or restart sometimes fail when they are run from the Ambari UI. To troubleshoot what the Ambari UI did to perform the operation and how the commands were executed, we can run the same operation manually on the individual host with the help of the "/var/lib/ambari-agent/data/command-xxx.json" file.

- When we perform an operation from the Ambari UI (for example, starting or stopping a DataNode), Ambari shows the progress of the operation in the UI. The progress window lists the following two files.

Example:

stderr:  /var/lib/ambari-agent/data/errors-952.txt
stdout:  /var/lib/ambari-agent/data/output-952.txt

[Screenshot: restarting DataNode from the Ambari UI]

- Apart from the above files, there is one more important file, which the Ambari agent uses to execute the instructions/commands sent by the Ambari server. It is stored on the agent host as "/var/lib/ambari-agent/data/command-xxx.json". For example:

/var/lib/ambari-agent/data/command-952.json

- The "command-xxx.json" file carries the same ID (xxx) as the corresponding "errors-xxx.txt" and "output-xxx.txt" files (for example command-952.json, errors-952.txt, and output-952.txt). This ID is the "taskId" value found inside the JSON file.
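Because the three files share one ID, their paths can be derived from that single number. The following is a minimal Python sketch, assuming the default data directory shown above; the helper name `agent_files_for_task` is made up for illustration and is not part of Ambari.

```python
import posixpath

# Default ambari-agent data directory, as used in the paths above.
AGENT_DATA_DIR = "/var/lib/ambari-agent/data"


def agent_files_for_task(task_id, data_dir=AGENT_DATA_DIR):
    """Return the command, stdout and stderr file paths for one task ID."""
    return {
        "command": posixpath.join(data_dir, "command-%d.json" % task_id),
        "stdout": posixpath.join(data_dir, "output-%d.txt" % task_id),
        "stderr": posixpath.join(data_dir, "errors-%d.txt" % task_id),
    }


if __name__ == "__main__":
    # Print the three file paths that belong to task 952.
    for kind, path in sorted(agent_files_for_task(952).items()):
        print("%-8s %s" % (kind, path))
```

Given the ID from the stderr/stdout lines in the UI, this yields the matching command file to inspect on the host.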

- The "command-xxx.json" file contains a lot of information, in particular "localComponents", "configuration_attributes", "configurationTags", and the command type. A snippet of the file looks like the following:

    }, 
    "public_hostname": "c6402.ambari.apache.org", 
    "commandId": "53-0", 
    "hostname": "c6402.ambari.apache.org", 
    "kerberosCommandParams": [], 
    "serviceName": "HDFS", 
    "role": "DATANODE", 
    "forceRefreshConfigTagsBeforeExecution": false, 
    "requestId": 53, 
    "agentConfigParams": {
        "agent": {
            "parallel_execution": 0
        }
    }, 
    "clusterName": "ClusterDemo", 
    "commandType": "EXECUTION_COMMAND", 
    "taskId": 952, 
    "roleParams": {
        "component_category": "SLAVE"
    }, 
    "conf

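To see at a glance what a command file asks the agent to do, the top-level fields from the snippet above can be read with a few lines of Python. This is only a sketch: it assumes the file is plain JSON containing the keys shown in the snippet, and `summarize_command` is a hypothetical helper, not an Ambari function.

```python
import json
import sys

# Top-level keys taken from the snippet above.
SUMMARY_KEYS = ("clusterName", "hostname", "serviceName", "role",
                "commandType", "taskId", "commandId", "requestId")


def summarize_command(path):
    """Load a command-xxx.json file and return its identifying fields."""
    with open(path) as handle:
        command = json.load(handle)
    # .get() returns None for any key that a given file does not contain.
    return dict((key, command.get(key)) for key in SUMMARY_KEYS)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python summarize_command.py /var/lib/ambari-agent/data/command-952.json
    for key, value in sorted(summarize_command(sys.argv[1]).items()):
        print("%s: %s" % (key, value))
```

Checking "role" and "serviceName" this way confirms which component script the command belongs to before running anything by hand.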
How to execute the same command from the host ("c6402.ambari.apache.org") where the operation was actually performed?

Step-1).

======= Log in to the host on which the command was executed. Here it is "c6402.ambari.apache.org", which we can see in the operation history in the Ambari UI.

ssh root@c6402.ambari.apache.org

Step-2).

======= Since the operation we were performing was a DataNode start, we will execute the "datanode.py" script with the START command, as follows:

[root@c6402 ambari-agent]# PATH=$PATH:/var/lib/ambari-agent/

[root@c6402 ambari-agent]# python2.6 /var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py START /var/lib/ambari-agent/data/command-952.json  /var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package  /tmp/Jay/tmp.txt ERROR /tmp/Jay/

** NOTICE: ** Here we have modified the PATH variable temporarily so that the "ambari-sudo.sh" script is present in the PATH. The Ambari agent executes these commands with the help of the "ambari-sudo.sh" script, so that script must be available in the PATH. Otherwise we might see the following kind of error while running the above command:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 174, in <module>
    DataNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 58, in start
    import params
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/params.py", line 25, in <module>
    from params_linux import *
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/params_linux.py", line 20, in <module>
    import status_params
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/status_params.py", line 53, in <module>
    hadoop_conf_dir = conf_select.get_hadoop_conf_dir()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/conf_select.py", line 477, in get_hadoop_conf_dir
    select(stack_name, "hadoop", version)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/conf_select.py", line 315, in select
    shell.checked_call(_get_cmd("set-conf-dir", package, version), logoutput=False, quiet=False, sudo=True)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 71, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 93, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 141, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 294, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'ambari-python-wrap /usr/bin/conf-select set-conf-dir --package hadoop --stack-version 2.5.0.0-1245 --conf-version 0' returned 127. /bin/bash: ambari-sudo.sh: command not found
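Before running the script by hand, it can help to confirm that "ambari-sudo.sh" will actually be found once the agent directory is on the PATH. The following is a small Python 3 sketch of that check. It assumes, as the PATH change above implies, that the script lives in /var/lib/ambari-agent; the function names are illustrative only.

```python
import os
import shutil

# Directory that the PATH change above adds; assumed to hold ambari-sudo.sh.
AGENT_DIR = "/var/lib/ambari-agent"


def path_with_agent_dir(current_path, agent_dir=AGENT_DIR):
    """Return current_path with agent_dir appended, unless already present."""
    entries = [entry for entry in current_path.split(os.pathsep) if entry]
    if agent_dir not in entries:
        entries.append(agent_dir)
    return os.pathsep.join(entries)


def find_ambari_sudo(search_path):
    """Return the full path of ambari-sudo.sh on search_path, or None."""
    return shutil.which("ambari-sudo.sh", path=search_path)


if __name__ == "__main__":
    new_path = path_with_agent_dir(os.environ.get("PATH", ""))
    location = find_ambari_sudo(new_path)
    print(location or "ambari-sudo.sh not found; expect 'command not found' errors")
```

If the check prints a path, the "command not found" failure in the traceback above should not occur for that PATH value.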

- The "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py" script takes the following arguments:

Script expects at least 6 arguments
Usage: datanode.py <COMMAND> <JSON_CONFIG> <BASEDIR> <STROUTPUT> <LOGGING_LEVEL> <TMP_DIR>

<COMMAND> command type (INSTALL/CONFIGURE/START/STOP/SERVICE_CHECK...)
<JSON_CONFIG> path to command json file. Ex: /var/lib/ambari-agent/data/command-2.json
<BASEDIR> path to service metadata dir. Ex: /var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package
<STROUTPUT> path to file with structured command output (file will be created). Ex:/tmp/my.txt
<LOGGING_LEVEL> log level for stdout. Ex:DEBUG,INFO
<TMP_DIR> temporary directory for executable scripts. Ex: /var/lib/ambari-agent/tmp
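The six arguments map directly onto the command run in Step-2. As an illustration, the Python sketch below assembles that command line from its parts. It assumes, based only on the example paths in this article, that <BASEDIR> is the parent of the "scripts" directory that contains the component script; `build_script_command` is a hypothetical helper, not something shipped with Ambari.

```python
import posixpath
import shlex


def build_script_command(script, command, json_config, structured_out,
                         log_level, tmp_dir, python="python2.6"):
    """Return the argv list for running an Ambari component script by hand."""
    # Assumption: the script sits in <BASEDIR>/scripts/, as in the example
    # .../HDFS/2.1.0.2.0/package/scripts/datanode.py.
    basedir = posixpath.dirname(posixpath.dirname(script))
    return [python, script, command, json_config, basedir,
            structured_out, log_level, tmp_dir]


if __name__ == "__main__":
    argv = build_script_command(
        "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py",
        "START",
        "/var/lib/ambari-agent/data/command-952.json",
        "/tmp/Jay/tmp.txt",
        "DEBUG",
        "/tmp/Jay/",
    )
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the arguments in a list makes it easy to swap START for STOP, or DEBUG for ERROR, without retyping the long paths.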

- Once we have executed the above commands, the DataNode is started in exactly the same way as when we start it from the Ambari UI. Running the command directly on the host also helps isolate the issue when ambari-server and ambari-agent are not communicating properly.

Example: (OUTPUT)

=================

[root@c6402 Jay]#  python2.6 /var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py START /var/lib/ambari-agent/data/command-952.json  /var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package  /tmp/Jay/tmp.txt DEBUG /tmp/Jay/
2016-12-17 08:35:27,489 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.5.0.0-1245
2016-12-17 08:35:27,489 - Checking if need to create versioned conf dir /etc/hadoop/2.5.0.0-1245/0
2016-12-17 08:35:27,489 - call[('ambari-python-wrap', '/usr/bin/conf-select', 'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2016-12-17 08:35:27,506 - call returned (1, '/etc/hadoop/2.5.0.0-1245/0 exist already', '')
2016-12-17 08:35:27,507 - checked_call[('ambari-python-wrap', '/usr/bin/conf-select', 'set-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False}
2016-12-17 08:35:27,523 - checked_call returned (0, '')
2016-12-17 08:35:27,523 - Ensuring that hadoop has the correct symlink structure
2016-12-17 08:35:27,524 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-12-17 08:35:27,529 - Stack Feature Version Info: stack_version=2.5, version=2.5.0.0-1245, current_cluster_version=2.5.0.0-1245 -> 2.5.0.0-1245
2016-12-17 08:35:27,530 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.5.0.0-1245
2016-12-17 08:35:27,531 - Checking if need to create versioned conf dir /etc/hadoop/2.5.0.0-1245/0
2016-12-17 08:35:27,531 - call[('ambari-python-wrap', '/usr/bin/conf-select', 'create-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2016-12-17 08:35:27,548 - call returned (1, '/etc/hadoop/2.5.0.0-1245/0 exist already', '')
2016-12-17 08:35:27,549 - checked_call[('ambari-python-wrap', '/usr/bin/conf-select', 'set-conf-dir', '--package', 'hadoop', '--stack-version', '2.5.0.0-1245', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False}
2016-12-17 08:35:27,568 - checked_call returned (0, '')
2016-12-17 08:35:27,568 - Ensuring that hadoop has the correct symlink structure
2016-12-17 08:35:27,568 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-12-17 08:35:27,574 - checked_call['rpm -q --queryformat '%{version}-%{release}' hdp-select | sed -e 's/\.el[0-9]//g''] {'stderr': -1}
2.5.0.0-12452016-12-17 08:35:27,588 - checked_call returned (0, '2.5.0.0-1245', '')
2016-12-17 08:35:27,591 - Directory['/etc/security/limits.d'] {'owner': 'root', 'create_parents': True, 'group': 'root'}
2016-12-17 08:35:27,597 - File['/etc/security/limits.d/hdfs.conf'] {'content': Template('hdfs.conf.j2'), 'owner': 'root', 'group': 'root', 'mode': 0644}
2016-12-17 08:35:27,599 - XmlConfig['hadoop-policy.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-12-17 08:35:27,606 - Generating config: /usr/hdp/current/hadoop-client/conf/hadoop-policy.xml
2016-12-17 08:35:27,607 - File['/usr/hdp/current/hadoop-client/conf/hadoop-policy.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,615 - XmlConfig['ssl-client.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-12-17 08:35:27,622 - Generating config: /usr/hdp/current/hadoop-client/conf/ssl-client.xml
2016-12-17 08:35:27,622 - File['/usr/hdp/current/hadoop-client/conf/ssl-client.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,631 - Directory['/usr/hdp/current/hadoop-client/conf/secure'] {'owner': 'root', 'create_parents': True, 'group': 'hadoop', 'cd_access': 'a'}
2016-12-17 08:35:27,632 - XmlConfig['ssl-client.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf/secure', 'configuration_attributes': {}, 'configurations': ...}
2016-12-17 08:35:27,639 - Generating config: /usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml
2016-12-17 08:35:27,639 - File['/usr/hdp/current/hadoop-client/conf/secure/ssl-client.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,644 - XmlConfig['ssl-server.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {}, 'configurations': ...}
2016-12-17 08:35:27,651 - Generating config: /usr/hdp/current/hadoop-client/conf/ssl-server.xml
2016-12-17 08:35:27,652 - File['/usr/hdp/current/hadoop-client/conf/ssl-server.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,658 - XmlConfig['hdfs-site.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'configuration_attributes': {'final': {'dfs.datanode.failed.volumes.tolerated': 'true', 'dfs.datanode.data.dir': 'true', 'dfs.namenode.name.dir': 'true', 'dfs.support.append': 'true', 'dfs.webhdfs.enabled': 'true'}}, 'configurations': ...}
2016-12-17 08:35:27,665 - Generating config: /usr/hdp/current/hadoop-client/conf/hdfs-site.xml
2016-12-17 08:35:27,666 - File['/usr/hdp/current/hadoop-client/conf/hdfs-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,715 - XmlConfig['core-site.xml'] {'group': 'hadoop', 'conf_dir': '/usr/hdp/current/hadoop-client/conf', 'mode': 0644, 'configuration_attributes': {'final': {'fs.defaultFS': 'true'}}, 'owner': 'hdfs', 'configurations': ...}
2016-12-17 08:35:27,721 - Generating config: /usr/hdp/current/hadoop-client/conf/core-site.xml
2016-12-17 08:35:27,721 - File['/usr/hdp/current/hadoop-client/conf/core-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0644, 'encoding': 'UTF-8'}
2016-12-17 08:35:27,738 - File['/usr/hdp/current/hadoop-client/conf/slaves'] {'content': Template('slaves.j2'), 'owner': 'hdfs'}
2016-12-17 08:35:27,739 - Directory['/var/lib/hadoop-hdfs'] {'owner': 'hdfs', 'create_parents': True, 'group': 'hadoop', 'mode': 0751}
2016-12-17 08:35:27,740 - Directory['/var/lib/ambari-agent/data/datanode'] {'create_parents': True, 'mode': 0755}
2016-12-17 08:35:27,744 - Host contains mounts: ['/', '/proc', '/sys', '/dev/pts', '/dev/shm', '/boot', '/proc/sys/fs/binfmt_misc', '/var/lib/nfs/rpc_pipefs'].
2016-12-17 08:35:27,744 - Mount point for directory /hadoop/hdfs/data is /
2016-12-17 08:35:27,744 - Mount point for directory /hadoop/hdfs/data is /
2016-12-17 08:35:27,744 - Last mount for /hadoop/hdfs/data in the history file is /
2016-12-17 08:35:27,744 - Will manage /hadoop/hdfs/data since it's on the same mount point: /
2016-12-17 08:35:27,745 - Forcefully ensuring existence and permissions of the directory: /hadoop/hdfs/data
2016-12-17 08:35:27,745 - Directory['/hadoop/hdfs/data'] {'group': 'hadoop', 'cd_access': 'a', 'create_parents': True, 'ignore_failures': True, 'mode': 0755, 'owner': 'hdfs'}
2016-12-17 08:35:27,749 - Host contains mounts: ['/', '/proc', '/sys', '/dev/pts', '/dev/shm', '/boot', '/proc/sys/fs/binfmt_misc', '/var/lib/nfs/rpc_pipefs'].
2016-12-17 08:35:27,749 - Mount point for directory /hadoop/hdfs/data is /
2016-12-17 08:35:27,749 - File['/var/lib/ambari-agent/data/datanode/dfs_data_dir_mount.hist'] {'content': '\n# This file keeps track of the last known mount-point for each dir.\n# It is safe to delete, since it will get regenerated the next time that the component of the service starts.\n# However, it is not advised to delete this file since Ambari may\n# re-create a dir that used to be mounted on a drive but is now mounted on the root.\n# Comments begin with a hash (#) symbol\n# dir,mount_point\n/hadoop/hdfs/data,/\n', 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2016-12-17 08:35:27,750 - Directory['/var/run/hadoop'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 0755}
2016-12-17 08:35:27,751 - Directory['/var/run/hadoop/hdfs'] {'owner': 'hdfs', 'group': 'hadoop', 'create_parents': True}
2016-12-17 08:35:27,752 - Directory['/var/log/hadoop/hdfs'] {'owner': 'hdfs', 'group': 'hadoop', 'create_parents': True}
2016-12-17 08:35:27,752 - File['/var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid'] {'action': ['delete'], 'not_if': 'ambari-sudo.sh  -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid && ambari-sudo.sh  -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid'}
2016-12-17 08:35:27,760 - Skipping File['/var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid'] due to not_if
2016-12-17 08:35:27,761 - Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode''] {'environment': {'HADOOP_LIBEXEC_DIR': '/usr/hdp/current/hadoop-client/libexec'}, 'not_if': 'ambari-sudo.sh  -H -E test -f /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid && ambari-sudo.sh  -H -E pgrep -F /var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid'}
2016-12-17 08:35:27,767 - Skipping Execute['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'ulimit -c unlimited ;  /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh --config /usr/hdp/current/hadoop-client/conf start datanode''] due to not_if
2016-12-17 08:35:27,796 - Command: /usr/bin/hdp-select status hadoop-hdfs-datanode > /tmp/tmp6ogtMa
Output: hadoop-hdfs-datanode - 2.5.0.0-1245
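In a long run like the one above, the most telling lines are often those where a resource was skipped. The sketch below pulls those lines out of captured output. It assumes the "Skipping <resource> due to not_if" wording seen in this log and is not an Ambari utility.

```python
import re

# Matches e.g. "2016-12-17 08:35:27,760 - Skipping File['...'] due to not_if"
SKIP_PATTERN = re.compile(r" - Skipping (?P<resource>.+) due to not_if\s*$")


def skipped_resources(log_text):
    """Return the resources that the run skipped because not_if matched."""
    skipped = []
    for line in log_text.splitlines():
        match = SKIP_PATTERN.search(line)
        if match:
            skipped.append(match.group("resource"))
    return skipped


if __name__ == "__main__":
    # Two lines copied from the output above: one normal, one skipped.
    sample = (
        "2016-12-17 08:35:27,752 - Directory['/var/log/hadoop/hdfs'] "
        "{'owner': 'hdfs', 'group': 'hadoop', 'create_parents': True}\n"
        "2016-12-17 08:35:27,760 - Skipping "
        "File['/var/run/hadoop/hdfs/hadoop-hdfs-datanode.pid'] due to not_if\n"
    )
    for resource in skipped_resources(sample):
        print(resource)
```

In the output above, the Execute[...] start command is among the skipped resources: its not_if test found a live process for the pid file, which means the DataNode was already running when the script was invoked.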
