Support Questions

DataNode is not starting after enabling Kerberos in a Docker container

New Contributor

I am trying to create an HDP cluster using Docker containers.

Everything works as expected, but when I enable Kerberos, the datanode service does not start.

The error message:


Java HotSpot(TM) 64-Bit Server VM (25.112-b15) for linux-amd64 JRE (1.8.0_112-b15), built on Sep 22 2016 21:10:53 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)

Memory: 4k page, physical 32891648k(24143024k free), swap 0k(0k free)

CommandLine flags: -XX:CMSInitiatingOccupancyFraction=70 -XX:ErrorFile=/var/log/hadoop/hdfs/hs_err_pid%p.log -XX:InitialHeapSize=1073741824 -XX:MaxHeapSize=1073741824 -XX:MaxNewSize=209715200 -XX:MaxTenuringThreshold=6 -XX:NewSize=209715200 -XX:OldPLABSize=16 -XX:ParallelGCThreads=4 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC

2019-01-29T16:42:45.640+0000: 1.526: [GC (Allocation Failure) 2019-01-29T16:42:45.640+0000: 1.526: [ParNew: 163840K->13917K(184320K), 0.0152720 secs] 163840K->13917K(1028096K), 0.0153719 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

2019-01-29T16:42:46.695+0000: 2.581: [GC (Allocation Failure) 2019-01-29T16:42:46.695+0000: 2.581: [ParNew: 177757K->15542K(184320K), 0.0369095 secs] 177757K->19854K(1028096K), 0.0369834 secs] [Times: user=0.14 sys=0.02, real=0.04 secs]

2019-01-29T16:42:46.732+0000: 2.618: [GC (CMS Initial Mark) [1 CMS-initial-mark: 4311K(843776K)] 20660K(1028096K), 0.0025872 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]

2019-01-29T16:42:46.735+0000: 2.621: [CMS-concurrent-mark-start]

2019-01-29T16:42:46.752+0000: 2.638: [CMS-concurrent-mark: 0.017/0.017 secs] [Times: user=0.09 sys=0.00, real=0.02 secs]

2019-01-29T16:42:46.752+0000: 2.638: [CMS-concurrent-preclean-start]

2019-01-29T16:42:46.754+0000: 2.640: [CMS-concurrent-preclean: 0.002/0.002 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

2019-01-29T16:42:46.754+0000: 2.640: [GC (CMS Final Remark) [YG occupancy: 17413 K (184320 K)]2019-01-29T16:42:46.754+0000: 2.640: [Rescan (parallel) , 0.0025756 secs]2019-01-29T16:42:46.756+0000: 2.643: [weak refs processing, 0.0000195 secs]2019-01-29T16:42:46.756+0000: 2.643: [class unloading, 0.0035364 secs]2019-01-29T16:42:46.760+0000: 2.646: [scrub symbol table, 0.0027905 secs]2019-01-29T16:42:46.763+0000: 2.649: [scrub string table, 0.0006302 secs][1 CMS-remark: 4311K(843776K)] 21725K(1028096K), 0.0101118 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]

2019-01-29T16:42:46.764+0000: 2.650: [CMS-concurrent-sweep-start]

2019-01-29T16:42:46.767+0000: 2.653: [CMS-concurrent-sweep: 0.003/0.003 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]

2019-01-29T16:42:46.767+0000: 2.653: [CMS-concurrent-reset-start]

2019-01-29T16:42:46.801+0000: 2.687: [CMS-concurrent-reset: 0.034/0.034 secs] [Times: user=0.10 sys=0.03, real=0.04 secs]

Heap

par new generation total 184320K, used 132385K [0x00000000c0000000, 0x00000000cc800000, 0x00000000cc800000)

eden space 163840K, 71% used [0x00000000c0000000, 0x00000000c721adb0, 0x00000000ca000000)

from space 20480K, 75% used [0x00000000ca000000, 0x00000000caf2d988, 0x00000000cb400000)

to space 20480K, 0% used [0x00000000cb400000, 0x00000000cb400000, 0x00000000cc800000)

concurrent mark-sweep generation total 843776K, used 4302K [0x00000000cc800000, 0x0000000100000000, 0x0000000100000000)

Metaspace used 32410K, capacity 32758K, committed 33044K, reserved 1079296K

class space used 3902K, capacity 4025K, committed 4116K, reserved 1048576K

==> /var/log/hadoop/hdfs/jsvc.err <==

set_caps: failed to set capabilities

check that your kernel supports capabilities

set_caps(CAPS) failed for user 'hdfs'

Service exit with a return value of 4

(the four lines above repeat for each subsequent restart attempt)

/usr/lib/ambari-agent/lib/resource_management/core/environment.py:165: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  Logger.info("Skipping failure of {0} due to ignore_failures. Failure reason: {1}".format(resource, ex.message))
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 155, in <module>
    DataNode().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 375, in execute

    method(env)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 978, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 62, in start
    datanode(action="start")
  File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_datanode.py", line 68, in datanode
    create_log_dir=True
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/utils.py", line 276, in service
    Execute(daemon_cmd, not_if=process_id_exists_command, environment=hadoop_env_exports)
  File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh  -H -E /usr/hdp/2.6.5.1050-37/hadoop/sbin/hadoop-daemon.sh --config /usr/hdp/2.6.5.1050-37/hadoop/conf start datanode' returned 1. starting datanode, logging to


1 REPLY

New Contributor

The cause of this issue was that the cluster hostnames were not resolvable on one of the nodes.

After adding the hostnames to the /etc/hosts file on all nodes, everything works fine.
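For reference, a minimal sketch of what the /etc/hosts entries might look like; the IP addresses and hostnames below are hypothetical placeholders, so substitute the fully qualified names of your own containers:

172.18.0.2   ambari-master.example.com   ambari-master
172.18.0.3   datanode1.example.com       datanode1
172.18.0.4   datanode2.example.com       datanode2

You can then verify resolution on each node with something like hostname -f and ping -c 1 datanode1.example.com before restarting the DataNode. Hostname resolution matters in a kerberized cluster because service principals (for example dn/datanode1.example.com@REALM) are matched against the resolved fully qualified hostname of each node.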