Created 09-14-2016 03:16 PM
I run small clusters for development. Since HDP 2.4.3 was released, I've noticed that I can no longer bring up a cluster with a 1 GB NameNode heap. The NameNode runs out of memory and is terminated, and the History Server fails to start with the following error.
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/historyserver.py", line 190, in <module> HistoryServer().execute() File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute method(env) File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/historyserver.py", line 101, in start host_sys_prepped=params.host_sys_prepped) File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/copy_tarball.py", line 257, in copy_to_hdfs replace_existing_files=replace_existing_files, File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__ self.env.run() File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run self.run_action(resource, action) File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action provider_action() File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 459, in action_create_on_execute self.action_delayed("create") File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 456, in action_delayed self.get_hdfs_resource_executor().action_delayed(action_name, self) File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 255, in action_delayed self._create_resource() File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 269, in _create_resource self._create_file(self.main_resource.resource.target, source=self.main_resource.resource.source, mode=self.mode) File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 322, in _create_file self.util.run_command(target, 'CREATE', method='PUT', overwrite=True, assertable_result=False, file_to_put=source, **kwargs) File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 179, in run_command _, out, err = get_user_call_output(cmd, user=self.run_user, logoutput=self.logoutput, quiet=False) File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output raise Fail(err_msg) resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT --data-binary @/usr/hdp/2.4.3.0-227/hadoop/mapreduce.tar.gz 'http://ec2-52-36-201-54.us-west-2.compute.amazonaws.com:50070/webhdfs/v1/hdp/apps/2.4.3.0-227/mapreduce/mapreduce.tar.gz?op=CREATE&user.name=hdfs&overwrite=True&permission=444' 1>/tmp/tmpEBh1rW 2>/tmp/tmpT1HZP0' returned 52. curl: (52) Empty reply from server 100
Increasing the NameNode heap to 2 GB works, but that takes 25% of the available memory on my dev VM.
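For context, the 2 GB heap I mention is just the NameNode Java heap size setting in Ambari, which ends up as the -Xms/-Xmx flags in hadoop-env.sh, roughly like this (a simplified sketch; the actual HDP template adds more GC options):

  # hadoop-env.sh (simplified sketch of what Ambari renders for the NameNode heap)
  export HADOOP_NAMENODE_OPTS="-Xms2048m -Xmx2048m ${HADOOP_NAMENODE_OPTS}"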
Is there any guidance on the minimum NameNode heap size for 2.4.3? The closest I could find was the guidance for 2.3.6.
Thanks!
-D...
Created 09-14-2016 03:22 PM
Regarding your query, "Is there any guidance for minimum NN heap size with 2.4.3?":
NameNode heap size depends on many factors, such as the number of files, the number of blocks, and the load on the system. So as your cluster's file and block counts grow, the NameNode heap needs to be tuned accordingly.
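A rough way to see where a cluster stands is to read FilesTotal and BlocksTotal from the NameNode's FSNamesystem JMX bean and apply the commonly quoted rule of thumb of roughly 1 GB of heap per million namespace objects. This is an approximation, not a guarantee, and <namenode-host> is a placeholder:

  # FilesTotal and BlocksTotal from the NameNode JMX endpoint
  curl -s 'http://<namenode-host>:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' | grep -E '"FilesTotal"|"BlocksTotal"'
  # Rule-of-thumb heap in GB ~= (FilesTotal + BlocksTotal) / 1,000,000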
Created 09-14-2016 03:25 PM
The reference you posted suggests 1 GB should be enough for fewer than 1 million files. That no longer appears to be accurate, since the History Server will not start when the NameNode has a 1 GB heap.
Created 09-14-2016 04:18 PM
When you started the NameNode, how much main memory was free? If not enough is left for the History Server, it won't start, because a JVM process consumes memory not only for the heap but also for PermGen (Metaspace in JDK 8) and native memory.
free -m
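Beyond free -m, one way to see how the NameNode JVM itself is split between heap and everything else (assuming the JDK tools are installed and <nn-pid> is the NameNode process id):

  ps -o pid,rss,vsz,cmd -p <nn-pid>   # resident and virtual size of the whole JVM process
  jmap -heap <nn-pid>                 # configured -Xmx and current heap usage (JDK 7/8)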
Created 09-14-2016 04:19 PM
It had over 10 GB free. The WebHDFS copy causes the NameNode to exhaust its 1 GB heap. This is new behavior with HDP 2.4.3; 2.3.x, 2.4.0, 2.4.2, and 2.5.0 all work in this configuration on the same machines.
As you may notice from the stack trace, this happens before the History Server actually tries to start.
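In case it helps anyone reproducing this, one way to confirm the heap exhaustion is to watch the NameNode log while the mapreduce.tar.gz upload is in flight, and optionally turn on GC logging. The log path below is the usual HDP default and may differ on your install; the extra JVM flags are standard HotSpot options:

  # Look for heap exhaustion in the NameNode log during the copy
  grep -i 'OutOfMemoryError' /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log
  # Optional: add GC logging to HADOOP_NAMENODE_OPTS to watch the heap fill up
  #   -verbose:gc -XX:+PrintGCDetails -Xloggc:/var/log/hadoop/hdfs/namenode-gc.log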