
Upgrading HDP 3.1.0 to 3.1.4 : Cannot restart datanode

Contributor

I'm facing an issue during the upgrade of HDP 3.1.0.0-78 to 3.1.4.0-315 on Ubuntu 18

The upgrade process is not able to restart the datanodes.

I get the error java.lang.RuntimeException: Cannot start datanode because the configured max locked memory size (dfs.datanode.max.locked.memory) of 2147483648 bytes is more than the datanode's available RLIMIT_MEMLOCK ulimit of 16777216 bytes.

I don't understand why this error happens. The datanodes started fine before the upgrade process began, and the system setting RLIMIT_MEMLOCK hasn't been changed.

Thanks in advance for your help

 

 

1 ACCEPTED SOLUTION

Contributor

I was able to restart the datanode from the Ambari UI after restarting the ambari-agent on the servers where the datanodes run.
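For reference, the agent restart itself is just the standard command (a sketch; run it on each affected host, then trigger the DataNode restart again from the Ambari UI):

# on each host running a DataNode
sudo ambari-agent restart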


6 REPLIES

Master Mentor

@Stephbat 

Can you please check these two values: dfs.datanode.max.locked.memory and the ulimit?

 

The dfs.datanode.max.locked.memory setting determines the maximum amount of memory a DataNode will use for caching. The "locked-in-memory size" corresponds to the ulimit (ulimit -l) of the DataNode user, which needs to be increased to match this parameter.
The current dfs.datanode.max.locked.memory is 2 GB, while the RLIMIT_MEMLOCK is only 16 MB.
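As a quick check (a sketch; the service user name is an assumption, on your cluster the DataNode appears to run as root), you can compare the two values directly on a DataNode host:

# locked-memory ulimit of the DataNode user, in kbytes
su - hdfs -c 'ulimit -l'

# configured HDFS value, in bytes
hdfs getconf -confKey dfs.datanode.max.locked.memory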

If you get the error “Cannot start datanode because the configured max locked memory size… is more than the datanode’s available RLIMIT_MEMLOCK ulimit,” that means that the operating system is imposing a lower limit on the amount of memory that you can lock than what you have configured. To fix this, you must adjust the ulimit -l value that the DataNode runs with.

 

Usually, this value is configured in /etc/security/limits.conf. However, the exact location will vary depending on the operating system and distribution you are using, so adjust the values accordingly. Remember that you will also need memory for other things, such as the DataNode and application JVM heaps and the operating system page cache.
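For illustration only (the user name and value are assumptions, adjust them to your environment), the corresponding entries in /etc/security/limits.conf could look like this; note that memlock is expressed in kbytes there:

# allow the HDFS service user to lock 2 GB (2097152 kbytes) to match dfs.datanode.max.locked.memory
hdfs  soft  memlock  2097152
hdfs  hard  memlock  2097152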

 

Once adjusted, the datanode should start like a charm 🙂

Hope that helps 

Contributor

Thanks for this reply,

but I don't understand why the datanode started correctly before the upgrade process and then failed during it, without any change to the OS limit RLIMIT_MEMLOCK

Master Mentor

@Stephbat 

Those are internal to Cloudera, and that confirms the myth that migrations/upgrades are never smooth; we still need humans 🙂 Please make those changes and let me know if your datanodes fire up correctly.

Contributor

In fact, I can't restart the datanode after the upgrade of Ambari from 2.7.3.0 to 2.7.4.0, not during the upgrade of HDP itself, whereas the restart worked fine before the upgrade.

Below are the logs of the failed restart. Note that the operating system limit for max locked memory is set to 2197152 kbytes, which is more than the value of the parameter dfs.datanode.max.locked.memory (2147483648 bytes), and yet the error is still raised:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 257446
max locked memory       (kbytes, -l) 2197152
max memory size         (kbytes, -m) unlimited
open files                      (-n) 128000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
==> /var/log/hadoop/hdfs/hadoop-hdfs-root-datanode-di-dbdne-fe-develophdpwkr-01.log <==
2020-07-22 06:42:20,156 INFO  datanode.DataNode (LogAdapter.java:info(51)) - registered UNIX signal handlers for [TERM, HUP, INT]
2020-07-22 06:42:20,422 INFO  security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab(1009)) - Login successful for user dn/di-dbdne-fe-develophdpwkr-01.node.fe.sd.diod.tech@DIOD.TECH using keytab file /etc/security/keytabs/dn.service.keytab
2020-07-22 06:42:20,574 INFO  checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(137)) - Scheduling a check for [DISK]file:/mnt/hdd0/hadoop/hdfs/data
2020-07-22 06:42:20,581 INFO  checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(137)) - Scheduling a check for [DISK]file:/mnt/hdd1/hadoop/hdfs/data
2020-07-22 06:42:20,582 INFO  checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(137)) - Scheduling a check for [DISK]file:/mnt/hdd2/hadoop/hdfs/data
2020-07-22 06:42:20,582 INFO  checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(137)) - Scheduling a check for [DISK]file:/mnt/hdd3/hadoop/hdfs/data
2020-07-22 06:42:20,582 INFO  checker.ThrottledAsyncChecker (ThrottledAsyncChecker.java:schedule(137)) - Scheduling a check for [RAM_DISK]file:/mnt/dn-tmpfs
2020-07-22 06:42:20,656 INFO  impl.MetricsConfig (MetricsConfig.java:loadFirst(118)) - Loaded properties from hadoop-metrics2.properties
2020-07-22 06:42:20,911 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(85)) - Initializing Timeline metrics sink.
2020-07-22 06:42:20,912 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(105)) - Identified hostname = di-dbdne-fe-develophdpwkr-01.node.fe.sd.diod.tech, serviceName = datanode
2020-07-22 06:42:20,943 INFO  availability.MetricSinkWriteShardHostnameHashingStrategy (MetricSinkWriteShardHostnameHashingStrategy.java:findCollectorShard(42)) - Calculated collector shard di-dbdne-fe-develophdpadm-01.node.fe.sd.diod.tech based on hostname: di-dbdne-fe-develophdpwkr-01.node.fe.sd.diod.tech
2020-07-22 06:42:20,943 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(135)) - Collector Uri: http://di-dbdne-fe-develophdpadm-01.node.fe.sd.diod.tech:6188/ws/v1/timeline/metrics
2020-07-22 06:42:20,943 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(136)) - Container Metrics Uri: http://di-dbdne-fe-develophdpadm-01.node.fe.sd.diod.tech:6188/ws/v1/timeline/containermetrics
2020-07-22 06:42:20,948 INFO  impl.MetricsSinkAdapter (MetricsSinkAdapter.java:start(204)) - Sink timeline started
2020-07-22 06:42:20,988 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:startTimer(374)) - Scheduled Metric snapshot period at 10 second(s).
2020-07-22 06:42:20,989 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:start(191)) - DataNode metrics system started
2020-07-22 06:42:21,068 INFO  common.Util (Util.java:isDiskStatsEnabled(395)) - dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2020-07-22 06:42:21,070 INFO  datanode.BlockScanner (BlockScanner.java:<init>(184)) - Initialized block scanner with targetBytesPerSec 1048576
2020-07-22 06:42:21,073 INFO  datanode.DataNode (DataNode.java:<init>(486)) - File descriptor passing is enabled.
2020-07-22 06:42:21,074 INFO  datanode.DataNode (DataNode.java:<init>(499)) - Configured hostname is di-dbdne-fe-develophdpwkr-01.node.fe.sd.diod.tech
2020-07-22 06:42:21,074 INFO  common.Util (Util.java:isDiskStatsEnabled(395)) - dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2020-07-22 06:42:21,076 ERROR datanode.DataNode (DataNode.java:secureMain(2883)) - Exception in secureMain
java.lang.RuntimeException: Cannot start datanode because the configured max locked memory size (dfs.datanode.max.locked.memory) of 2147483648 bytes is more than the datanode's available RLIMIT_MEMLOCK ulimit of 16777216 bytes.
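For what it's worth, converting the values by hand (my own arithmetic, not from the logs) shows the shell limit above is indeed large enough, so the 16 MB reported in the error must come from somewhere other than my interactive shell:

echo $((2197152 * 1024))      # 2249883648 bytes reported by ulimit -l, more than the configured 2147483648
echo $((16 * 1024 * 1024))    # 16777216 bytes, the RLIMIT_MEMLOCK seen by the datanode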

 

Contributor

In fact, I can't restart the datanode from the Ambari UI, but I can restart it by executing the following command directly on the server where the datanode should run:

 

/var/lib/ambari-agent/ambari-sudo.sh -H -E /usr/hdp/3.1.0.0-78/hadoop/bin/hdfs --config /usr/hdp/3.1.0.0-78/hadoop/conf --daemon start datanode

 

Therefore I think that the operating system max locked memory limit is set correctly on the server where the datanode should run.
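One way I could double-check this (a sketch; the pgrep pattern is a guess and may need adjusting) is to compare my shell's limit with the limit of the running ambari-agent process, since a datanode started from the Ambari UI is launched by the agent and inherits its limits:

# limit of the interactive shell, in kbytes
ulimit -l

# limit inherited by anything the running ambari-agent starts
grep "Max locked memory" /proc/$(pgrep -f ambari_agent | head -1)/limits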

Contributor

I was able to restart the datanode from the Ambari UI after restarting the ambari-agent on the servers where the datanodes run.