Support Questions

Find answers, ask questions, and share your expertise

URGENT : Datanode started but not not live (dead)

avatar
Contributor

I'm facing a critical issue on hdp cluster, datanodes are running but dead

enirys_0-1619168708427.png

Bellow datanode logs

 

 

2021-04-22 23:30:06,457 WARN  checker.StorageLocationChecker (StorageLocationChecker.java:check(209)) - Exception checking StorageLocation [DISK]file:/grid/disk11/hadoop/hdfs/data/
2021-04-22 23:30:06,553 INFO  impl.MetricsConfig (MetricsConfig.java:loadFirst(112)) - loaded properties from hadoop-metrics2.properties
2021-04-22 23:30:06,700 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(82)) - Initializing Timeline metrics sink.
2021-04-22 23:30:06,702 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(102)) - Identified hostname = ovh-cnode5.26f5de01-5e40-4d8a-98bd-a4353b7bf5e3.datalake.ovh, serviceName = datanode
2021-04-22 23:30:06,813 INFO  availability.MetricSinkWriteShardHostnameHashingStrategy (MetricSinkWriteShardHostnameHashingStrategy.java:findCollectorShard(42)) - Calculated collector shard ovh-enode6.26f5de01-5e40-4d8a-98bd-a4353b7bf5e3.datalake.ovh based on hostname: ovh-cnode5.26f5de01-5e40-4d8a-98bd-a4353b7bf5e3.datalake.ovh
2021-04-22 23:30:06,814 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(125)) - Collector Uri: http://ovh-enode6.26f5de01-5e40-4d8a-98bd-a4353b7bf5e3.datalake.ovh:6188/ws/v1/timeline/metrics
2021-04-22 23:30:06,814 INFO  timeline.HadoopTimelineMetricsSink (HadoopTimelineMetricsSink.java:init(126)) - Container Metrics Uri: http://ovh-enode6.26f5de01-5e40-4d8a-98bd-a4353b7bf5e3.datalake.ovh:6188/ws/v1/timeline/containermetrics
2021-04-22 23:30:06,827 INFO  impl.MetricsSinkAdapter (MetricsSinkAdapter.java:start(206)) - Sink timeline started
2021-04-22 23:30:06,896 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:startTimer(376)) - Scheduled snapshot period at 10 second(s).
2021-04-22 23:30:06,897 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:start(192)) - DataNode metrics system started
2021-04-22 23:30:06,904 INFO  datanode.BlockScanner (BlockScanner.java:<init>(180)) - Initialized block scanner with targetBytesPerSec 1048576
2021-04-22 23:30:06,911 INFO  common.Util (Util.java:isDiskStatsEnabled(111)) - dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2021-04-22 23:30:06,917 INFO  datanode.DataNode (DataNode.java:<init>(444)) - File descriptor passing is enabled.
2021-04-22 23:30:06,918 INFO  datanode.DataNode (DataNode.java:<init>(455)) - Configured hostname is ovh-cnode5.26f5de01-5e40-4d8a-98bd-a4353b7bf5e3.datalake.ovh
2021-04-22 23:30:06,918 INFO  common.Util (Util.java:isDiskStatsEnabled(111)) - dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2021-04-22 23:30:06,918 WARN  conf.Configuration (Configuration.java:getTimeDurationHelper(1659)) - No unit for dfs.datanode.outliers.report.interval(1800000) assuming MILLISECONDS
2021-04-22 23:30:06,924 ERROR datanode.DataNode (DataNode.java:secureMain(2692)) - Exception in secureMain
java.lang.RuntimeException: Cannot start secure DataNode without configuring either privileged resources or SASL RPC data transfer protection and SSL for HTTP.  Using privileged resources in combination with SASL RPC data transfer protection is not supported.
        at org.apache.hadoop.hdfs.server.datanode.DataNode.checkSecureConfig(DataNode.java:1354)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1224)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:456)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2591)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2493)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2540)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2685)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2709)
2021-04-22 23:30:06,927 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2021-04-22 23:30:06,933 INFO  datanode.DataNode (LogAdapter.java:info(47)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at ovh-cnode5.26f5de01-5e40-4d8a-98bd-a4353b7bf5e3.datalake.ovh/10.1.2.69
************************************************************/

 

 

Any help is welcome

3 REPLIES 3

avatar
Contributor

Hi @enirys ,

 

From the below log snippet, I could think that the DN is running on the regular ports such as 50010 and 50075 (as per CM). Please confirm from your end.

 

2021-04-22 23:30:06,918 WARN  conf.Configuration (Configuration.java:getTimeDurationHelper(1659)) - No unit for dfs.datanode.outliers.report.interval(1800000) assuming MILLISECONDS
2021-04-22 23:30:06,924 ERROR datanode.DataNode (DataNode.java:secureMain(2692)) - Exception in secureMain
java.lang.RuntimeException: Cannot start secure DataNode without configuring either privileged resources or SASL RPC data transfer protection and SSL for HTTP.  Using privileged resources in combination with SASL RPC data transfer protection is not supported.

 

If they are running with the above ports are unprivileged ports on the OS. Could you try use the port 1024 (generally we do use 1004 and 1006)

 

Or the other option is to enable SASL RPC data transfer protection and TLS

 

The first option should be an easier one. Please try and let me know.

avatar
Expert Contributor

Does the solution here not work for you?

 

Thanks,

Megh

avatar
Contributor

Hi,

 

Port are configured correctly, i just restarted my vms and fixed the issue.

But it happens many times, and now i'm seeing other errors in logs

ERROR datanode.DataNode (DataXceiver.java:run(278)) - ovh-cnode19.26f5de01-5e40-4d8a-98bd-a4353b7bf5e3.datalake.ovh:1019:DataXceiver error processing WRITE_BLOCK operation  src: /10.1.2.106:34306 dst: /10.1.2.171:1019
java.io.IOException: Premature EOF from inputStream