06-23-2017 12:27 PM
I recently added a new DN to my CDH cluster.
The cluster has 42 DNs and runs in HA mode with Kerberos enabled.
I followed the procedure in this order:
1 - Install Kerberos and copy krb5.conf from a running node.
2 - On the KDC server, add the principals and create the keytab files.
3 - Test Kerberos using kinit (OK).
4 - Install hadoop-hdfs-datanode.
5 - Duplicate the conf files from an existing, running DN.
6 - Put the keytab files in place.
7 - Add the DN's FQDN to the slaves file.
8 - Start the hadoop-hdfs-datanode service.
9 - Run hdfs dfsadmin -refreshNodes on the active NameNode.
10 - As the new node still didn't show up, I stopped and started the hadoop-hdfs-namenode service again.
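For reference, steps 2, 3, 8 and 9 can be sketched as shell commands. The hostname, realm and keytab paths below are placeholders, not values from this thread:

```shell
# Placeholders -- substitute your node's FQDN and Kerberos realm.
NEWDN=newdn.example.com
REALM=EXAMPLE.COM

# Step 2 -- on the KDC: create the hdfs and HTTP service principals
# and export them into a keytab (MIT Kerberos kadmin syntax).
kadmin.local -q "addprinc -randkey hdfs/${NEWDN}@${REALM}"
kadmin.local -q "addprinc -randkey HTTP/${NEWDN}@${REALM}"
kadmin.local -q "ktadd -k /tmp/hdfs.keytab hdfs/${NEWDN}@${REALM} HTTP/${NEWDN}@${REALM}"

# Step 3 -- on the new node: verify the keytab works.
kinit -kt /etc/hadoop/conf/hdfs.keytab "hdfs/${NEWDN}@${REALM}"
klist

# Steps 8-9 -- start the DataNode, then make the active NameNode
# re-read its include/slaves configuration without a restart
# (on a secure cluster, authenticate as the hdfs principal first).
service hadoop-hdfs-datanode start
sudo -u hdfs hdfs dfsadmin -refreshNodes
```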
These are all the steps I performed, but the DN is not present, and the NameNode started with this message:
Security is on.
Safe mode is ON. The reported blocks 17338785 has reached the threshold 0.9500 of total blocks 17451701. The number of live datanodes 42 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 14 seconds.
15,461,247 files and directories, 17,456,116 blocks = 32,917,363 total filesystem object(s).
Safe mode was turned off automatically, but I think I'm missing something.
Any help would be appreciated.
06-23-2017 01:22 PM
You may need to add the new host's details to /etc/hosts on all the active nodes. Also scp the same file to the new node, so that every node has the same hosts file.
By the way, may I know why you are adding the new host manually? Instead, you can follow the add-host wizard from the link below...
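If name resolution does turn out to be the problem, the hosts-file sync can be sketched like this; the IP, FQDN and slaves-file path are placeholders:

```shell
# Placeholders -- adjust to your environment.
NEWDN_IP=10.0.0.42
NEWDN_FQDN=newdn.example.com

# Add the entry once, only if it is not already there.
grep -q "$NEWDN_FQDN" /etc/hosts || \
  echo "$NEWDN_IP $NEWDN_FQDN ${NEWDN_FQDN%%.*}" >> /etc/hosts

# Copy the same file to every existing node and to the new node.
for h in $(cat /etc/hadoop/conf/slaves) "$NEWDN_FQDN"; do
  scp /etc/hosts "root@$h:/etc/hosts"
done
```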
06-26-2017 09:58 AM
The /etc/hosts file is fine; the cluster uses DNS.
The weird thing is that YARN sees the added node (because I also installed the NodeManager service), but HDFS does not.
Maybe I have to restart the entire cluster, but I want to avoid that.
I'm not using Cloudera Manager because this cluster was deployed with CDH packages only.
The strangest thing is that the hadoop-hdfs-datanode service is not logging to /var/log/hadoop-hdfs/hadoop-hdfs-datanode-mynode.log.
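A quick way to compare what each daemon actually registered: YARN via the ResourceManager, HDFS via the NameNode. The keytab path, principal and hostname below are placeholders:

```shell
# Authenticate as hdfs first (keytab path and principal are placeholders).
kinit -kt /etc/hadoop/conf/hdfs.keytab "hdfs/$(hostname -f)"

# NodeManagers the ResourceManager knows about:
yarn node -list

# DataNodes the NameNode knows about:
hdfs dfsadmin -report | grep -i newdn.example.com
```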
The status output is:
hadoop-hdfs-datanode.service - LSB: Hadoop datanode
Loaded: loaded (/etc/init.d/hadoop-hdfs-datanode)
Active: active (running) since Mon 2017-06-26 12:47:18 EDT; 25s ago
Process: 61510 ExecStop=/etc/init.d/hadoop-hdfs-datanode stop (code=exited, status=0/SUCCESS)
Process: 62022 ExecStart=/etc/init.d/hadoop-hdfs-datanode start (code=exited, status=0/SUCCESS)
├─24818 jsvc.exec -Dproc_datanode -outfile /usr/lib/hadoop/logs/jsvc.out -errfile /usr/lib/hadoop/logs/jsvc.err -pidfile /tmp/hadoop_secure_dn.pid -nodetach -user hdfs -cp /etc/hadoop/conf:/usr/lib/h...
└─62508 jsvc.exec -Dproc_datanode -outfile /var/log/hadoop-hdfs/jsvc.out -errfile /var/log/hadoop-hdfs/jsvc.err -pidfile /var/run/hadoop-hdfs/hadoop_secure_dn.pid -nodetach -user hdfs -cp /etc/hadoop...
The content of /usr/lib/hadoop/logs/jsvc.err is "Service killed by signal 11"
The content of /usr/lib/hadoop-hdfs/logs/jsvc.err is "Service killed by signal 11"
The out files are empty.
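Signal 11 is a segmentation fault in the jsvc launcher itself, which would explain why nothing reaches the DataNode log. One thing worth ruling out first: the status output above shows two jsvc processes with different pidfiles (/tmp vs /var/run), so a stale secure-DataNode wrapper may be lingering. A way to check (the PID 24818 is taken from the status output and is illustrative):

```shell
# Look for secure-DataNode wrapper processes and their pidfiles.
ps -ef | grep '[j]svc.exec' || echo "no jsvc processes found"
cat /tmp/hadoop_secure_dn.pid /var/run/hadoop-hdfs/hadoop_secure_dn.pid 2>/dev/null || true

# If an old wrapper (e.g. PID 24818) is stale, clean it up and retry:
#   service hadoop-hdfs-datanode stop
#   kill 24818 && rm -f /tmp/hadoop_secure_dn.pid
#   service hadoop-hdfs-datanode start
```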
Jun 26 12:47:09 mynode hadoop-hdfs-datanode: starting datanode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-mynode.out
Jun 26 12:47:18 mynode hadoop-hdfs-datanode: Started Hadoop datanode (hadoop-hdfs-datanode):.
Jun 26 12:47:18 mynode systemd: Started LSB: Hadoop datanode.
The ulimit output is:
max memory size (kbytes, -m) unlimited
open files (-n) 32768
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 65536
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
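Those limits look reasonable for a DataNode (open files 32768, user processes 65536), but note that what an interactive shell reports can differ from what the init-script-launched daemon actually inherited. A way to compare (the PID is a placeholder from the status output above):

```shell
# Limits of the current shell:
ulimit -n   # open files
ulimit -u   # max user processes

# Limits the running jsvc process actually inherited
# (replace 62508 with the real PID):
#   cat /proc/62508/limits
```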
I've never seen this before.
06-26-2017 11:45 AM
To my knowledge, any update to the files above requires a restart. It is not only a cluster restart; you also need to restart the node itself using the "init 6" command. This is one of the important steps (again, this is my understanding; you should double-check and confirm before trying it).
So skipping one step may lead to strange behavior like this.