I've been trying to configure a cluster using the Hortonworks sandbox. I want to run the Ambari server with a namenode on the VirtualBox sandbox and configure N different nodes serving as datanodes, hosted on separate machines in the same LAN.
I can successfully register a host and it shows up as healthy. However, the HDFS admin report does not list it as alive. When I check the logs, there is an error about resolving the hostname.
My datanode currently runs on Ubuntu 16. I also tried a Vagrant VM with CentOS 6, with the same result. This is the output of the HDFS log on the datanode machine:
2018-02-10 10:22:58,696 INFO datanode.DataNode (BPServiceActor.java:register(713)) - Block pool BP-1281279544-172.17.0.2-1501250400082 (Datanode Uuid 0f73d59c-c63a-427a-b050-8ad1a5bbb774) service to sandbox.hortonworks.com/18.104.22.168:8020 beginning handshake with NN
2018-02-10 10:22:58,705 ERROR datanode.DataNode (BPServiceActor.java:run(773)) - Initialization failed for Block pool BP-1281279544-172.17.0.2-1501250400082 (Datanode Uuid 0f73d59c-c63a-427a-b050-8ad1a5bbb774) service to sandbox.hortonworks.com/22.214.171.124:8020 Datanode denied communication with namenode because hostname cannot be resolved (ip=10.0.2.2, hostname=10.0.2.2): DatanodeRegistration(0.0.0.0:50010, datanodeUuid=0f73d59c-c63a-427a-b050-8ad1a5bbb774, infoPort=50075, infoSecurePort=0, ipcPort=8010, storageInfo=lv=-56;cid=CID-c609618c-c8f4-46b4-b016-ff5b1cb3ced7;nsid=623753388;c=0)
    at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:938)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4823)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1424)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:100)
Where does this 10.0.2.2 come from? I have no virtualization layer on my datanode; it is installed directly on the host OS. My hostnames are not resolvable by DNS, I am using only /etc/hosts entries. All hosts can ping each other using the hostnames provided, and all the installations were successful. All the VM ports are unblocked in VirtualBox (0.0.0.0 mask).
Thank you very much in advance, any tips are appreciated.
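For anyone reading a similar log, the key detail is the `(ip=..., hostname=...)` pair in the ERROR line: it is the address the namenode saw for the connecting datanode, and the hostname equals the IP when no name could be found for it. Below is a minimal Python sketch for pulling that pair out of a log line. It assumes the message format shown in the log above; the function names `parse_denied` and `is_unresolved` are made up for illustration and are not part of Hadoop.

```python
import re

# Matches the "(ip=..., hostname=...)" fragment of the rejection message.
# Assumption: the message format is the one shown in the log above.
_DENIED = re.compile(
    r"hostname cannot be resolved \(ip=([^,]+), hostname=([^)]+)\)"
)


def parse_denied(line):
    """Return (ip, hostname) from a rejection line, or None if it does not match."""
    match = _DENIED.search(line)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


def is_unresolved(ip, hostname):
    """True when the reported hostname is just the IP repeated."""
    return ip == hostname


line = (
    "Datanode denied communication with namenode because hostname "
    "cannot be resolved (ip=10.0.2.2, hostname=10.0.2.2): "
    "DatanodeRegistration(0.0.0.0:50010, ...)"
)
result = parse_denied(line)
print(result, is_unresolved(*result))
```

Running this against the ERROR line from the log should flag 10.0.2.2 as the address that needs explaining, rather than the datanode's LAN address.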
Thanks. However, my intention was to run at least 2 datanodes on different machines to see how performance would scale, test some failover scenarios, etc. I could make it all work with Docker, but I am still wondering why the VirtualBox scenario above didn't work. So far, I have found many useful posts by @Michael Young. Michael, maybe you'd have some idea?
I'm not positive, but it might have something to do with how the communications get translated when they go through the Sandbox. The more recent versions of the Sandbox use Docker, so the HDP cluster runs in a Docker container within the VirtualBox instance. My guess is that the Namenode is seeing the IP address of the datanode after it has been translated through the VirtualBox and then Docker layers.
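To illustrate that guess: as far as I know, 10.0.2.2 is the default address VirtualBox NAT networking presents for the host side, so connections forwarded into the VM would appear to come from it. If the namenode's /etc/hosts only lists the datanode under its LAN address, a lookup for 10.0.2.2 finds no name. Here is a rough Python sketch of that check. The hosts-file contents, the LAN address 192.168.1.50, and the name datanode1.example.lan are invented for the example, and `parse_hosts` and `names_for` are hypothetical helpers, not anything shipped with HDP.

```python
def parse_hosts(text):
    """Map each IP in hosts-file text to the list of names on its line."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments, skip blank lines
        if not line:
            continue
        ip, *names = line.split()
        table.setdefault(ip, []).extend(names)
    return table


# Hypothetical namenode-side hosts file. The datanode entry and its
# address are assumptions made up for this illustration.
NAMENODE_HOSTS = """
127.0.0.1      localhost
172.17.0.2     sandbox.hortonworks.com
192.168.1.50   datanode1.example.lan   # LAN address of the datanode
"""


def names_for(ip, hosts_text=NAMENODE_HOSTS):
    """Return the names the hosts file gives for ip (empty list if none)."""
    return parse_hosts(hosts_text).get(ip, [])


print(names_for("192.168.1.50"))  # the address the datanode really has
print(names_for("10.0.2.2"))      # the address the namenode reported
```

In this sketch the LAN address maps to a name while 10.0.2.2 does not, which would be consistent with the "hostname cannot be resolved (ip=10.0.2.2, hostname=10.0.2.2)" message. HDFS also has a namenode property, `dfs.namenode.datanode.registration.ip-hostname-check`, that controls this check, though whether relaxing it is appropriate here is a separate question.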
The HDP and HDF sandboxes for Docker on the Hortonworks page now only include the startup script; the images themselves are on Docker Hub: