
Cannot transfer files to HDP 2.5 sandbox using Talend - Corporate Network

Rising Star

Hi everyone,

I am trying to load files into an HDP 2.5 sandbox for VirtualBox.

I am using Talend Open Studio 6.3.

My host system is a Windows 7 laptop connected to the corporate network.

I tested with a NAT network for the VM and created a forwarding rule for port 50010 (host IP 127.0.0.1, host port 50010, guest port 50010).

I also added the entry 127.0.0.1 sandbox.hortonworks.com to the hosts file on Windows.
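For reference, the equivalent rule from the command line would look something like this (the VM name "Hortonworks Sandbox", the network name, and the guest IP are assumptions; a plain NAT adapter uses --natpf1, while a NAT Network uses the natnetwork subcommand):

    # Plain NAT adapter: forward host 127.0.0.1:50010 to guest port 50010
    VBoxManage modifyvm "Hortonworks Sandbox" --natpf1 "hdfs-dn,tcp,127.0.0.1,50010,,50010"

    # NAT Network: same forwarding, with an assumed guest IP of 10.0.2.15
    VBoxManage natnetwork modify --netname "NatNetwork" --port-forward-4 "hdfs-dn:tcp:[127.0.0.1]:50010:[10.0.2.15]:50010"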

I am getting the following error while running the Talend job:

File xxx could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

Also, the datanode is running and I have enough space. According to the community, the only remaining cause is that the datanode hostname cannot be resolved, but I tried all of the suggestions without success.
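For what it's worth, a quick reachability check from the Windows host looks like this (assuming the hosts-file entry above; the Telnet client must be enabled on Windows 7). If the connection is refused or times out, the client cannot reach the datanode, and the write pipeline will exclude it, producing exactly this error:

    telnet sandbox.hortonworks.com 50010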

What else could be wrong?

Any comments would be appreciated.

Kind regards,

Paul

1 ACCEPTED SOLUTION

Expert Contributor

@Paul Hernandez,

There are two levels of port forwarding between your local machine running Talend and the datanode listening on port 50010, because there are two levels of virtualization: the sandbox runs inside a Docker container so that the same image works regardless of which virtualization platform you choose. The two levels are VirtualBox and Docker, and the forwarding rule you added only covers the VirtualBox level. The following article by @Michael Young shows the steps to forward the port through Docker as well.

Note that you must run 'docker commit sandbox sandbox' before you run 'docker rm sandbox', or you will lose any work you have done in the sandbox and it will revert to its initial state.

https://community.hortonworks.com/articles/65914/how-to-add-ports-to-the-hdp-25-virtualbox-sandbox.h...
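Condensed, the Docker-level procedure is roughly the following sketch (the exact run options and port list come from the sandbox start script, so treat this as an outline rather than the authoritative steps in the article):

    # Run inside the sandbox VM as root
    docker commit sandbox sandbox    # preserve your changes first (see note above)
    docker stop sandbox
    docker rm sandbox
    # Re-create the container with the original 'docker run' options from the
    # sandbox start script, adding '-p 50010:50010' to publish the datanode port.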

John


4 REPLIES

Rising Star

Thanks a lot! After following @Michael Young's article, I was able to run my Talend job successfully.

Super Guru

@Paul Hernandez @jwhitmore

John is correct. The sandbox does not expose port 50010 by default via Docker, which is used internally for both the VirtualBox- and VMware-based sandboxes. Even if you add a port-forwarding rule in VirtualBox, the Docker container running inside that VM still does not expose the port.
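You can verify this from inside the VM; 'docker port' is a standard Docker command that lists a container's published port mappings, and 50010 is absent from the default sandbox mapping:

    docker port sandbox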

New Contributor

Hi everyone,

@jwhitmore

Thanks for your response. You are right: after exposing port 50010, Talend for Big Data works (with the tHDFSConnect component and related components).

But even with port 50010 exposed, we still get the same error when using Talend ESB with the Camel framework; see below:

[WARN ]: org.apache.hadoop.hdfs.DFSClient - DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/Ztest.csv.opened could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1641)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3198)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3122)

I've also written a Scala program, and I'm facing the same issue; a minimal sketch of the write appears after the log below:

15:59:22.386 [main] ERROR org.apache.hadoop.hdfs.DFSClient - Failed to close inode 500495
org.apache.hadoop.ipc.RemoteException: File /user/hdfs/testscala2.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1641)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3198)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3122)
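A minimal sketch of the kind of HDFS write involved, including the dfs.client.use.datanode.hostname client setting that is often suggested for NAT'd sandboxes, since it makes the client connect to the datanode by hostname (resolvable via the hosts file) instead of the internal Docker IP it registers with (the hostname and path are assumptions):

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HdfsWriteSketch {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        // Namenode address; sandbox.hortonworks.com must resolve via the hosts file.
        conf.set("fs.defaultFS", "hdfs://sandbox.hortonworks.com:8020")
        // Connect to the datanode by hostname instead of its internal Docker IP.
        conf.set("dfs.client.use.datanode.hostname", "true")

        val fs = FileSystem.get(new URI("hdfs://sandbox.hortonworks.com:8020"), conf)
        val out = fs.create(new Path("/user/hdfs/testscala2.txt"))
        try out.write("test\n".getBytes("UTF-8"))
        finally out.close()
        fs.close()
      }
    }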

Any ideas?

Thanks in advance.

Best regards,

Mickaël.