Created 12-01-2015 09:25 PM
Hi All,
I am confused about when to use these protocols (hdfs:// and webhdfs://). I am trying to migrate data between clusters using the distcp command and got stuck here.
1. Which one will be faster?
2. Can we use one protocol at the source and the other at the destination (i.e., a combination of both)?
3. When should we use webhdfs in particular?
4. Will there be any difference in transfer speed between these protocols?
5. What port numbers are needed when using these? (I have seen commands with 50070 and 8020; when should each be used?)
If there is any document or URL on this topic, please share it.
Thanks
Kishore
Created on 12-01-2015 11:01 PM - edited 12-01-2015 11:03 PM
> 1. Which one will be faster?
The native protocol of HDFS is hdfs://, and it is the fastest option (it runs purely over TCP, with efficient data packet transfers). Other protocols, such as webhdfs:// or the deprecated hftp://, add HTTP overhead that makes them slower overall.
> 2. Can we use one protocol at the source and the other at the destination (i.e., a combination of both)?
> 3. When should we use webhdfs in particular?
Yes to (2).
See http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_admin_distcp_da... for (3).
The rule of thumb is:
- Use webhdfs:// for the source when it's a different major version (such as a CDH4 source to a CDH5 target).
- Use hdfs:// otherwise, i.e., when the major version is the same (such as between any CDH 5.x releases).
- Prefer webhdfs:// over hftp://, unless the source is a very old version (pre CDH3u5) that has no WebHDFS support.
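The rule of thumb above can be expressed as a small shell helper. This is a minimal sketch, not a Cloudera tool: the function names `pick_source_scheme`, `default_nn_port` and `build_distcp_cmd` are hypothetical, the hosts and paths are placeholders, and it assumes the stock default NameNode ports (8020 for hdfs://, 50070 for the HTTP-based webhdfs:// and hftp://). It only prints the command; it does not run it.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper encoding the rule of thumb:
#   same major version      -> hdfs://
#   different major version -> webhdfs:// (or hftp:// if the source lacks WebHDFS)
# Args: source major version, target major version,
#       optional "yes"/"no" for WebHDFS support on the source (default: yes).
pick_source_scheme() {
  local src_major="$1" dst_major="$2" webhdfs_ok="${3:-yes}"
  if [ "$src_major" = "$dst_major" ]; then
    echo "hdfs"
  elif [ "$webhdfs_ok" = "yes" ]; then
    echo "webhdfs"
  else
    echo "hftp"
  fi
}

# Default NameNode port per scheme (assumes unmodified default configuration).
default_nn_port() {
  case "$1" in
    hdfs) echo 8020 ;;
    webhdfs|hftp) echo 50070 ;;
    *) echo "unknown scheme: $1" >&2; return 1 ;;
  esac
}

# Print (do not run) a DistCp command line; all arguments are placeholders.
build_distcp_cmd() {
  local scheme="$1" src_nn="$2" src_path="$3" dst_uri="$4"
  local port
  port="$(default_nn_port "$scheme")"
  echo "hadoop distcp ${scheme}://${src_nn}:${port}${src_path} ${dst_uri}"
}
```

For example, `build_distcp_cmd "$(pick_source_scheme 4 5)" source-nn.example.com /data/in hdfs://target-ns/data/in` prints `hadoop distcp webhdfs://source-nn.example.com:50070/data/in hdfs://target-ns/data/in`. Check the actual ports against your own cluster configuration before relying on the defaults.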
> 4. Will there be any difference in transfer speed between these protocols?
Yes. This is also a repeat of (1), which I've answered above.
> 5. What port numbers are needed when using these? (I have seen commands with 50070 and 8020; when should each be used?)
Follow the CDH5 ports guide at http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_ig_ports_cdh5.h... to find the right ports for your environment. The statements below assume the default ports.
Native HDFS protocol transfers require every host in the cluster running the DistCp job (usually the target) to be able to reach the source's port 8020 (on the NameNode(s)) and ports 50010/1004 and 50020 (on all DataNodes).
WebHDFS or HFTP (HTTP-based) transfers require every host in the cluster running the DistCp job (usually the target) to be able to reach the source's port 50070 (on the NameNode(s)) and port 50075/1006 (on all DataNodes).
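Before launching a job, the port requirements above can be checked from a node in the DistCp job cluster. This is a minimal bash sketch under stated assumptions: `required_ports` and `check_port` are hypothetical names, hostnames are placeholders, it assumes the default ports listed above (treating 1004/1006 as the secure DataNode alternatives to 50010/50075), and it relies on bash's `/dev/tcp` redirection plus the coreutils `timeout` command.

```shell
#!/usr/bin/env bash
set -u

# Default source-side ports a DistCp job host must reach.
# Args: scheme (hdfs|webhdfs|hftp), role (namenode|datanode),
#       optional "yes" if the source DataNodes are secured (default: no).
required_ports() {
  local scheme="$1" role="$2" secure="${3:-no}"
  case "${scheme}:${role}" in
    hdfs:namenode) echo "8020" ;;
    hdfs:datanode)
      if [ "$secure" = "yes" ]; then echo "1004 50020"; else echo "50010 50020"; fi ;;
    webhdfs:namenode|hftp:namenode) echo "50070" ;;
    webhdfs:datanode|hftp:datanode)
      if [ "$secure" = "yes" ]; then echo "1006"; else echo "50075"; fi ;;
    *) echo "unknown scheme/role: ${scheme}:${role}" >&2; return 1 ;;
  esac
}

# Attempt a TCP connection with a 3-second timeout and report the result.
check_port() {
  local host="$1" port="$2"
  if timeout 3 bash -c ": < /dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "OK       ${host}:${port}"
  else
    echo "BLOCKED  ${host}:${port}"
  fi
}
```

For example, `for p in $(required_ports webhdfs datanode); do check_port source-dn1.example.com "$p"; done` would probe one source DataNode; repeat it for every source host and run it from every host in the job cluster.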
Created 12-02-2015 02:03 AM
Hi Harsha,
Thanks for the quick reply; your answers are clear and easy to understand.
I am transferring data from an insecure cluster to a secure cluster. From the link you provided, I can see that we need to use either hdfs or webhdfs. The insecure cluster is on 5.3.x and the secure cluster is on 5.4.x, so I am using webhdfs at the source and webhdfs at the destination. Is this the best way to do it?
You are suggesting webhdfs to hdfs, but in your first point you said hdfs uses TCP and will be faster than HTTP(S). Right?
Please correct me if I am going wrong anywhere.
Thanks in advance.
Thanks
Kishore
Created 12-02-2015 04:09 AM
Hi Harsh,
I forgot to describe my problem in my previous post.
I am running distcp with the following command:
time hadoop distcp -p -strategy dynamic -m 40 webhdfs://<Source Name node IP>:50070/<path to file> hdfs://<Name service of destination cluster>/<path to file>
But I get the following error:
ERROR [main] org.apache.hadoop.tools.util.RetriableCommand: Failure in Retriable command: Copying webhdfs://<Source Name node IP>:50070/<path to file> hdfs://<Name service of destination cluster>/<path to file>
I am executing this command on the destination cluster.
I have the following questions to clear up before re-executing the command:
1. I am not able to use the source cluster's nameservice in place of <Source Name node IP>:50070 when executing the command.
2. Do we need to specify port 8022 for hdfs at the destination when we use the nameservice?
Thanks
Kishore
Created 12-02-2015 10:36 AM
Created 12-02-2015 11:33 AM
Thank you for the detailed explanation.
Created 12-02-2015 11:36 AM
Hi Harsha,
Can you please look into this as well?
Thanks
Kishore