Created 06-15-2016 07:47 PM
Hello team,
I encountered errors during a data migration from a CDH 4.2 cluster to an HDP 2.4 cluster using DistCp; details are below. Please let me know your thoughts.
HDP 2.4, HA, non-secure
Active NameNode: hdpmasternode07
Standby NameNode: hdpmasternode06 (http://b.b.b.b:50070/dfshealth.html#tab-overview)
Core Hadoop version: 2.7
Passed Test cases:
------------------
- Copying empty dirs from the CDH cluster to the HDP 2.4 cluster.
- Accessing the HDP 2.4 HDFS from the CDH cluster using the webhdfs protocol, and vice versa.
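For reference, the cross-cluster webhdfs access check described above can be run with commands like the following. This is a sketch that needs both clusters live; a.a.a.a and b.b.b.b are the placeholder NameNode addresses used in this thread, and /tmp is just an illustrative path.

```shell
# From a CDH-cluster host: list the HDP 2.4 filesystem over webhdfs
hadoop fs -ls webhdfs://b.b.b.b:50070/tmp

# From an HDP-cluster host: list the CDH filesystem over webhdfs
hadoop fs -ls webhdfs://a.a.a.a:50070/tmp
```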
Command used
------------
hdfs@hdpmasternode07:/$ hadoop distcp hftp://a.a.a.a:50070/tmp/inv261/retail-batch.inv261b.log.gz hdfs://b.b.b.b:8020/tmp
Logs:
-----
hdfs@hdpmasternode07:/$ hadoop distcp hftp://a.a.a.a:50070/tmp/inv261/retail-batch.inv261b.log.gz hdfs://b.b.b.b:8020/tmp
16/06/15 19:27:45 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hftp://a.a.a.a:50070/tmp/inv261/retail-batch.inv261b.log.gz], targetPath=hdfs://b.b.b.b:8020/tmp, targetPathExists=true, preserveRawXattrs=false}
16/06/15 19:27:45 INFO impl.TimelineClientImpl: Timeline service address: http://hdpmasternode06:8188/ws/v1/timeline/
16/06/15 19:27:46 INFO impl.TimelineClientImpl: Timeline service address: http://hdpmasternode06:8188/ws/v1/timeline/
16/06/15 19:27:46 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
16/06/15 19:27:46 INFO mapreduce.JobSubmitter: number of splits:1
16/06/15 19:27:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1466011429718_0013
16/06/15 19:27:47 INFO impl.YarnClientImpl: Submitted application application_1466011429718_0013
16/06/15 19:27:47 INFO mapreduce.Job: The url to track the job: http://hdpmasternode07:8088/proxy/application_1466011429718_0013/
16/06/15 19:27:47 INFO tools.DistCp: DistCp job-id: job_1466011429718_0013
16/06/15 19:27:47 INFO mapreduce.Job: Running job: job_1466011429718_0013
16/06/15 19:27:53 INFO mapreduce.Job: Job job_1466011429718_0013 running in uber mode : false
16/06/15 19:27:53 INFO mapreduce.Job: map 0% reduce 0%
16/06/15 19:28:03 INFO mapreduce.Job: map 100% reduce 0%
16/06/15 19:31:04 INFO mapreduce.Job: Task Id : attempt_1466011429718_0013_m_000000_0, Status : FAILED
Error: java.io.IOException: File copy failed: hftp://a.a.a.a:50070/tmp/inv261/retail-batch.inv261b.log.gz --> hdfs://b.b.b.b:8020/tmp/retail-batch.inv261b.log.gz
	at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:285)
	at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:253)
	at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hftp://a.a.a.a:50070/tmp/inv261/retail-batch.inv261b.log.gz to hdfs://b.b.b.b:8020/tmp/retail-batch.inv261b.log.gz
	at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
	at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:281)
	... 10 more
Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: java.net.SocketTimeoutException: connect timed out
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.getInputStream(RetriableFileCopyCommand.java:302)
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:247)
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToFile(RetriableFileCopyCommand.java:183)
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:123)
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99)
	at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
	... 11 more
Caused by: java.net.SocketTimeoutException: connect timed out
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:579)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
	at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
	at sun.net.www.http.HttpClient.New(HttpClient.java:308)
	at sun.net.www.http.HttpClient.New(HttpClient.java:326)
	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:998)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:934)
	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:852)
	at sun.net.www.protocol.http.HttpURLConnection.followRedirect(HttpURLConnection.java:2412)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1559)
	at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
	at org.apache.hadoop.hdfs.web.HftpFileSystem$RangeHeaderUrlOpener.connect(HftpFileSystem.java:370)
	at org.apache.hadoop.hdfs.web.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:135)
	at org.apache.hadoop.hdfs.web.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:116)
	at org.apache.hadoop.hdfs.web.ByteRangeInputStream.<init>(ByteRangeInputStream.java:101)
	at org.apache.hadoop.hdfs.web.HftpFileSystem$RangeHeaderInputStream.<init>(HftpFileSystem.java:383)
	at org.apache.hadoop.hdfs.web.HftpFileSystem$RangeHeaderInputStream.<init>(HftpFileSystem.java:388)
	at org.apache.hadoop.hdfs.web.HftpFileSystem.open(HftpFileSystem.java:404)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:767)
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.getInputStream(RetriableFileCopyCommand.java:298)
	... 16 more
Created 06-16-2016 04:03 PM
Thanks for your responses. The problem was the network: the clusters were using public IPs instead of private IPs. I updated the destination cluster with the source cluster's private IPs.
Created 06-15-2016 07:55 PM
Are you able to run hadoop fs -ls against both clusters from the HDP 2.4.x cluster? Try:
hadoop fs -ls hdfs://source_cluster_Active_NN:8020/tmp
hadoop fs -ls hdfs://destination_cluster_Active_NN:8020/tmp
Also, check that the source host is reachable from the destination cluster hosts.
If the above works, then try running DistCp as a user other than hdfs. For example:
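One quick way to do the reachability check is to probe the NameNode ports over TCP from a destination-cluster host. This is a hedged sketch: it assumes bash (for the /dev/tcp redirection) and the coreutils timeout command are available; the check_port helper name is made up for illustration.

```shell
# Probe whether a given host:port accepts TCP connections from this host.
check_port() {
  host="$1"; port="$2"
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open ${host}:${port}"
  else
    echo "closed ${host}:${port}"
  fi
}

# Usage (a.a.a.a is the source NameNode placeholder from this thread):
#   check_port a.a.a.a 50070   # NameNode HTTP / HFTP port
#   check_port a.a.a.a 8020    # NameNode RPC port
```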
hadoop distcp -strategy dynamic -prgbup \
-<overwrite/update> \
hdfs://source_cluster_Active_NN:8020/<test_file_path> \
hdfs://destination_cluster_Active_NN:8020/<Test_file_path>
Created 06-15-2016 09:58 PM
As discussed with @avoma, issue is resolved, it was due to incorrect entries in /etc/hosts file.
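For anyone hitting the same symptom: the fix amounts to making sure every destination-cluster host resolves the source-cluster hostnames to their private (routable) IPs, e.g. entries of this shape in /etc/hosts on each destination host. The hostnames and IPs below are made-up placeholders, not values from this thread.

```
10.0.0.11   cdhmasternode01
10.0.0.12   cdhdatanode01
10.0.0.13   cdhdatanode02
```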
Created 06-15-2016 08:35 PM
Firewalls? Is there a network path between the clusters? Does your user have full permissions to run DistCp?
Big data over a slow network? Note the root cause in the stack trace:
Caused by: java.net.SocketTimeoutException: connect timed out
Created 07-13-2016 05:32 PM
will be helpful