hadoop distcp command failing

Super Collaborator

This command is failing, and I cannot find the reason by looking at the log file. Please help identify the issue; the log file (distcp-error.txt) is attached.

hadoop distcp hdfs:///user/sami/ hdfs:///user/zhang

10 REPLIES

@Sami Ahmad

The following line seems to indicate the issue:

Caused by: java.io.IOException: Check-sum mismatch between hdfs://hadoop1.tolls.dot.state.fl.us:8020/user/sami/error1.log and hdfs://hadoop1.tolls.dot.state.fl.us:8020/user/zhang/.distcp.tmp.attempt_1472051594557_0001_m_000001_0. Source and target differ in block-size. Use -pb to preserve block-sizes during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. (NOTE: By skipping checksums, one runs the risk of masking data-corruption during file-transfer.)

Is the block size set differently between the source and target clusters?
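The error message itself suggests -pb as one way around the mismatch. A minimal sketch of that retry, assuming the same source and target paths as in the question and a `hadoop` client on the PATH (the variable names and the PATH guard are only for illustration):

```shell
# Paths taken from the original question.
SRC="hdfs:///user/sami/"
DST="hdfs:///user/zhang"

# -pb asks distcp to preserve each file's block size on the target,
# which is what the check-sum mismatch message recommends.
DISTCP_CMD="hadoop distcp -pb $SRC $DST"

if command -v hadoop >/dev/null 2>&1; then
  $DISTCP_CMD
else
  # No Hadoop client on this machine: just show the command that would run.
  echo "$DISTCP_CMD"
fi
```

Preserving the block size keeps the checksum comparison meaningful, so it is usually preferable to skipping the check, which the message warns can mask data corruption.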

Super Collaborator

Source and target clusters? I am using the same node, hadoop1, so I guess the block size would be the same.

How can I check this?

Super Guru

So you have two clusters on the same node? Is it possible that the two clusters have different block size settings? Can you please verify the dfs.blocksize setting on both clusters?
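One possible way to check, sketched under the assumption that the `hdfs` client is on the PATH and using the file named in the error message (the FILE variable is only for illustration). Block size is recorded per file when it is written, so even on a single cluster a file can carry a block size different from the configured default:

```shell
# File named in the error message; adjust to whichever file you want to inspect.
FILE="/user/sami/error1.log"

if command -v hdfs >/dev/null 2>&1; then
  # Default block size (in bytes) from the client's configuration.
  hdfs getconf -confKey dfs.blocksize
  # Block size (in bytes) recorded for this file: %o = block size, %n = file name.
  hdfs dfs -stat "%o %n" "$FILE"
else
  echo "hdfs client not found on PATH"
fi
```

If the two numbers differ, the source file was written with a non-default block size, which would be consistent with the mismatch reported by distcp.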

Master Guru
@Sami Ahmad

Try the command below:

hadoop distcp -cp hdfs:///user/sami/ hdfs:///user/zhang

Super Collaborator

Wrong syntax: "-cp doesn't exist".

Master Guru

@Sami Ahmad

Below is the correct syntax:

hadoop distcp -pc hdfs:///user/sami/ hdfs:///user/zhang

@Sami Ahmad

If I'm not wrong, you are trying to copy data between directories within the same cluster.

You can simply use the copy command:

hadoop fs -cp hdfs:///user/sami/ hdfs:///user/zhang

Super Collaborator

I want to use distcp for learning purposes.

Super Guru
@Sami Ahmad

Can you try distcp2 instead?

hadoop distcp2 hdfs:///user/sami/ hdfs:///user/zhang