
distcp fails with encrypted files


I am trying to run distcp on encrypted files, as described below (note that /user/test_user is an encryption zone; a quick way to verify that is sketched right after the list).

Scenario:

Run the following commands:

  • kdestroy
  • kinit -kt ~/hadoopqa/keytabs/test_user.headless.keytab test_user@EXAMPLE.COM
  • hdfs dfs -copyFromLocal /etc/passwd /user/test_user
  • /usr/hdp/current/hadoop-client/bin/hadoop distcp /user/test_user/passwd /user/test_user/dest
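
One way to confirm that /user/test_user really is an encryption zone is the command below; this is a hedged aside rather than part of the original reproduction steps, and listing zones requires HDFS superuser privileges:

hdfs crypto -listZones

This prints every encryption zone path together with the name of its EZ key.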

I am getting this exception:

17/02/15 00:08:12 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[/user/test_user/passwd], targetPath=/user/test_user/dest, targetPathExists=true, filtersFile='null'}
17/02/15 00:08:12 INFO client.RMProxy: Connecting to ResourceManager at mynode.example.com/XX.XX.XX.XX:8050
17/02/15 00:08:12 INFO client.AHSProxy: Connecting to Application History server at mynode.example.com/XX.XX.XX.XX:10200
17/02/15 00:08:12 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 78 for test_user on XX.XX.XX.XX:8020
17/02/15 00:08:12 INFO security.TokenCache: Got dt for hdfs://mynode.example.com; Kind: HDFS_DELEGATION_TOKEN, Service: XX.XX.XX.XX:8020, Ident: (HDFS_DELEGATION_TOKEN token 78 for test_user)
17/02/15 00:08:12 INFO security.TokenCache: Got dt for hdfs://mynode.example.com; Kind: kms-dt, Service: XX.XX.XX.XX:9292, Ident: (owner=test_user, renewer=yarn, realUser=, issueDate=1487117292703, maxDate=1487722092703, sequenceNumber=2, masterKeyId=2)
17/02/15 00:08:13 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 1; dirCnt = 0
17/02/15 00:08:13 INFO tools.SimpleCopyListing: Build file listing completed.
17/02/15 00:08:13 INFO tools.DistCp: Number of paths in the copy list: 1
17/02/15 00:08:13 INFO tools.DistCp: Number of paths in the copy list: 1
17/02/15 00:08:13 INFO client.RMProxy: Connecting to ResourceManager at mynode.example.com/XX.XX.XX.XX:8050
17/02/15 00:08:13 INFO client.AHSProxy: Connecting to Application History server at mynode.example.com/XX.XX.XX.XX:10200
17/02/15 00:08:14 INFO mapreduce.JobSubmitter: number of splits:1
17/02/15 00:08:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1487064603554_0002
17/02/15 00:08:14 INFO mapreduce.JobSubmitter: Kind: kms-dt, Service: XX.XX.XX.XX:9292, Ident: (owner=test_user, renewer=yarn, realUser=, issueDate=1487117292703, maxDate=1487722092703, sequenceNumber=2, masterKeyId=2)
17/02/15 00:08:14 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: XX.XX.XX.XX:8020, Ident: (HDFS_DELEGATION_TOKEN token 78 for test_user)
17/02/15 00:08:14 INFO impl.TimelineClientImpl: Timeline service address: https://mynode.example.com:8190/ws/v1/timeline/
17/02/15 00:08:15 INFO impl.YarnClientImpl: Submitted application application_1487064603554_0002
17/02/15 00:08:15 INFO mapreduce.Job: The url to track the job: https://mynode.example.com:8090/proxy/application_1487064603554_0002/
17/02/15 00:08:15 INFO tools.DistCp: DistCp job-id: job_1487064603554_0002
17/02/15 00:08:15 INFO mapreduce.Job: Running job: job_1487064603554_0002
17/02/15 00:08:24 INFO mapreduce.Job: Job job_1487064603554_0002 running in uber mode : false
17/02/15 00:08:24 INFO mapreduce.Job:  map 0% reduce 0%
17/02/15 00:08:35 INFO mapreduce.Job: Task Id : attempt_1487064603554_0002_m_000000_0, Status : FAILED
Error: java.io.IOException: File copy failed: hdfs://mynode.example.com/user/test_user/passwd --> hdfs://mynode.example.com:8020/user/test_user/dest/passwd
	at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:287)
	at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:255)
	at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:52)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://mynode.example.com/user/test_user/passwd to hdfs://mynode.example.com:8020/user/test_user/dest/passwd
	at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
	at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:283)
	... 10 more
Caused by: java.io.IOException: Check-sum mismatch between hdfs://mynode.example.com/user/test_user/passwd and hdfs://mynode.example.com:8020/user/test_user/dest/.distcp.tmp.attempt_1487064603554_0002_m_000000_0.
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.compareCheckSums(RetriableFileCopyCommand.java:212)
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:130)
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99)
	at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
	... 11 more


Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143


17/02/15 00:08:48 INFO mapreduce.Job:  map 100% reduce 0%
17/02/15 00:08:48 INFO mapreduce.Job: Task Id : attempt_1487064603554_0002_m_000000_1, Status : FAILED
Error: java.io.IOException: File copy failed: hdfs://mynode.example.com/user/test_user/passwd --> hdfs://mynode.example.com:8020/user/test_user/dest/passwd
	at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:287)
	at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:255)
	at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:52)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://mynode.example.com/user/test_user/passwd to hdfs://mynode.example.com:8020/user/test_user/dest/passwd
	at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
	at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:283)
	... 10 more
Caused by: java.io.IOException: Check-sum mismatch between hdfs://mynode.example.com/user/test_user/passwd and hdfs://mynode.example.com:8020/user/test_user/dest/.distcp.tmp.attempt_1487064603554_0002_m_000000_1.
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.compareCheckSums(RetriableFileCopyCommand.java:212)
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:130)
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99)
	at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
	... 11 more


Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143


17/02/15 00:08:49 INFO mapreduce.Job:  map 0% reduce 0%
17/02/15 00:08:58 INFO mapreduce.Job: Task Id : attempt_1487064603554_0002_m_000000_2, Status : FAILED
Error: java.io.IOException: File copy failed: hdfs://mynode.example.com/user/test_user/passwd --> hdfs://mynode.example.com:8020/user/test_user/dest/passwd
	at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:287)
	at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:255)
	at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:52)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://mynode.example.com/user/test_user/passwd to hdfs://mynode.example.com:8020/user/test_user/dest/passwd
	at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
	at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:283)
	... 10 more
Caused by: java.io.IOException: Check-sum mismatch between hdfs://mynode.example.com/user/test_user/passwd and hdfs://mynode.example.com:8020/user/test_user/dest/.distcp.tmp.attempt_1487064603554_0002_m_000000_2.
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.compareCheckSums(RetriableFileCopyCommand.java:212)
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:130)
	at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99)
	at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
	... 11 more


Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143


17/02/15 00:09:08 INFO mapreduce.Job:  map 100% reduce 0%
17/02/15 00:09:12 INFO mapreduce.Job: Job job_1487064603554_0002 failed with state FAILED due to: Task failed task_1487064603554_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0


17/02/15 00:09:12 INFO mapreduce.Job: Counters: 8
	Job Counters
		Failed map tasks=4
		Launched map tasks=4
		Other local map tasks=4
		Total time spent by all maps in occupied slots (ms)=41166
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=41166
		Total vcore-milliseconds taken by all map tasks=41166
		Total megabyte-milliseconds taken by all map tasks=42153984
17/02/15 00:09:12 ERROR tools.DistCp: Exception encountered
java.io.IOException: DistCp failure: Job job_1487064603554_0002 has failed: Task failed task_1487064603554_0002_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0


	at org.apache.hadoop.tools.DistCp.waitForJobCompletion(DistCp.java:215)
	at org.apache.hadoop.tools.DistCp.execute(DistCp.java:158)
	at org.apache.hadoop.tools.DistCp.run(DistCp.java:128)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.tools.DistCp.main(DistCp.java:462)
1 ACCEPTED SOLUTION


Running distcp against encrypted files will not work out of the box because of a checksum mismatch. The reason is as follows:

Each file within an encryption zone has its own encryption key, called the Data Encryption Key (DEK). These DEKs are encrypted with their respective encryption zone's EZ key, to form an Encrypted Data Encryption Key (EDEK). EDEKs are stored persistently on the NameNode as part of each file's metadata, using HDFS extended attributes.
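
As a hedged illustration (the getFileEncryptionInfo subcommand of the hdfs crypto CLI only exists in newer Hadoop releases, so it may be missing on older clusters), you can inspect this per-file metadata, including the EDEK, with:

hdfs crypto -getFileEncryptionInfo -path /user/test_user/passwd

Two copies of the same plaintext in an encryption zone will report different edek values, and since HDFS checksums are computed over the encrypted bytes stored on the DataNodes, their checksums will differ as well.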

So the raw (encrypted) contents of the source and target files will differ, and thus so will their checksums.

This can, however, be worked around by running distcp with the checksum check disabled, i.e. try running:

hadoop distcp -skipcrccheck -update src dest
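
A related approach, sketched under the assumption that both source and destination share the same KMS and EZ key and that it is run as the HDFS superuser, is to copy the raw ciphertext together with the xattrs holding the EDEKs via the /.reserved/raw virtual path, so the data is never decrypted in transit (nn1 and nn2 below are placeholder NameNode addresses):

hadoop distcp -px hdfs://nn1:8020/.reserved/raw/user/test_user/passwd hdfs://nn2:8020/.reserved/raw/user/test_user/dest

Because the bytes are copied unchanged, the checksums should match and -skipcrccheck is not needed.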

Let me know if this helps.


2 REPLIES

Contributor

Hi @Namit Maheshwari,

In the case of distcp between a source HA cluster and a destination HA cluster where the encryption zones are identical and use the same EZ key, do the files still end up with different EDEKs? In other words, given that setup, does distcp of a file from the source encryption zone to the destination encryption zone still produce the checksum-mismatch error? And if it does, is this statement correct: when distcp copies the file to the destination encryption zone, the file is first decrypted on the source, transferred over the wire, and then re-encrypted on the destination with a different EDEK?