
distcp doesn't preserve ACLs, and sometimes permissions, on non-HDFS filesystems


Hello,

I found a problem with distcp on non-HDFS filesystems: in practice it doesn't preserve ACLs or extended attributes.

 

This could be a huge problem if you back up HDFS to an external system such as S3, NFS, or tape, and even more so with cloud storage that has very long retention, such as AWS Glacier.

 

This is typical output from a distcp to S3 using the new s3a protocol:

 

 hadoop distcp  -Dfs.s3a.access.key=**************************** -Dfs.s3a.secret.key=*****************   -pa /******/*******     s3a://*********/***********
16/04/26 14:26:27 INFO Configuration.deprecation: fs.s3a.awsAccessKeyId is deprecated, use fs.s3a.access.key instead.
16/04/26 14:26:27 INFO Configuration.deprecation: fs.s3a.awsSecretAccessKey is deprecated, use fs.s3a.secret.key instead.
16/04/26 14:26:32 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[/******/********], targetPath=s3a://*********/*********, targetPathExists=false, preserveRawXattrs=false, filtersFile='null'}
16/04/26 14:26:32 INFO client.RMProxy: Connecting to ResourceManager at ***************/**************
16/04/26 14:26:33 ERROR tools.DistCp: ACLs not supported on at least one file system:
org.apache.hadoop.tools.CopyListing$AclsNotSupportedException: ACLs not supported for file system: s3a://testmat

    at org.apache.hadoop.tools.util.DistCpUtils.checkFileSystemAclSupport(DistCpUtils.java:378)
    at org.apache.hadoop.tools.DistCp.configureOutputFormat(DistCp.java:320)
    at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:220)
    at org.apache.hadoop.tools.DistCp.execute(DistCp.java:158)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:122)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:429)

 

 

The same problem also occurs with XFS or any other filesystem that supports ACLs / xattrs.

 

Moreover, for s3a/s3/s3n, I could not find a way to preserve ACLs or even normal permissions.

The owner of every object in S3 changes to the bucket owner, even when the "-p" option is used.

 

I think this is an important limitation of distcp in scenarios where a long-term backup is needed.
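
As a possible workaround (only a sketch, not something distcp does itself), the ACLs could be dumped with `hdfs dfs -getfacl -R` before the copy and stored next to the backup as a manifest. The Python below assumes the usual getfacl text layout (`# file:`, `# owner:` and `# group:` header lines followed by the ACL entries). The function name `parse_getfacl` and the sample paths and users are hypothetical.

```python
import json


def parse_getfacl(text):
    """Turn the text printed by `hdfs dfs -getfacl -R <path>` into a dict
    keyed by file path. Assumes the usual header/entry layout."""
    manifest = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines separate the per-file blocks
        if line.startswith("# file:"):
            current = {"owner": None, "group": None, "entries": []}
            manifest[line[len("# file:"):].strip()] = current
        elif current is None:
            continue  # ignore anything before the first "# file:" header
        elif line.startswith("# owner:"):
            current["owner"] = line[len("# owner:"):].strip()
        elif line.startswith("# group:"):
            current["group"] = line[len("# group:"):].strip()
        elif not line.startswith("#"):
            current["entries"].append(line)  # e.g. "user:bob:r-x"
    return manifest


if __name__ == "__main__":
    # Hypothetical sample; in practice this text would come from getfacl.
    sample = (
        "# file: /data/a\n"
        "# owner: alice\n"
        "# group: hadoop\n"
        "user::rwx\n"
        "user:bob:r-x\n"
        "group::r-x\n"
        "mask::r-x\n"
        "other::---\n"
    )
    print(json.dumps(parse_getfacl(sample), indent=2))
```

The resulting JSON could be uploaded together with the data, so that owner, group and ACL entries can be reapplied (for example with `hdfs dfs -setfacl`) after a restore. This does not make the target filesystem store ACLs; it only keeps the information from being lost.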

 

 

 

Kind Regards