
distcp doesn't preserve ACLs, and sometimes permissions, on non-HDFS filesystems

Hello,

I found a problem with distcp on non-HDFS filesystems: in practice it doesn't preserve ACLs or extended attributes.

 

This could be a huge problem if you back up HDFS to an external system such as S3, NFS, or tape, and even more so with cloud storage that has very long retention, such as AWS Glacier.

 

This is typical output from a distcp to S3 using the newer s3a protocol:

 

 hadoop distcp  -Dfs.s3a.access.key=**************************** -Dfs.s3a.secret.key=*****************   -pa /******/*******     s3a://*********/***********
16/04/26 14:26:27 INFO Configuration.deprecation: fs.s3a.awsAccessKeyId is deprecated, use fs.s3a.access.key instead.
16/04/26 14:26:27 INFO Configuration.deprecation: fs.s3a.awsSecretAccessKey is deprecated, use fs.s3a.secret.key instead.
16/04/26 14:26:32 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[/******/********], targetPath=s3a://*********/*********, targetPathExists=false, preserveRawXattrs=false, filtersFile='null'}
16/04/26 14:26:32 INFO client.RMProxy: Connecting to ResourceManager at ***************/**************
16/04/26 14:26:33 ERROR tools.DistCp: ACLs not supported on at least one file system:
org.apache.hadoop.tools.CopyListing$AclsNotSupportedException: ACLs not supported for file system: s3a://testmat

    at org.apache.hadoop.tools.util.DistCpUtils.checkFileSystemAclSupport(DistCpUtils.java:378)
    at org.apache.hadoop.tools.DistCp.configureOutputFormat(DistCp.java:320)
    at org.apache.hadoop.tools.DistCp.createJob(DistCp.java:220)
    at org.apache.hadoop.tools.DistCp.execute(DistCp.java:158)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:122)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:429)

 

 

The same problem also occurs on XFS or any other filesystem that does support ACLs / xattrs.

 

Moreover, for s3a/s3/s3n, I could not find a way to preserve even normal permissions, let alone ACLs.

All objects in S3 end up owned by the bucket owner, even when the "-p" option is used.
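
One possible workaround, only as a rough sketch: dump the ACLs to a text file with `hdfs dfs -getfacl -R` (which prints the owner, group, and ACL entries for each path), run distcp without `-pa`, and store the dump next to the copied data. The function name `backup_with_acl_dump`, the file name `_acl_dump.txt`, and the overridable `HDFS_CMD` / `HADOOP_CMD` variables are my own assumptions for illustration, not part of distcp.

```shell
#!/usr/bin/env bash
# Sketch: keep ACL information in a sidecar file, since the target
# filesystem cannot store it. HDFS_CMD and HADOOP_CMD are hypothetical
# overrides so the commands can be swapped out (e.g. for a dry run).
HDFS_CMD="${HDFS_CMD:-hdfs}"
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

backup_with_acl_dump() {
  local src="$1" dst="$2" acl_file="$3"

  # 1. Record owner, group and ACL entries for every path under src.
  "$HDFS_CMD" dfs -getfacl -R "$src" > "$acl_file" || return 1

  # 2. Copy the data without -pa, which the target would reject.
  "$HADOOP_CMD" distcp "$src" "$dst" || return 1

  # 3. Store the dump alongside the copied data (file name is arbitrary).
  "$HDFS_CMD" dfs -put -f "$acl_file" "$dst/_acl_dump.txt"
}
```

It would be called as `backup_with_acl_dump /source/dir s3a://bucket/backup /tmp/acl_dump.txt`. This only saves the information; restoring would still mean replaying the dump by hand with `hdfs dfs -setfacl` and `hdfs dfs -chown`, so it does not remove the limitation.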

 

I think this is an important limitation of distcp in scenarios where a long-term backup is needed.

 

 

 

Kind Regards
