
Hash mismatch issue while using S3 as Storage Backend for HDFS

Contributor

Hi All,

I am facing the issue below while creating a directory on HDFS with Minio S3 as the storage backend:

com.amazonaws.AmazonClientException: Unable to verify integrity of data upload.  Client calculated content hash (contentMD5: 1B2M2Y8AsgTpgAmY7PhCfg== in base 64) didn't match hash (etag: null in hex) calculated by Amazon S3.  You may need to delete the data stored in Amazon S3. (metadata.contentMD5: null, md5DigestStream: com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream@2d6861a6, bucketName: hadoopsa, key: dindi2/): Unable to verify integrity of data upload.  Client calculated content hash (contentMD5: 1B2M2Y8AsgTpgAmY7PhCfg== in base 64) didn't match hash (etag: null in hex) calculated by Amazon S3.  You may need to delete the data stored in Amazon S3.

I get the error below when adding a file to HDFS:

root@hemant-insa:~# hdfs dfs -put abc.txt s3a://sample/
put: saving output on abc.txt._COPYING_: com.amazonaws.AmazonClientException: Unable to verify integrity of data upload.  Client calculated content hash (contentMD5: 1B2M2Y8AsgTpgAmY7PhCfg== in base 64) didn't match hash (etag: null in hex) calculated by Amazon S3.  You may need to delete the data stored in Amazon S3. (metadata.contentMD5: 1B2M2Y8AsgTpgAmY7PhCfg==, md5DigestStream: null, bucketName: sample, key: abc.txt._COPYING_): Unable to verify integrity of data upload.  Client calculated content hash (contentMD5: 1B2M2Y8AsgTpgAmY7PhCfg== in base 64) didn't match hash (etag: null in hex) calculated by Amazon S3.  You may need to delete the data stored in Amazon S3. (metadata.contentMD5: 1B2M2Y8AsgTpgAmY7PhCfg==, md5DigestStream: null, bucketName: sample, key: abc.txt._COPYING_)
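
A quick check on the hash in both errors: the value 1B2M2Y8AsgTpgAmY7PhCfg== is the base64-encoded MD5 of an empty byte stream. That suggests the client hashed a zero-byte payload in both cases, which is expected for a directory marker such as dindi2/; for abc.txt._COPYING_ it is only an inference unless abc.txt itself is empty. The server side is reported as "etag: null", so the mismatch reads like a missing ETag in the response rather than differing content, though that is an interpretation of the message and not a confirmed diagnosis. A minimal sketch, assuming openssl and base64 are installed (md5_b64 is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Print the base64-encoded MD5 of whatever arrives on stdin, the same
# encoding the AWS SDK shows as contentMD5 in the error message.
# md5_b64 is a hypothetical helper, not part of Hadoop or the AWS SDK.
md5_b64() {
  openssl dgst -md5 -binary | base64
}

# Hash an empty input and compare it with the value from the error.
empty_md5="$(printf '' | md5_b64)"
echo "MD5 of empty input: ${empty_md5}"

if [ "${empty_md5}" = "1B2M2Y8AsgTpgAmY7PhCfg==" ]; then
  echo "contentMD5 in the error matches a zero-byte payload"
fi
```

To compare against the real file, run `md5_b64 < abc.txt` and check whether it also prints the value from the error.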

I have configured the properties as attached.

Any hints/solutions - @Predrag Minovic @Jagatheesh Ramakrishnan @Pardeep @Neeraj Sabharwal @Anshul Sisodia

91632-untitled.png

2 REPLIES

Contributor

Complete list of properties configured in core-site:

fs.s3a.access.key=
fs.s3a.secret.key=
fs.s3a.endpoint=
fs.s3a.path.style.access=false
fs.s3a.attempts.maximum=20
fs.s3a.connection.establish.timeout=5000
fs.s3a.connection.timeout=200000
fs.s3a.paging.maximum=5000
fs.s3a.threads.max=10
fs.s3a.socket.send.buffer=8192
fs.s3a.socket.recv.buffer=8192
fs.s3a.threads.keepalivetime=60
fs.s3a.max.total.tasks=5
fs.s3a.multipart.size=100M
fs.s3a.multipart.threshold=2147483647
fs.s3a.multiobjectdelete.enable=true
fs.s3a.buffer.dir=/tmp/s3a
fs.s3a.block.size=64M
fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
fs.AbstractFileSystem.s3a.impl=org.apache.hadoop.fs.s3a.S3A
fs.s3a.readahead.range=64K
fs.s3a.etag.checksum=true

Cloudera Employee

Try the changes below in your submit command; its current form may be why the calculated hash values differ.

Submit command:

I believe you want to write abc.txt to the s3a bucket hadoopsa, under the sample folder, since you have already set hadoopsa as your defaultFS.

So you should use one of the commands below:

hdfs dfs -put abc.txt /sample/   # the sample folder must already exist before the command runs
OR
hdfs dfs -put abc.txt s3a://hadoopsa/sample/

In your command, putting a file directly into s3a://sample/ makes Hadoop treat sample as the bucket name and try to write the file to the root of that bucket.
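
To illustrate the point above, here is a rough sketch of how an s3a:// URI splits into a bucket and a key: the first path component after the scheme is the bucket, and everything after the next slash is the key. The function split_s3a_uri is a hypothetical helper written for this example; it only mimics the split with bash string operations and is not how Hadoop itself parses the URI.

```shell
#!/usr/bin/env bash
# Split an s3a:// URI into its bucket and key parts.
# split_s3a_uri is a hypothetical helper used only for illustration.
split_s3a_uri() {
  local uri="$1"
  local rest="${uri#s3a://}"   # drop the scheme
  local bucket="${rest%%/*}"   # everything up to the first slash
  local key=""
  case "$rest" in
    */*) key="${rest#*/}" ;;   # everything after the first slash
  esac
  printf 'bucket=%s key=%s\n' "$bucket" "$key"
}

split_s3a_uri "s3a://sample/"           # original command: "sample" is the bucket
split_s3a_uri "s3a://hadoopsa/sample/"  # suggested command: "hadoopsa" is the bucket
```

With the original URI the bucket comes out as sample and the key is empty, while the suggested URI gives the bucket hadoopsa and the key sample/.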
