S3a MPU Times Out in a non AWS object store

New Contributor

I am trying to upload a large file (larger than 250 GB) to a non-AWS object store using the S3A client.
In core-site.xml I have added the following properties:
fs.s3a.multipart.size=500M

fs.s3a.multipart.threshold=2147483647
which means that a file larger than about 2 GB triggers a multipart upload (MPU).
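
For a sense of scale, the number of parts these settings imply can be worked out with a little arithmetic. This is only a sketch: it assumes Hadoop reads the M suffix as binary mebibytes and that "250 GB" means 250 GiB, and part_count is a made-up helper name, not part of any S3A API.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Values taken from the post; the binary interpretation of "500M" is an assumption.
MULTIPART_SIZE = 500 * MIB          # fs.s3a.multipart.size=500M
MULTIPART_THRESHOLD = 2147483647    # fs.s3a.multipart.threshold (2**31 - 1 bytes)


def part_count(file_size_bytes: int, part_size_bytes: int = MULTIPART_SIZE) -> int:
    """Number of parts needed to upload a file, rounding up (hypothetical helper)."""
    return -(-file_size_bytes // part_size_bytes)  # integer ceiling division


def uses_multipart(file_size_bytes: int) -> bool:
    """True when the file is larger than the configured threshold."""
    return file_size_bytes > MULTIPART_THRESHOLD


print(part_count(250 * GIB))   # parts for a 250 GiB file
print(part_count(1024 * GIB))  # parts for a 1 TiB file
```

Under these assumptions a 250 GiB file is split into 512 parts, so the store sees several hundred part uploads for a single object. AWS S3 itself caps one upload at 10,000 parts; a non-AWS store may have a different limit.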

Our object store code takes a lock on the object to prevent concurrent writes to it. S3A seems to send many multipart upload requests concurrently, so some of the requests time out when they cannot acquire the lock within 60 seconds. I want to tune S3A so that it does not send so many multipart upload requests concurrently.

Looking at the Hortonworks documentation, I see I may have to tune the following properties.


fs.s3a.fast.upload=true

fs.s3a.threads.max=10 -> This would tune how many MPU requests the S3A client would generate

fs.s3a.threads.keepalivetime=60 -> This would tune the life cycle of each thread

fs.s3a.max.total.tasks=5 -> This would tune how many tasks can be queued.

fs.s3a.multipart.size=500M

fs.s3a.multipart.threshold=2147483647

fs.s3a.multipart.purge=false

fs.s3a.multipart.purge.age=86400

fs.s3a.paging.maximum=1000

Please let me know whether my understanding is correct, or whether I need to tune other properties too.
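
For reference, the key=value pairs above have to be written as property elements in core-site.xml. A minimal sketch of that conversion follows; to_core_site is a hypothetical helper, the values are simply the ones listed in the post, and only a few of them are shown.

```python
import xml.etree.ElementTree as ET


def to_core_site(props: dict) -> str:
    """Render key/value pairs as a Hadoop <configuration> document (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Values copied from the post, not recommendations.
settings = {
    "fs.s3a.threads.max": 10,
    "fs.s3a.max.total.tasks": 5,
    "fs.s3a.multipart.size": "500M",
    "fs.s3a.multipart.threshold": 2147483647,
}

print(to_core_site(settings))
```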

5 REPLIES

Re: S3a MPU Times Out in a non AWS object store

Rising Star

> In my object store code

If you have access to your code base and know where the issue is, wouldn't it be easier to fix the issue there? If you fix S3 via parameters, it is just a matter of time before you run into other issues.

Re: S3a MPU Times Out in a non AWS object store

New Contributor

@aengineer I agree with you, and we have already done that by improving our locking algorithm. But it seems the S3A client is pushing too many concurrent requests, each trying to upload a part of a very big file (1 TB) using MPU. In the object store we have to lock the object on write to synchronize the write operation, and because there are so many concurrent requests, a few of them time out.

So we need to understand how the client pushes MPU part upload requests, and whether there is a client-side parameter that limits the number of concurrent MPU part uploads, just to avoid the timeouts. We have tested most operations with S3A (Hive, MR, Hadoop, Spark, etc.) and they all work fine; the only issue is the concurrent MPU part uploads timing out. Thanks
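
To make the kind of client-side limit being asked about concrete, here is a small simulation, not S3A code. It caps the number of part uploads in flight with a semaphore in front of a thread pool. As far as I know S3A exposes a similar per-stream cap as fs.s3a.fast.upload.active.blocks (Hadoop 2.8 and later), but check the documentation for your version. upload_parts and ConcurrencyProbe are hypothetical names, and the upload itself is faked with a short sleep.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def upload_parts(num_parts, pool_threads, max_active_parts, upload_part):
    """Upload parts 1..num_parts with at most max_active_parts queued or running."""
    gate = threading.BoundedSemaphore(max_active_parts)

    def guarded(part_number):
        try:
            return upload_part(part_number)
        finally:
            gate.release()  # free a slot so the producer can submit the next part

    futures = []
    with ThreadPoolExecutor(max_workers=pool_threads) as pool:
        for part_number in range(1, num_parts + 1):
            gate.acquire()  # blocks the writer while max_active_parts are in flight
            futures.append(pool.submit(guarded, part_number))
    return [f.result() for f in futures]


class ConcurrencyProbe:
    """Fake part upload that records the peak number of simultaneous calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __call__(self, part_number):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # stand-in for the network transfer
        with self._lock:
            self.active -= 1
        return part_number


probe = ConcurrencyProbe()
uploaded = upload_parts(num_parts=20, pool_threads=10, max_active_parts=2, upload_part=probe)
print(len(uploaded), "parts uploaded, peak concurrency", probe.peak)
```

Even with 10 pool threads, the store in this sketch never sees more than 2 part uploads at once, because the writer blocks before submitting a third.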

Re: S3a MPU Times Out in a non AWS object store

Rising Star

But you do realize that S3AFileSystem is just one application among thousands of applications that can run against the S3 head, right?

Re: S3a MPU Times Out in a non AWS object store

Rising Star

Perhaps you need to return an error from the server so that S3AFS slows down and retries? Would that be an option for you? That would be a more generic way of solving your issue.
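
A rough sketch of what "return an error so the client slows down and retries" looks like from the client side. It assumes the server answers a contended part upload with HTTP 503, the status AWS S3 uses for its SlowDown throttling response, and that the client backs off exponentially. upload_part_with_backoff and send_part are hypothetical names; this is not how S3A's own retry logic is implemented.

```python
import time


def upload_part_with_backoff(send_part, max_attempts=5, base_delay=1.0,
                             max_delay=30.0, sleep=time.sleep):
    """Call send_part() until it returns 200, backing off after each 503.

    Returns the number of attempts used. send_part is any callable that
    returns an HTTP status code (a stand-in for the real request).
    """
    for attempt in range(1, max_attempts + 1):
        status = send_part()
        if status == 200:
            return attempt
        if status != 503:
            raise RuntimeError(f"part upload failed with HTTP {status}")
        if attempt < max_attempts:
            # Double the wait after every throttled attempt, up to max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise TimeoutError(f"still throttled after {max_attempts} attempts")


# Fake server: busy twice, then accepts the part. Delays are recorded, not slept.
responses = iter([503, 503, 200])
delays = []
attempts = upload_part_with_backoff(lambda: next(responses), sleep=delays.append)
print(attempts, delays)  # prints: 3 [1.0, 2.0]
```

The point of this design is that the server decides when it is overloaded, so it works for any client, not only for one whose thread settings you happen to control. A real client would normally add random jitter to the delays so that throttled requests do not all retry at the same moment.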

Re: S3a MPU Times Out in a non AWS object store

New Contributor

You are right, but what I am trying to say is that if thousands of applications try to modify the same object at the same time, we have to lock the object being modified to make the writes synchronous. Do you agree?

This can cause some of the requests to starve, and that is what we are seeing with the S3A client. A Hadoop copy operation has two parts: put a temp file, and then rename it. The put operation succeeds, and we can see the temp file uploaded to our object store, but some of the rename requests time out. We see multiple requests from the S3A client trying to modify the same object, and some of them time out because they cannot acquire the lock.
