In a non-AWS object store, I am trying to upload a large file (size > 250 GB) using the S3A client.
In core-site.xml, I have added the following properties, which mean that any file larger than 2 GB triggers a multipart upload (MPU).
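The property block itself isn't reproduced above; a typical core-site.xml fragment that produces this "MPU above 2 GB" behavior (assuming the standard `fs.s3a.multipart.threshold` and `fs.s3a.multipart.size` settings; exact value formats vary by Hadoop version) would look something like:

```xml
<!-- Sketch only: standard S3A multipart settings assumed, not the poster's exact config -->
<property>
  <name>fs.s3a.multipart.threshold</name>
  <value>2147483648</value> <!-- 2 GB: uploads above this size use MPU -->
</property>
<property>
  <name>fs.s3a.multipart.size</name>
  <value>104857600</value> <!-- 100 MB: size of each uploaded part -->
</property>
```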
In my object store code there is a lock on the object that prevents concurrent writes to it. It seems that S3A sends many multipart upload requests concurrently, so some of the requests time out when they cannot acquire the lock within 60 seconds. I want to tune S3A so that it does not send so many multipart upload requests concurrently.
Looking at the Hortonworks documentation, I see I may have to tune the following properties:
fs.s3a.threads.max=10 -> the maximum number of worker threads the S3A client uses, which bounds how many part-upload requests can run concurrently
fs.s3a.threads.keepalivetime=60 -> how long (in seconds) an idle thread is kept alive before being terminated
fs.s3a.max.total.tasks=5 -> how many additional tasks can wait in the queue for a free thread
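A tuning sketch under these assumptions: lowering the thread pool, the task queue, and the HTTP connection cap should throttle how many part uploads are in flight at once. The property names are the standard S3A ones, but the values below are illustrative, not recommendations:

```xml
<!-- Hypothetical throttling sketch; defaults and accepted values vary by Hadoop version -->
<property>
  <name>fs.s3a.threads.max</name>
  <value>4</value> <!-- fewer worker threads => fewer parallel part uploads -->
</property>
<property>
  <name>fs.s3a.max.total.tasks</name>
  <value>4</value> <!-- shorter queue of pending upload tasks -->
</property>
<property>
  <name>fs.s3a.connection.maximum</name>
  <value>8</value> <!-- cap on simultaneous HTTP connections to the store -->
</property>
```

On Hadoop versions with the "fast upload" path enabled, `fs.s3a.fast.upload.active.blocks` may also limit how many blocks a single output stream can have uploading or queued at once.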
Please let me know if my understanding is correct, or whether I need to tune other properties too.
> In my object store code
If you have access to your code base and know where the issue is, wouldn't it be easier to fix the issue there? If you work around it via S3A parameters, it is just a matter of time before you run into other issues.
@aengineer I agree with you, and we already did that by improving our locking algorithm. But the S3A client still pushes many concurrent requests, each trying to upload a part of a very big file (1 TB) via MPU. In the object store we have to lock the object on write to serialize the write operation, and because there are so many concurrent requests, a few of them time out. So we need to understand how the client issues MPU part-upload requests, and whether there is any client-side parameter that limits the number of concurrent part uploads, just to avoid the timeouts. We have tested most operations with S3A (Hive/MR/Hadoop/Spark etc.) and they all work fine; the one remaining issue is these concurrent MPU part uploads timing out. Thanks
Perhaps you need to return an error from the server so that S3AFS slows down and retries? Would that be an option for you? It would be a more generic way of solving your issue.
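To illustrate the suggestion: instead of the server holding each request until the 60-second lock timeout expires, it could try the lock briefly and return a retryable error (e.g. HTTP 503 "Slow Down", which S3 clients generally treat as throttling and retry with backoff). A minimal sketch, assuming a hypothetical per-object lock and `handle_upload_part` handler (neither is from the original post):

```python
import threading

# Hypothetical per-object write lock; a real store would keep one per object key.
object_lock = threading.Lock()

def handle_upload_part(write_part, lock_wait_seconds=0.1):
    """Try the lock briefly; on contention, fail fast with a retryable 503
    instead of blocking until a long timeout and returning a hard error."""
    if not object_lock.acquire(timeout=lock_wait_seconds):
        return 503  # "Slow Down": the client is expected to back off and retry
    try:
        write_part()  # perform the actual part write while holding the lock
        return 200
    finally:
        object_lock.release()
```

With this shape, contention surfaces as quick, retryable 503s rather than 60-second stalls that exhaust the client's request timeout.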
You are right, but what I am trying to say is that if thousands of applications try to modify the same object at the same time, we have to lock the object being modified just to serialize the writes. Do you agree?
This can lead to some of the requests starving, and that is what we are seeing with the S3A client. A Hadoop copy operation has two parts: put a temp file, then rename it. The put operation succeeds, and we can see the temp file uploaded to our object store, but some of the rename requests time out because multiple requests from the S3A client try to modify the same object, and some of them cannot acquire the lock in time.