Created on 06-09-2020 07:16 PM - edited 06-09-2020 07:21 PM
I want to control the number of containers that run in parallel for a single query, so that I can run many queries in parallel in one YARN queue.
Yarn Queue Size: 200 GB
Approx. Mappers / Containers: 50
I am setting the container size to 10 GB with hive.tez.container.size=10240;
Once the first query is triggered, it consumes the whole queue (200 GB) and runs 20 containers in parallel, which prevents any other query from starting because no YARN memory is left in the queue.
I want help identifying the parameters that control the number of containers running in parallel, so that I can limit it to 10. That way, at any point a query would consume at most 100 GB (10 containers x 10 GB per container) of the 200 GB YARN queue.
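To make the arithmetic above concrete, here is a minimal Python sketch of the memory budget. The function name `container_budget` and its result keys are made up for illustration. It only models the numbers from this post (200 GB queue, 10 GB containers, a cap of 10) and ignores the memory taken by the Tez application master, so treat the result as an approximation.

```python
def container_budget(queue_gb: int, container_gb: int, max_containers: int) -> dict:
    """Work out how much of a queue one query uses under a container cap.

    Hypothetical helper for illustration only; it does not talk to YARN or Hive.
    """
    if queue_gb <= 0 or container_gb <= 0 or max_containers <= 0:
        raise ValueError("all sizes and counts must be positive")

    # How many containers fit in the queue if nothing limits the query.
    uncapped_containers = queue_gb // container_gb
    # The query can never run more containers than the queue can hold.
    capped_containers = min(max_containers, uncapped_containers)
    per_query_gb = capped_containers * container_gb
    # How many such queries fit side by side (assumes per_query_gb > 0).
    concurrent_queries = queue_gb // per_query_gb

    return {
        "uncapped_containers": uncapped_containers,
        "capped_containers": capped_containers,
        "per_query_gb": per_query_gb,
        "concurrent_queries": concurrent_queries,
    }


# Numbers taken from the post: 200 GB queue, 10 GB containers, cap of 10.
budget = container_budget(queue_gb=200, container_gb=10, max_containers=10)
print(budget)
```

With these inputs, the uncapped case matches the 20 parallel containers observed, and a cap of 10 leaves room for two such queries at a time.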
Created 06-10-2020 04:11 AM
@vigneshvenu You will need to pass parameters to Hive when the query is executed.
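As a rough illustration of passing parameters at execution time, the Python sketch below prepends session-level `SET` statements to a query before it is submitted. The helper `with_session_settings` and the table name `example_table` are hypothetical. The only setting shown is hive.tez.container.size from the question, which sets container size and does not by itself cap how many containers run at once.

```python
def with_session_settings(query: str, settings: dict) -> str:
    """Prefix a Hive query with one SET statement per session parameter.

    Hypothetical helper for illustration; it only builds text and does not
    connect to Hive.
    """
    set_lines = [f"SET {key}={value};" for key, value in settings.items()]
    # Normalize the query so it ends with exactly one semicolon.
    body = query.strip().rstrip(";") + ";"
    return "\n".join(set_lines + [body])


# example_table is a placeholder name, not a table from the thread.
script = with_session_settings(
    "SELECT COUNT(*) FROM example_table",
    {"hive.tez.container.size": 10240},
)
print(script)
```

The resulting script can be run in the same session as the query, so the settings apply only to that session.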
You can find some good conversation about this here:
If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post.
Thanks,
Steven @ DFHZ