I am having a lot of trouble with the slowness and unresponsiveness of the Hive CLI. Initially I thought it might be due to a lack of resources in the cluster, but when I checked the running applications in YARN I found that plenty of resources were still free. I face the same issue when I run a Spark application that acquires around 50-60% of the cluster.
Please note that I have not set up queues in YARN. All my applications go to the default queue.
I do not understand why opening the Hive CLI gets stuck even though resources are available in the cluster. Could anyone from the community help me resolve this? Do I need to set up queues? I am also attaching a screenshot of the running applications in YARN at the time I try to open the Hive shell.
I assume you mean Hive jobs when you mention the Hive CLI.
When jobs are stuck, it does not necessarily mean that resource availability is the cause. It can also be related to the data being handled in the Hive/Spark jobs. Are you facing this issue only when you run the same set of queries in Hive and spark-sql?
If that is the case, then it is very likely related to the data. When running Hive jobs, do you see a few reducers running for a very long time? If so, those reducers have received a disproportionate share of the data. Find the reason for that skew and redistribute the data. Hope it helps!
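To make the reducer skew check above concrete, here is a minimal sketch in plain Python. It assumes you have already pulled a sample of the join or group-by key values into a list; the function name `find_skewed_keys`, the 20% threshold, and the sample values are all hypothetical and only for illustration.

```python
from collections import Counter


def find_skewed_keys(keys, threshold=0.2):
    """Return (key, share) pairs whose share of all rows exceeds threshold.

    `keys` is a sample of the values of the column used for the join or
    group-by. The threshold is an arbitrary assumption, not a Hive setting.
    """
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() sorts by frequency, so the heaviest keys come first.
    return [(k, c / total) for k, c in counts.most_common() if c / total > threshold]


# Hypothetical sample: one key dominates, so the reducer that receives it
# would get most of the rows.
sample_keys = ["US"] * 8 + ["IN", "DE"]
skewed = find_skewed_keys(sample_keys)
print(skewed)
```

If a handful of keys show up here, those are the values to look at when deciding how to redistribute the data.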
I am not even able to open the shell, even at 60% cluster utilization as seen from the YARN running applications. And I did mean opening the Hive shell and running individual queries in it.
And it is not related to reducers. The data and the tuning of the jobs have already been taken care of. The problem is that I am unable to open the shell while a spark-shell or Spark jobs are running in YARN cluster mode.
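For anyone wanting to confirm the utilization numbers outside the web UI, the sketch below reads the ResourceManager REST endpoint `/ws/v1/cluster/metrics` and reports memory usage together with the number of pending applications. The ResourceManager address is an assumption (`rm-host` is a placeholder and 8088 is only the usual default port), and the sample dictionary is made-up data standing in for a real response.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url):
    """Fetch the clusterMetrics object from the YARN ResourceManager REST API."""
    with urlopen(rm_url.rstrip("/") + "/ws/v1/cluster/metrics", timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def summarize(metrics):
    """Reduce the metrics to the numbers relevant to a stuck shell."""
    total = metrics["totalMB"]
    used_pct = 100.0 * metrics["allocatedMB"] / total if total else 0.0
    return {
        "memory_used_pct": round(used_pct, 1),
        "apps_pending": metrics["appsPending"],
        "apps_running": metrics["appsRunning"],
    }


# Made-up sample in the shape of a clusterMetrics response.
sample = {
    "totalMB": 100000,
    "allocatedMB": 60000,
    "availableMB": 40000,
    "appsPending": 1,
    "appsRunning": 3,
}
summary = summarize(sample)
print(summary)

# Against a live cluster (placeholder host, adjust to your ResourceManager):
# print(summarize(fetch_cluster_metrics("http://rm-host:8088")))
```

If `apps_pending` is non-zero while a large share of memory is still free, that may point to a scheduler or queue limit rather than a raw shortage of resources. That is only a possibility to check, not a diagnosis.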