When the cluster does not have enough memory because it is running other jobs, or when HDFS storage is almost full, how is the performance of a new Hive query submitted by a client affected? Does adding a new host improve performance?
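Running a Hive query launches a YARN application. When the queue's resources are already fully utilized, the query slows down because its containers have to wait to be allocated. Adding a host that runs a NodeManager increases the cluster's total resources and can reduce this wait, provided the queue is allowed to use the additional capacity.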
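To make the container-allocation point concrete, here is a simplified model (a sketch, not how YARN actually schedules). It assumes a queue with a fixed memory capacity, containers of one fixed size (for example the value of `hive.tez.container.size` when Hive runs on Tez), and one task per container. It ignores vcores, preemption, elastic queue capacity, and container reuse. All numbers are hypothetical.

```python
from typing import Optional


def containers_that_fit(queue_capacity_mb: int, used_mb: int, container_mb: int) -> int:
    """How many containers of container_mb can start now with the queue's free memory."""
    free_mb = max(queue_capacity_mb - used_mb, 0)
    return free_mb // container_mb


def waves_needed(tasks: int, queue_capacity_mb: int, used_mb: int, container_mb: int) -> Optional[int]:
    """Rounds of containers needed to run all tasks, or None if nothing can start yet.

    Simplified model: each task occupies one container for one round.
    """
    concurrent = containers_that_fit(queue_capacity_mb, used_mb, container_mb)
    if concurrent == 0:
        return None  # the query waits until other applications release resources
    return -(-tasks // concurrent)  # ceiling division


# Idle 64 GB queue versus the same queue when other jobs already use 60 GB.
print(waves_needed(100, 65536, 0, 4096))      # 16 containers in parallel
print(waves_needed(100, 65536, 61440, 4096))  # only 1 container in parallel
```

In this model, the same 100-task query needs 7 rounds on an idle queue but 100 rounds when other jobs hold most of the memory. Adding capacity raises `queue_capacity_mb` only if the queue's configuration lets it grow.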
With respect to HDFS, a Hive query writes intermediate results and join data spills to the user's temporary directory on HDFS. Low or no free storage can therefore cause the query to fail. Adding a host that runs a DataNode increases the available HDFS storage.
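One way to check for storage headroom before running a large query is to read the output of `hdfs dfs -df <path>`. The sketch below parses that output and compares free space with an estimate of how much the query will spill. It makes several assumptions, so verify them on your Hadoop version. It assumes the byte-valued output (no `-h`), with a header line (`Filesystem Size Used Available Use%`) followed by one data line. It also assumes `Available` is raw capacity, so each spilled byte costs roughly `replication` bytes (3 is the HDFS default). The spill estimate and the helper names are hypothetical.

```python
import subprocess
from typing import NamedTuple


class DfResult(NamedTuple):
    filesystem: str
    size: int
    used: int
    available: int


def parse_df(output: str) -> DfResult:
    """Parse byte-valued `hdfs dfs -df <path>` output: a header line plus one data line."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    # lines[0] is the header (Filesystem Size Used Available Use%); lines[1] holds the values.
    fields = lines[1].split()
    return DfResult(fields[0], int(fields[1]), int(fields[2]), int(fields[3]))


def has_headroom(df: DfResult, spill_bytes: int, replication: int = 3) -> bool:
    """True if free raw capacity covers the estimated spill after replication (hypothetical check)."""
    return df.available >= spill_bytes * replication


def read_df(path: str) -> DfResult:
    """Run `hdfs dfs -df` on a cluster node (requires the hdfs CLI on PATH)."""
    result = subprocess.run(["hdfs", "dfs", "-df", path], capture_output=True, text=True, check=True)
    return parse_df(result.stdout)
```

For example, on a cluster with 50 MB of raw space left, a query expected to spill 10 MB fits (30 MB after 3x replication), but one expected to spill 20 MB does not.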