Hive cli uses only a single container

New Contributor

I have created an external table in Hive; the table is a set of partitioned Parquet files (about 200 of them). After creating the table, I ran msck repair table and analyze table compute statistics.

Then, in the Hive CLI, if I run something like "select count(*) from table", it runs a single mapper in a single container. I have hive.exec.parallel set to true, and I am using Tez as the execution engine. I have also tried setting hive.exec.parallel.thread.number=32 (though I don't think that should matter?). What else am I missing?


Super Collaborator

Hi @fsm17, since you are using Tez as the execution engine, I would suggest setting the properties below in Hive; they control the number of mappers.

  • tez.grouping.max-size (default 1073741824, which is 1 GB): the most data Tez will assign to a task. Decreasing this means more parallelism.
  • tez.grouping.min-size (default 52428800, which is 50 MB): the least data Tez will assign to a task. Increasing this means less parallelism.
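
As a minimal sketch of how these settings could be applied for a single session, the statements below lower both bounds from the Hive prompt before rerunning the query. The byte values (256 MB and 16 MB) are illustrative assumptions rather than recommendations, and my_table is a hypothetical table name; min-size should stay at or below max-size.

```sql
-- Confirm the session is running on Tez.
set hive.execution.engine=tez;

-- Illustrative values only: cap each grouped split at 256 MB ...
set tez.grouping.max-size=268435456;
-- ... and allow grouped splits as small as 16 MB.
set tez.grouping.min-size=16777216;

-- Re-run the query (my_table is a placeholder) and compare the mapper count
-- shown in the Tez progress output with the earlier run.
select count(*) from my_table;
```

Because these are session-level settings, they only affect queries run after them in the same session.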

New Contributor

Thanks. The data in the table is 100s of GB, and each individual file (partition) is 200 MB, so I think it should be dividing the tasks appropriately.

It seems the issue is at startup: the Hive CLI is assigned a single container by YARN, and execution then proceeds only in that container (regardless of the setting of hive.exec.submitviachild or hive.exec.submit.local.task.via.child).

New Contributor

To follow up on the number of containers: if I were running spark-sql, for example, I'd run spark-sql --num-executors=32 and the count query would run across the 32 executors. I cannot figure out a similar thing for the Hive CLI.

Super Collaborator

Something similar is discussed in this post, but then again we discussed this already. Maybe tez.grouping.split-count can help. Some more info here as well as in https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/group...
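
If a fixed target is preferred over size bounds, tez.grouping.split-count could be tried from the Hive prompt as sketched below. This is only a sketch under assumptions: the value 32 simply mirrors the executor count from the spark-sql comparison earlier in the thread, my_table is a hypothetical table name, and how this setting interacts with the min/max grouping sizes is best checked against the Tez source linked above.

```sql
-- Ask Tez to aim for a specific number of grouped splits (illustrative value).
set tez.grouping.split-count=32;

-- Re-run the query (my_table is a placeholder) and check how many mappers start.
select count(*) from my_table;
```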

Community Manager

@fsm17, Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. 



Regards,

Vidya Sargur,
Community Manager

