About fsm17

fsm17 · 08-19-2022

To follow up on the number of containers: if I were running spark-sql, for example, I'd run spark-sql --num-executors=32, and the count query would run across the 32 executors. I can't figure out a similar thing for the Hive CLI.

fsm17 · 08-19-2022

Thanks. The data in the table is hundreds of GB, and each individual file (partition) is about 200 MB, so I think the work should be divided into tasks appropriately. The issue seems to be at startup: YARN assigns the Hive CLI a single container, and execution then proceeds only in that container, regardless of how hive.exec.submitviachild or hive.exec.submit.local.task.via.child is set.
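As a rough sanity check on the "it should be dividing the tasks" reasoning, here is a simplified model of how Tez groups input splits into map tasks. It is a sketch under stated assumptions, not Tez's or Hive's actual code: it assumes one original split per file, a desired task count of available slots times tez.grouping.split-waves, and a per-task group size kept between tez.grouping.min-size and tez.grouping.max-size. The helper name estimate_grouped_splits and all default values are hypothetical placeholders; the real values depend on your cluster configuration.

```python
def estimate_grouped_splits(
    num_files: int,
    file_size_mb: float,
    available_slots: int,
    waves: float = 1.7,            # assumed value of tez.grouping.split-waves
    min_group_mb: float = 50.0,    # assumed value of tez.grouping.min-size
    max_group_mb: float = 1024.0,  # assumed value of tez.grouping.max-size
) -> int:
    """Hypothetical, simplified estimate of the number of Tez map tasks.

    Assumes one original split per file; the starting target is
    available_slots * waves, then adjusted so each group's size stays
    within [min_group_mb, max_group_mb].
    """
    total_mb = num_files * file_size_mb
    original_splits = num_files

    # Starting target: enough tasks to fill the cluster about `waves` times.
    desired = max(1, int(available_slots * waves))
    per_group_mb = total_mb / desired

    # Too much data per task: add tasks so no group exceeds the max size.
    if per_group_mb > max_group_mb:
        desired = int(total_mb // max_group_mb) + 1
    # Too little data per task: use fewer, larger groups.
    elif per_group_mb < min_group_mb:
        desired = int(total_mb // min_group_mb) + 1

    # Grouping only merges existing splits; it cannot create extra ones.
    return min(desired, original_splits)


if __name__ == "__main__":
    # Roughly 200 files of roughly 200 MB each, as described in this thread.
    print(estimate_grouped_splits(200, 200, available_slots=32))  # 54
    print(estimate_grouped_splits(200, 200, available_slots=1))   # 40
```

In this model, the group-size cap keeps the task count well above one even when only a single slot is available, which is why a single mapper over this much data looks unexpected.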

fsm17 · 08-18-2022

I have created an external table in Hive over a set of partitioned Parquet files (about 200 of them). After creating the table, I ran MSCK REPAIR TABLE and then ANALYZE TABLE with COMPUTE STATISTICS. When I then run something like "select count(*) from table" in the Hive CLI, it uses a single mapper in a single container. I have hive.exec.parallel set to true, and I am using Tez as the execution engine. I have also tried setting hive.exec.parallel.thread.number=32 (though I don't think that should matter). What else am I missing?

Cloudera Community

Re: Hive cli uses only a single container

Re: Hive cli uses only a single container

Hive cli uses only a single container