Member since
08-18-2022
08-19-2022 10:08 AM
To follow up on the number of containers: if I were running spark-sql, I'd run spark-sql --num-executors=32 and the count query would run across the 32 executors. I cannot figure out a similar setting for the hive cli.
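As far as I know, Hive on Tez has no direct counterpart to --num-executors, since Tez requests containers per task at run time. The closest session-level knobs are probably the prewarm settings, sketched below. This assumes the Tez engine is in use; the value 32 is only chosen to mirror the spark-sql example, and whether prewarming changes the parallelism of the count query here is untested.

```sql
-- Sketch only: ask Hive to pre-launch Tez containers when the session starts.
-- Assumption: these must be in effect before the Tez session is created,
-- e.g. passed with --hiveconf on the hive command line or set in hive-site.xml.
set hive.execution.engine=tez;
set hive.prewarm.enabled=true;
set hive.prewarm.numcontainers=32;
```

Prewarmed containers only reduce startup latency; the number of tasks a query runs is still decided by how the input is split.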
08-19-2022 07:21 AM
Thanks. The data in the table is hundreds of GB, and each individual file (partition) is 200MB, so I think it should be dividing the tasks appropriately. It seems the issue is at startup: the hive cli is assigned a single container by YARN, and then the execution proceeds only in that container (regardless of the setting of hive.exec.submitviachild or hive.exec.submit.local.task.via.child).
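A few settings that may be worth inspecting in the same session are sketched below. These are guesses at possible causes, not a confirmed diagnosis: one possibility is that the count is being answered from the statistics gathered by analyze table, another is that Tez split grouping is merging the input into very few splits. The table name my_table is a placeholder.

```sql
-- A bare "set <name>;" prints the current value of a setting.
set hive.execution.engine;
set hive.compute.query.using.stats;
set tez.grouping.min-size;
set tez.grouping.max-size;
set tez.grouping.split-count;

-- Experiment (assumption, not a known fix): force a real scan and
-- hint at the desired number of grouped splits, then rerun the count.
set hive.compute.query.using.stats=false;
set tez.grouping.split-count=32;
select count(*) from my_table;  -- my_table is a hypothetical name
```

The single container seen at startup may simply be the Tez application master for the session, in which case the task containers would only be requested once a query actually runs.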
08-18-2022 03:17 PM
I have created an external table in hive; the table is a set of partitioned parquet files (about 200 of them). After creating the table, I ran msck repair table and analyze table compute statistics. Then, in the hive cli, if I run something like "select count(*) from table", it runs a single mapper in a single container. I have hive.exec.parallel set to true, and I am using tez as the execution engine. I have also tried setting hive.exec.parallel.thread.number=32 (though I don't think that should matter?). What else am I missing?
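For reference, the setup described above would look roughly like the following. This is a reconstruction under assumptions: the table name, columns, partition column dt, and location path are all hypothetical, since the post does not give them.

```sql
-- Hypothetical schema and path; only the sequence of steps comes from the post.
create external table my_table (id bigint, payload string)
partitioned by (dt string)
stored as parquet
location '/data/my_table';

-- Register the existing partition directories in the metastore.
msck repair table my_table;

-- Gather basic statistics for every partition.
analyze table my_table partition (dt) compute statistics;

-- Session settings mentioned in the post, followed by the query.
set hive.execution.engine=tez;
set hive.exec.parallel=true;
set hive.exec.parallel.thread.number=32;
select count(*) from my_table;
```

Note that hive.exec.parallel controls whether independent stages of one query run concurrently, so it would not be expected to change the number of tasks in a single-stage count.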
Labels: Apache Hive