01-17-2017 08:27 AM - edited 01-17-2017 08:27 AM
Hive is not Oracle. You should not expect the same processing capabilities.
Hive is designed to run long and heavy queries. Whereas it performed poorly for small queries like the one you try to optimize.
Also note that Hive run on top of Yarn and per design Yarn takes time to instanciate containers and the JVM inside these containers. This should correspond to your question "why it takes so much time to start the Query Job".
If you want to get a quick reply for a basic count(*) without any filter/condition you might want to read about Hive statistics.
01-17-2017 09:20 AM
I think my query performs well.
From the Hadoop JobHistory Web Interface i see the following results for my query:
Submit Time : 2017.01.17 12:01:32 EST
Start Time : Tue Jan 17 12:01:35 EST 2017
Finish Time : 2017.01.17 12:01:36 EST
However the Hive Client renders data in seconds... what could be the bottleneck then ?
01-17-2017 11:50 AM