Support Questions


Hello Devendra,

 

Beeline is just a thin JDBC client for accessing HiveServer2 (HS2). HS2 does the same thing the Hive CLI does: it submits the Hive query as an MR job. So the YARN / MR job execution times should be the same.

Several factors can influence the speed of a query run through HiveServer2, but most of the time it comes down to one of these:

- HS2 host is overloaded (see top output for example)

- The HiveServer2 Java heap is too small and the JVM has to garbage collect too often, so it cannot process the query as fast as it otherwise would (look up the HS2 Java heap settings, and look at the GC times after enabling GC logging)

- HiveServer2 has to handle several complex queries at the same time, but only one query can be compiled at a time, which can become a bottleneck. You can rule this out by running your performance tests during a quiet period, or by collecting stack traces while the query is slow.
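To make the GC check above more concrete, here is a minimal Python sketch that adds up the pause times found in a GC log. It assumes JDK 8 style log lines that end a collection entry with ", <seconds> secs]" (the kind of output -XX:+PrintGCDetails produces); other JVM versions and collectors log differently, so the pattern may need adjusting. The function name summarize_gc_pauses and the sample lines are made up for illustration and are not taken from a real HS2 log.

```python
import re

# Assumed format: a pause is logged as ", 0.0123456 secs]".
# The leading ", " keeps the "[Times: ... real=0.01 secs]" suffix from matching.
PAUSE_RE = re.compile(r", (\d+\.\d+) secs\]")


def summarize_gc_pauses(lines):
    """Return the number of GC events, total pause time and longest pause."""
    pauses = []
    for line in lines:
        matches = PAUSE_RE.findall(line)
        if matches:
            # Some collectors log nested timings; the last one on the
            # line is the duration of the whole collection.
            pauses.append(float(matches[-1]))
    return {
        "count": len(pauses),
        "total_secs": sum(pauses),
        "max_secs": max(pauses, default=0.0),
    }


# Illustrative sample lines (hypothetical, not from a real log).
SAMPLE_LOG = [
    "1.234: [GC (Allocation Failure) 65536K->10728K(251392K), 0.0123456 secs]",
    "9.876: [Full GC (Ergonomics) 200000K->150000K(251392K), 0.2500000 secs]",
    "some unrelated log line",
]

if __name__ == "__main__":
    print(summarize_gc_pauses(SAMPLE_LOG))
```

If the total pause time is a noticeable share of the time the query spends in HS2, or single pauses run into seconds, the heap size is worth a closer look.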

 

Hope this helps a bit.

 

Miklos Szurap 

Customer Operations Engineer, Cloudera
