Created on 07-19-2016 11:37 PM - edited 09-16-2022 03:30 AM
Hi, I am Devendra Bhumarapu.
We recently moved from the Hive CLI to the Beeline CLI with HiveServer2. Running the same queries on both, Beeline is much slower than the Hive CLI. Although Beeline is the more advanced client, we need to look into its speed. I don't know the exact reason behind this. Please share your thoughts.
Thanks,
Devendra B
Created 08-08-2016 08:16 AM
Hello Devendra,
Beeline is just a thin JDBC client for accessing HiveServer2. HS2 does the same thing the Hive CLI does: it submits the Hive query as an MR job. So the YARN / MR job execution times should be the same.
Several other factors can influence the speed of a query through HiveServer2, but most of the time it comes down to one of these:
- The HS2 host is overloaded (see the output of top, for example)
- The HiveServer2 Java heap is too small and the JVM needs to GC too often, so it cannot process the query as fast as it otherwise would (look up the HS2 Java heap settings, and look at GC times after enabling GC logging)
- HiveServer2 has to run several complex queries at the same time, but only one query can be compiled at a time, which can be a bottleneck. You can rule this out by running your performance tests at a quiet time, or by collecting stack traces while it is slow.
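To put a number on the GC point above, here is a minimal Python sketch that adds up the pause times found in an HS2 GC log. It assumes the Java 8 style output of -XX:+PrintGCDetails, where each GC event reports its pause as ", N secs]"; other JVM versions or logging flags print a different format and would need a different pattern. The function name summarize_gc_pauses and the sample log lines are made up for illustration only.

```python
import re

# Assumption: Java 8 style GC log lines, where the pause of an event is
# printed as ", 0.0123456 secs]". Other log formats will not match.
PAUSE_RE = re.compile(r", (\d+\.\d+) secs\]")


def summarize_gc_pauses(lines):
    """Return (event_count, total_pause_seconds, longest_pause_seconds)."""
    pauses = []
    for line in lines:
        matches = PAUSE_RE.findall(line)
        if matches:
            # Some collectors nest a per-generation time inside the line;
            # the last match on the line is the pause for the whole event.
            pauses.append(float(matches[-1]))
    if not pauses:
        return 0, 0.0, 0.0
    return len(pauses), sum(pauses), max(pauses)


# Hypothetical sample lines, not taken from a real HS2 instance.
SAMPLE_LOG = [
    "Java HotSpot(TM) 64-Bit Server VM (25.101-b13) for linux-amd64 JRE",
    "1.234: [GC (Allocation Failure) [PSYoungGen: 65536K->10720K(76288K)] "
    "65536K->10728K(251392K), 0.0123456 secs] "
    "[Times: user=0.03 sys=0.01, real=0.01 secs]",
    "5.678: [Full GC (Ergonomics) [PSYoungGen: 10720K->0K(76288K)] "
    "[ParOldGen: 8K->10500K(175104K)] 10728K->10500K(251392K), "
    "[Metaspace: 20000K->20000K(1067008K)], 0.1234567 secs] "
    "[Times: user=0.20 sys=0.00, real=0.12 secs]",
]

if __name__ == "__main__":
    count, total, longest = summarize_gc_pauses(SAMPLE_LOG)
    print(f"GC events: {count}, total pause: {total:.3f}s, longest: {longest:.3f}s")
```

If the total pause time is a large share of the wall-clock time the log covers, or single pauses run into seconds, the heap size is a likely suspect. To run it against a real log, pass the open file object instead of SAMPLE_LOG.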
Hope this helps a bit.
Miklos Szurap
Customer Operations Engineer, Cloudera
Created 08-08-2016 11:56 PM
Thanks, Miklos Szurap.
Your information helped a lot.
Created 02-05-2018 04:35 AM