I am using both the Hive CLI and the Beeline CLI. Beeline is much slower than the Hive CLI. What is the reason?

New Contributor

Hi, I am Devendra Bhumarapu.

We recently moved from the Hive CLI to the Beeline CLI with HiveServer2. When running the same queries on both, Beeline is much slower than the Hive CLI. Beeline is supposed to be the more advanced client, so we need to understand the speed difference. I do not know the exact reason behind it. Please share your thoughts.

Thanks,

Devendra B

1 ACCEPTED SOLUTION

Hello Devendra,

 

Beeline is just a thin JDBC client for accessing HiveServer2 (HS2). HS2 does the same thing the Hive CLI does: it submits the Hive query as an MR job. So the YARN / MR job execution times should be the same.
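To check whether the difference really comes from outside the MR job, you can time the same query end to end through both clients and compare that with the job run times reported by YARN. Below is a minimal Python sketch of such a timing harness. The query, host name, port and database in the usage note are placeholders (assumptions), and the helper names are made up for illustration. Only the standard `hive -e` and `beeline -u ... -e` options are used.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=3):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; a non-zero exit code raises CalledProcessError.
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def compare(commands, runs=3):
    """Return {label: median wall-clock seconds} for each labelled command."""
    return {label: statistics.median(time_command(argv, runs))
            for label, argv in commands.items()}


def example_commands(query, jdbc_url):
    """Build the two client invocations for the same query (hypothetical helper)."""
    return {
        "hive": ["hive", "-e", query],
        "beeline": ["beeline", "-u", jdbc_url, "-e", query],
    }
```

Usage would look like `compare(example_commands("SELECT COUNT(*) FROM sample_table", "jdbc:hive2://hs2-host.example.com:10000/default"))`, where the table name and JDBC URL are placeholders for your own. If the medians differ a lot while the MR job times in YARN are similar, the extra time is being spent in HS2 or in the client, not in the job itself.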

Several other factors can influence the speed of a query through HiveServer2, but most of the time it comes down to one of these:

- The HS2 host is overloaded (check the output of top, for example).

- The HiveServer2 Java heap is too small and the JVM has to run GC too often, so it cannot process the query as fast as it otherwise would (look up the HS2 Java heap settings, and look at the GC times after enabling GC logging).

- HiveServer2 has to handle several complex queries at the same time, but only one query can be compiled at a time, which can be a bottleneck. You can rule this out by running your performance tests during a quiet period, or by collecting stack traces while the query is slow.
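For the heap / GC point, one rough way to quantify GC pressure without reading GC logs is to take two `jstat -gcutil <HS2 pid>` samples some seconds apart and see how much the GCT column (total GC time in seconds) grew. The Python sketch below is only an illustration: the sample values are made up, the column layout is assumed to be the Java 8 one, and the function names are hypothetical.

```python
# Made-up jstat -gcutil samples, assumed to be taken 30 seconds apart.
BEFORE = """
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  96.88  24.53  10.12  95.30  90.10     12    0.250     2    0.300    0.550
"""
AFTER = """
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
 45.10   0.00  80.02  88.40  95.31  90.10     40    1.250    10    2.300    3.550
"""


def parse_gcutil(text):
    """Map column names to values, using the header line and the last data row."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, row = lines[0], lines[-1]
    sample = {}
    for name, value in zip(header, row):
        try:
            sample[name] = float(value)
        except ValueError:
            sample[name] = None  # jstat prints "-" for columns that do not apply
    return sample


def gc_overhead(before, after, interval_seconds):
    """Fraction of the interval the JVM spent in GC, based on the GCT column."""
    return (after["GCT"] - before["GCT"]) / interval_seconds


overhead = gc_overhead(parse_gcutil(BEFORE), parse_gcutil(AFTER), 30)
print(f"GC overhead: {overhead:.1%}")
```

With these made-up numbers the JVM spent about 10% of the interval in GC. If HS2 shows a consistently high share together with a large jump in the FGC (full GC count) column, the heap is a likely suspect and is worth increasing before looking at the other causes.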

 

Hope this helps a bit.

 

Miklos Szurap 

Customer Operations Engineer, Cloudera

3 REPLIES

New Contributor

Thanks, Miklos Szurap.

Your information helped a lot.

New Contributor
So what is the solution? Are there any configuration changes we can make to speed up the Beeline CLI?