Contributor
Posts: 43
Registered: ‎07-26-2016

Which parameter needs to be increased for long-running queries in Hue?

Hi friends,

I have a table in Hive (110 GB in size).

I want to fetch one particular record by its unique ID from that 110 GB of data, to compare Hive and Impala.

 

My system config:

RAM: 50 GB, HDD: 1 TB, CDH 5.8, QuickStart VM.

 

So I ran a SELECT query in Impala from Hue, but it fetched up to 6.7 GB of data and then threw this error:

timed out (code THRIFTSOCKET):None

 

If I run the same query from the command line, it works.

 

So the problem seems to be in Hue.

How can I fix it?

Please guide.

 

Thanks,

Syam.

Champion
Posts: 562
Registered: ‎05-16-2016

Re: Which parameter needs to be increased for long-running queries in Hue?


What file format do you use?

Did you gather column statistics for the table?

 

In my experience Impala is pretty fast compared to Hive, though that still depends on various factors.

 

Also, it is good practice to use LIMIT when you do any sampling.

 

You might want to check your current HiveServer2 parameters:

hive.server2.session.check.interval 
hive.server2.idle.operation.timeout 
hive.server2.idle.session.timeout 

Most probably it is because of the long-running query.

Also, please provide some HiveServer2 logs and Hue logs.

Posts: 642
Topics: 3
Kudos: 103
Solutions: 67
Registered: ‎08-16-2016

Re: Which parameter needs to be increased for long-running queries in Hue?

Impala and Hive both have idle and session timeouts. These can be set globally at the service level or per client, so Hue can have its own.
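To illustrate the per-client side: Hue keeps its own settings in hue.ini (or the Hue safety valve in Cloudera Manager). The sketch below is only an assumption about what such an override might look like. It assumes an `[impala]` section with a `server_conn_timeout` key, in seconds, and a default of 120; please verify both the key name and the default against the hue.ini shipped with your Hue version. The value 3600 and the helper `impala_conn_timeout` are made up for the example. The Python part just parses the fragment so you can see which value would apply.

```python
import configparser

# Hypothetical hue.ini fragment. The section and key names are assumptions
# to be checked against your own Hue version; 3600 is an example value.
HUE_INI_FRAGMENT = """
[impala]
server_conn_timeout = 3600
"""


def impala_conn_timeout(ini_text, default=120):
    """Return the configured timeout in seconds, or `default` if it is unset.

    The default of 120 is an assumed value, not read from any real install.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getint("impala", "server_conn_timeout", fallback=default)


timeout_s = impala_conn_timeout(HUE_INI_FRAGMENT)
print(f"Configured Hue-to-Impala connection timeout: {timeout_s} s")
```

If the key is missing, the helper falls back to the assumed default, which is why a long-running query can time out in Hue while the same query works from the command line.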

The QuickStart VM is not the place to test or compare performance.

With that said, the statement below tells us all we need to know: if this is the usage pattern, you should not use Hive. Impala will always be better for single-record lookups or column aggregations.

"I want to fetch one particular record based on unique Id among 110GB data."