
Which parameter needs to be increased for long-running queries in Hue?


Hi friends,


I have a table in Hive (110 GB in size).

I want to fetch one particular record, based on its unique ID, from that 110 GB of data, in order to compare Hive and Impala.


My system config:

RAM: 50 GB, HDD: 1 TB, CDH 5.8, Quickstart VM.


So I ran a SELECT query against Impala from Hue, but it fetched up to 6.7 GB of data and then threw this error:

timed out (code THRIFTSOCKET):None


If I run the same query from the command line, it works.


So the problem seems to be in Hue.

How can I fix it?

Please guide.






What file format do you use?

Did you gather column statistics for the table?


In my experience Impala is considerably faster than Hive, although it still depends on various factors.


Also, it is good practice to use LIMIT when you are sampling data.


You might want to check your current HiveServer2 parameters.


Most probably the error is caused by the long-running query.

Also, please provide some HiveServer2 logs and Hue logs.

Impala and Hive have idle and session timeouts. These can be set globally at the service level or per client, so Hue can have its own.
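
For reference, here is a minimal sketch of what a Hue-side timeout override could look like. It assumes the stock hue.ini layout, where the [impala] and [beeswax] sections each accept a server_conn_timeout value in seconds (commonly 120 by default). The value 3600 is only an illustration, so check the setting names and defaults against the hue.ini shipped with your CDH release before relying on it.

```ini
# Sketch only: verify these names against your own hue.ini.
[impala]
  # Seconds Hue waits on a Thrift call to Impala before giving up.
  # 3600 is an illustrative value, not a recommendation.
  server_conn_timeout=3600

[beeswax]
  # Same setting for Hue's connection to HiveServer2.
  server_conn_timeout=3600
```

On a cluster managed by Cloudera Manager, an override like this would normally go into the Hue service's safety valve for hue_safety_valve.ini rather than being edited on disk, and Hue has to be restarted to pick it up.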

The Quickstart VM is not the right place or method for testing or comparing performance.

With that said, the statement below tells us all we need to know: if this is the usage pattern, then you should not use Hive. Impala will always be better for single-record lookups or column aggregations.

"I want to fetch one particular record based on unique Id among 110GB data."