I have a table in Hive (110 GB in size).
I want to fetch one particular record by its unique ID from that 110 GB of data,
to compare Hive and Impala.
My system config:
RAM - 50 GB, HDD - 1 TB, CDH 5.8, QuickStart VM.
I ran a SELECT query in Impala through Hue, but it fetched up to 6.7 GB of data and then threw an error:
timed out (code THRIFTSOCKET):None
If I run the same query from the command line, it works.
So the problem seems to be in Hue.
How can I fix it?
Please guide.
What file format do you use?
Did you gather column statistics for the table?
In my experience Impala is considerably faster than Hive, though that still depends on various factors.
It is also good practice to use LIMIT when you are sampling.
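As a rough sketch of the two suggestions above (gather statistics, then use LIMIT), the snippet below only builds the statements as text so they can be pasted into impala-shell or Hue. The table name `my_db.my_table`, the column name `id`, and the helper `build_point_lookup` are hypothetical placeholders, and the ID is assumed to be an integer.

```python
from typing import List


def build_point_lookup(table: str, id_column: str, record_id: int) -> List[str]:
    """Return the statements to run, in order, for a single-record lookup."""
    return [
        # COMPUTE STATS is the Impala statement that gathers table and column statistics.
        f"COMPUTE STATS {table}",
        # The ID is unique, so LIMIT 1 asks for at most one row back.
        f"SELECT * FROM {table} WHERE {id_column} = {int(record_id)} LIMIT 1",
    ]


# Hypothetical names; replace them with your own table and column.
for statement in build_point_lookup("my_db.my_table", "id", 42):
    print(statement + ";")
```

Statistics only need to be recomputed when the data changes substantially, so the first statement would normally be run once rather than before every lookup.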
You might want to check your current HiveServer2 parameters:
hive.server2.session.check.interval, hive.server2.idle.operation.timeout, hive.server2.idle.session.timeout
Most probably the timeout is caused by the long-running query.
Also, please provide some HiveServer2 logs and Hue logs.
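One possible way to check the three HiveServer2 parameters named above is to read them out of hive-site.xml. This is a minimal sketch, not a definitive tool: the XML content is an inline illustrative sample, the values in it are placeholders rather than recommendations, and `read_timeouts` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

TIMEOUT_KEYS = (
    "hive.server2.session.check.interval",
    "hive.server2.idle.operation.timeout",
    "hive.server2.idle.session.timeout",
)

# Illustrative hive-site.xml content; the values are placeholders only.
SAMPLE_HIVE_SITE = """<configuration>
  <property>
    <name>hive.server2.idle.session.timeout</name>
    <value>43200000</value>
  </property>
  <property>
    <name>hive.server2.idle.operation.timeout</name>
    <value>21600000</value>
  </property>
</configuration>"""


def read_timeouts(xml_text):
    """Map each timeout key to its configured value, or None if it is not set."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name in TIMEOUT_KEYS:
            found[name] = prop.findtext("value")
    return {key: found.get(key) for key in TIMEOUT_KEYS}


for key, value in read_timeouts(SAMPLE_HIVE_SITE).items():
    print(key, "=", value if value is not None else "(not set, default applies)")
```

To read a real configuration, the file contents can be passed in instead of the sample string; the file's location depends on the installation.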