Created 07-13-2016 10:01 PM
- Set up a 3-node Hadoop cluster on Amazon using Ambari. Each instance is an r3.xlarge (30 GB RAM).
- I adjusted the YARN cluster params per this link
The query is:
set hive.execution.engine=tez; (I tried "mr" as well)
select zip, count(cid) as ginti from (select split(ln,',')[0] as cid, split(ln,',')[1] as zip from utils.file1 where fn='foo1.csv') dd group by zip order by ginti desc
The CSV has 2 columns and 18 million rows.
RESULT
The query just seems to hang and never returns any results!
Created 07-13-2016 10:06 PM
Can you share the HiveServer2 logs?
Created 07-15-2016 05:54 AM
Can you provide more details on what you mean by the query hanging? Does the YARN application associated with the session get into the RUNNING state? Check the ResourceManager UI to see the resource usage. You might also want to run the subquery separately:
select split(ln,',')[0] as cid, split(ln,',')[1] as zip from utils.file1 where fn='foo1.csv' limit 10;
and see if it returns fine.
If this doesn't yield any clues, it might be useful to share the HiveServer2 logs and the explain plan.
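For the explain plan, something along these lines should work. This is only a sketch that prefixes the query from the original post with Hive's EXPLAIN keyword, which prints the plan without running the job. The table, column, and file names are copied from that post, and the engine setting is assumed to match whatever was used when the query hung.

```sql
-- Sketch: print the execution plan instead of running the query.
-- Names (utils.file1, ln, fn, foo1.csv) are taken from the original post.
set hive.execution.engine=tez;

explain
select zip, count(cid) as ginti
from (
  select split(ln,',')[0] as cid,
         split(ln,',')[1] as zip
  from utils.file1
  where fn='foo1.csv'
) dd
group by zip
order by ginti desc;
```

The output lists the stages and operators Hive intends to run, which can be pasted here along with the logs.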
Created 07-19-2016 12:56 AM
@Deepesh
The following query returns values in both Tez and MR mode:
select split(ln,',')[0] as cid, split(ln,',')[1] as zip from utils.file1 where fn='foo1.csv' limit 1000;