The table is backed by a single 2.4GB file and contains more than 22 million records. My cluster runs HDP 2.6 and consists of 16 nodes (each with 96GB of memory).
I can't increase the number of mappers. I have set the following:
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set tez.grouping.min-size=16777216; (16 MiB)
set tez.grouping.max-size=107374182; (about 102 MiB, i.e. 107 MB)
set hive.optimize.index.filter=true;
set use.hive.interactive.mode=true;
I'm executing the following query: "select PropertyType, count(*) as count from houses group by PropertyType;"
but each time Hive on Tez creates just ONE mapper, which is why this query takes so long (135 sec., with 95% of the time spent in the map task). I'm using the Beeline interface, but the Hive CLI gives the same result.
----------------------------------------------------------------------------
INFO : Dag name: select PropertyType, count(*)...PropertyType(Stage-1)
INFO : Status: Running (Executing on YARN cluster with App id application_1521676736844_0019)
--------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED
--------------------------------------------------------------------------------
Map 1 .......... llap SUCCEEDED 1 1 0 0 0
Reducer 2 ...... llap SUCCEEDED 59 59 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 131.76 s
--------------------------------------------------------------------------------
INFO : Status: DAG finished successfully in 131.65 seconds
INFO :
INFO : Query Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : OPERATION DURATION
INFO : ----------------------------------------------------------------------------------------------
INFO : Compile Query 0.95s
INFO : Prepare Plan 0.38s
INFO : Submit Plan 0.36s
INFO : Start DAG 0.47s
INFO : Run DAG 131.65s
INFO : ----------------------------------------------------------------------------------------------
INFO : ...
INFO : OK
...
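As a rough sanity check on what the grouping settings above might be expected to produce, here is a minimal sketch of the arithmetic. It rests on assumptions not confirmed anywhere in this post: that the file is splittable, that "2.4GB" means 2.4 * 10^9 bytes, and that Tez keeps every grouped split between the min and max size. The function name mapper_bounds is made up for illustration, and the real Tez grouping logic also considers available cluster capacity, so this is only a ballpark.

```python
import math

# Values taken from the post. The formula is a simplification of Tez split
# grouping and assumes the single file is splittable (not confirmed here).
FILE_SIZE_BYTES = 2_400_000_000   # "2.4GB" read as decimal gigabytes (assumption)
GROUP_MIN_BYTES = 16_777_216      # tez.grouping.min-size
GROUP_MAX_BYTES = 107_374_182     # tez.grouping.max-size


def mapper_bounds(file_size, min_size, max_size):
    """Return (fewest, most) grouped splits if every group stays within bounds."""
    # Largest allowed groups give the fewest mappers.
    fewest = math.ceil(file_size / max_size)
    # Smallest allowed groups give the most mappers (at least one).
    most = max(1, file_size // min_size)
    return fewest, most


fewest, most = mapper_bounds(FILE_SIZE_BYTES, GROUP_MIN_BYTES, GROUP_MAX_BYTES)
print(f"expected mappers: between {fewest} and {most}")
```

Under those assumptions the estimate comes out at somewhere between 23 and 143 mappers, which is nowhere near the single mapper shown in the log above.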
So, how can I increase the number of mapper tasks?
I'd highly appreciate any help.