I have a huge table table1 in Hive, which contains around 60 million rows (~500 GB ORC files in HDFS). It is partitioned by the column partCol.
Now I want to create a new table table2 in Hive that has the same schema and contains only 50 million of the rows of table1.
Therefore I run this query:
INSERT OVERWRITE TABLE testdb.table2 partition(partCol) SELECT colA, colB, ..., partCol FROM testdb.table1 LIMIT 50000000;
This creates a lot of Tez mapper tasks, which looks and works fine. These tasks take around 1 h to finish.
And now the problem: Afterwards there's only 1 Reducer Task, which runs for hours and then fails!
How can I increase the number of reducer tasks for this query? Is the LIMIT clause the issue?
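In case the LIMIT is the cause (as far as I understand, a global LIMIT has to be enforced in one place, which would explain the single reducer), here is a rough sketch of a variant I could try instead. It assumes an approximate row count is acceptable: the 0.83 fraction is just my estimate of 50 million / 60 million, and rand() sampling will not yield exactly 50 million rows. The column list is elided exactly as in the query above.

```sql
-- Sketch only: sample roughly 5/6 of the rows instead of using a global LIMIT.
-- The 0.83 threshold is an assumption (about 50M / 60M); the result size is approximate.
INSERT OVERWRITE TABLE testdb.table2 PARTITION (partCol)
SELECT colA, colB, ..., partCol
FROM testdb.table1
WHERE rand() < 0.83;
```

I have not verified this on my cluster, so I do not know yet whether it actually avoids the single reducer stage.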
I'm using the Hortonworks Data Platform 2.6.5 with Hive 1.2.1.
The following Hive settings are configured: