I have some CSV files that I imported from Vertica through Sqoop. The split-by column I chose for the import caused the data to be highly non-uniform, i.e. out of 300 part files, only 8 contain any data. Now I am creating an ORC table from this data. I created an external table pointing to the data location, and an ORC table. When I run
INSERT OVERWRITE TABLE table_orc SELECT * FROM table_csv;
it takes a very long time because effectively only 8 mappers are running. Is there a way to split the whole dataset into 120 roughly equal parts so that the query runs with about 120 mappers?
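A minimal sketch of one possible approach, not a confirmed fix. It assumes two things: the query runs on the MapReduce execution engine rather than Tez, and the Sqoop output is uncompressed text, which Hadoop can split inside a single file. Capping the maximum input split size makes Hive cut each large part file into several splits, so more mappers share the read. The 128 MB figure is a placeholder. To target roughly 120 mappers, a value near the total size of the `table_csv` data divided by 120 would be the starting point.

```sql
-- Assumptions: MapReduce engine (not Tez) and uncompressed, splittable CSV part files.
-- Upper bound on each input split, in bytes. 134217728 (128 MB) is a placeholder;
-- choose roughly (total input size / 120) to aim for about 120 mappers.
SET mapreduce.input.fileinputformat.split.maxsize=134217728;

-- Each of the 8 large files is now read by several mappers instead of one.
INSERT OVERWRITE TABLE table_orc
SELECT * FROM table_csv;
```

If the part files are gzip-compressed, they cannot be split and this setting will not add mappers. On Tez, split grouping is governed by separate settings such as `tez.grouping.max-size`, so the value above may have no effect there.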