We have a Hive table containing billions of rows.
Every day we receive about 20 million rows of new data.
Every day we need to produce a new, updated version of the table from these records (i.e. with existing rows updated and new rows inserted as required).
What's the best way to go about this?
Is there a way to chunk this table, pass the chunks of the Hive table to multiple executors, and have each executor perform the upsert on its chunk and write into the new Hive table?
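For context, the kind of upsert I have in mind is a full outer join between the existing table and the daily delta, with the delta winning on conflicts (a sketch only; the table names and the `key_col`/`val_col` columns are hypothetical stand-ins for our schema):

```sql
-- Hypothetical schema: base(key_col, val_col), daily_delta(key_col, val_col).
-- A row present in the delta replaces the base row with the same key;
-- all other base rows are carried over unchanged.
INSERT OVERWRITE TABLE base_updated
SELECT
  COALESCE(d.key_col, b.key_col) AS key_col,
  COALESCE(d.val_col, b.val_col) AS val_col
FROM base b
FULL OUTER JOIN daily_delta d
  ON b.key_col = d.key_col;
```

Would a join like this be distributed across executors automatically, or do we need to partition the table ourselves first?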