Support Questions

Split Hive Table for easy processing in executors

We have a Hive table containing billions of rows.

Every day we receive about 20 million rows of new data.

Every day we need to create a new, updated table incorporating these records (i.e., with existing rows updated and new rows inserted as required).



What's the best way to go about this?


Is there a way to chunk this table, pass the chunks of the Hive table to multiple executors, and have each executor process its chunk and write the result into a new Hive table?



Spark 1.6.0

Hive 1.1.0

Java 1.8.0

Scala 2.10.5
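For context, here is a minimal sketch of one common approach on Spark 1.6: read both tables through `HiveContext` (Spark already splits each table into partitions, so every partition becomes a "chunk" handled by one executor task), merge them with an outer join, and write the result to a new table. The table names `db.big_table` / `db.daily_updates`, the key column `id`, and the partition count are assumptions for illustration, not taken from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.functions.coalesce

// Sketch only. Assumed names: db.big_table, db.daily_updates,
// key column "id", and output table db.big_table_updated.
object DailyUpsert {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("daily-upsert"))
    val hc = new HiveContext(sc) // Spark 1.6 entry point for Hive tables

    // Reading a Hive table yields a DataFrame that is already split into
    // partitions; each partition is processed by one executor task, so no
    // manual chunking is needed.
    val existing = hc.table("db.big_table")
    val updates  = hc.table("db.daily_updates")

    // Outer join on the key, then prefer the fresh value per column via
    // coalesce: matched rows take the updated values, unmatched update
    // rows become inserts, unmatched existing rows are kept as-is.
    val merged = existing
      .join(updates, existing("id") === updates("id"), "outer")
      .select(existing.columns.map(c =>
        coalesce(updates(c), existing(c)).alias(c)): _*)

    // repartition controls how many tasks write the result in parallel;
    // 200 here is just an example, tune it to your cluster.
    merged.repartition(200)
      .write.mode("overwrite")
      .saveAsTable("db.big_table_updated")

    sc.stop()
  }
}
```

Writing to a separate table (and swapping it in afterwards) sidesteps the fact that Hive 1.1 has no ACID merge, and avoids overwriting a table while it is being read.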

