Split Hive Table for easy processing in executors


Hi,

 

We have a Hive table containing billions of rows.

Every day we receive about 20 million rows of new data.

 

Every day we need to produce a new, updated table from these records (i.e., with existing rows updated and new rows inserted as required).
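To make the requirement concrete, this is roughly the merge step I have in mind: union the base table with the daily delta and keep only the latest version of each row using a window function. The table names and the "id" / "updated_at" columns are just placeholders for our real key and change-timestamp columns.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}
import org.apache.spark.sql.hive.HiveContext

object DailyMerge {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("DailyMerge"))
    val hc = new HiveContext(sc)

    // Existing billion-row base table and the ~20 million new/changed rows.
    // "db.base_table" and "db.daily_delta" are placeholder names.
    val base  = hc.table("db.base_table")
    val delta = hc.table("db.daily_delta")

    // For each key, keep only the most recent version of the row,
    // so delta rows replace their matching base rows.
    val latestFirst = Window.partitionBy("id").orderBy(col("updated_at").desc)
    val merged = base.unionAll(delta)
      .withColumn("rn", row_number().over(latestFirst))
      .filter("rn = 1")
      .drop("rn")

    // Write the result out as the new day's table.
    merged.write.saveAsTable("db.base_table_new")
  }
}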

 

 

What's the best way to go about this?

Is there a way to chunk this table, pass the chunks of the Hive table to multiple executors, and have each executor process its chunk and write the results into a new Hive table?
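My understanding so far is that Spark already splits a Hive table into partitions when it reads it, and each partition is handled by one executor task, so the "chunks" come for free and can be resized with repartition. A minimal sketch of what I mean (the table names and the partition count of 400 are made up):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object ChunkedProcessing {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ChunkedProcessing"))
    val hc = new HiveContext(sc)

    // Reading the Hive table already gives a partitioned DataFrame;
    // each partition is one "chunk" processed by one executor task.
    val df = hc.table("db.base_table")
    println(s"Chunks read from Hive: ${df.rdd.partitions.length}")

    // Adjust the number of chunks if the defaults are too coarse or too fine
    // for the cluster; 400 is only an illustrative value.
    val chunked = df.repartition(400)

    // Whatever per-row processing is needed runs in parallel, one task per
    // chunk, and the result is written back as a new Hive table.
    chunked.write.saveAsTable("db.base_table_processed")
  }
}

Is this the right approach, or is there a better way to control how the chunks are assigned to executors?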

 


Environment:

Spark 1.6.0
Hive 1.1.0
Java 1.8.0
Scala 2.10.5

 
