
Implementing Join with Hadoop Map-Reduce



Hi,

 

We have multiple delimited files from different source systems that we need to merge into one Impala table based on attributes in each file.

 

We constantly run into memory errors while trying to do the merge via HiveQL by creating a dataframe for each file.

 

Each file contains millions of rows.

 

Is MapReduce a viable solution for this situation?

 

Are there any examples of how to handle this kind of situation?

 

Thanks