03-27-2019 08:12 PM - last edited on 03-28-2019 07:59 AM by cjervis
We have multiple delimited files from different source systems that we need to merge into one Impala table based on attributes in each file.
We constantly run into memory errors while trying to do the merge via HiveQL by creating a DataFrame for each file.
Each file contains millions of rows.
Is MapReduce a viable solution for this situation?
Are there any examples of how to handle this kind of situation?