Support Questions

zoro07500 · 03-03-2017

I have 4 CSV files, and I want to join (merge) them into a single file based on a timestamp column, using Spark or Hadoop. Any help would be appreciated.

balaram38489 · 03-03-2017

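Sol 1: Reduce-side join

Create a separate mapper for each of the 4 CSV files. Every mapper emits the timestamp as the key, and as the value the remaining fields plus a tag field that records which file the record came from.

Then do the join on the reduce side: all records with the same timestamp arrive at the same reduce call, and the tag tells you which file each one came from, so you can combine them into one output row.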
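A minimal local sketch of the reduce-side join above, in Python for illustration only (a real job would use Hadoop Mapper/Reducer classes or Hadoop Streaming). It assumes each file has a header row with a column named `timestamp`, and uses a short name per file as the tag; `KEY`, `mapper`, `reducer`, `reduce_side_join` and `to_csv` are hypothetical names. The sort plus `groupby` stands in for Hadoop's shuffle, which brings all values for one key to the same reduce call.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Hypothetical column name; change it to the real timestamp column.
KEY = "timestamp"


def mapper(tag, csv_text):
    """Map phase: one mapper per input file.

    Emits (timestamp, (tag, remaining_fields)) so every record carries
    a tag naming the file it came from.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = row.pop(KEY)
        yield ts, (tag, row)


def reducer(ts, tagged_values):
    """Reduce phase: all records for one timestamp arrive together.

    The tag identifies the source file; each field is prefixed with it
    so same-named columns from different files do not overwrite each other.
    """
    merged = {KEY: ts}
    for tag, fields in tagged_values:
        for name, value in fields.items():
            merged[f"{tag}.{name}"] = value
    return merged


def reduce_side_join(inputs):
    """Simulate map -> shuffle/sort -> reduce in one process.

    `inputs` maps a tag (e.g. a file name) to that file's CSV text.
    Timestamps are sorted as strings, so this assumes a sortable
    format such as ISO 8601.
    """
    mapped = [pair for tag, text in inputs.items() for pair in mapper(tag, text)]
    # Stand-in for the shuffle: sort by key so equal timestamps are adjacent.
    mapped.sort(key=itemgetter(0))
    return [
        reducer(ts, [value for _, value in group])
        for ts, group in groupby(mapped, key=itemgetter(0))
    ]


def to_csv(rows):
    """Write the joined rows as one CSV; missing fields become empty cells."""
    fieldnames = [KEY] + sorted({k for r in rows for k in r} - {KEY})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Pass the contents of the 4 files as `reduce_side_join({"a": text_a, "b": text_b, ...})` and write `to_csv(...)` to a single output file. This keeps every timestamp seen in any file (an outer join); dropping rows that lack a tag would turn it into an inner join.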

Sol 2: Map-side join (if 3 of the files are small)

Add the 3 small CSV files to the distributed cache, load them into memory in each mapper, and join them with the large CSV file inside the mapper. No reducer is needed.
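A minimal local sketch of the map-side join above, again in Python for illustration only. Reading the small files from strings stands in for reading the mapper's local copies of the distributed-cache files; it assumes a `timestamp` header column and that each timestamp appears at most once per small file. `KEY`, `load_small_table`, `map_side_join` and the `large.` prefix are hypothetical names.

```python
import csv
import io

# Hypothetical column name; change it to the real timestamp column.
KEY = "timestamp"


def load_small_table(tag, csv_text):
    """Build an in-memory lookup {timestamp: fields} from one small file.

    In Hadoop this would happen once per mapper (e.g. in setup), reading
    the file from the distributed cache. Assumes unique timestamps.
    """
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = row.pop(KEY)
        table[ts] = {f"{tag}.{name}": value for name, value in row.items()}
    return table


def map_side_join(large_csv_text, small_tables):
    """Map-only join: stream the large file and look up each timestamp.

    No shuffle or reducer is needed because the small tables are already
    in memory in every mapper. Unmatched rows are kept (a left join).
    """
    for row in csv.DictReader(io.StringIO(large_csv_text)):
        ts = row.pop(KEY)
        joined = {KEY: ts}
        joined.update({f"large.{name}": value for name, value in row.items()})
        for table in small_tables:
            joined.update(table.get(ts, {}))
        yield joined
```

The trade-off is memory: every mapper holds all 3 small tables, which is why this only fits when those files are small.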

adnanalvee · 03-03-2017

Hi @Maher Hattabi,

I see a similar question of yours at the link below.

Here is one where I answered a question about combining files of any type, whether CSV or TXT:

https://community.hortonworks.com/questions/85230/erge-csv-files-in-one-file.html#answer-85245

Cloudera Community

merge csv files based on a column timestamp to get one file