Created 09-13-2018 04:23 PM
Hi,
We have two files coming from different locations.
1) The first contains only the header (column names).
2) The second contains the data, in the same column sequence.
Aim:
We need to merge both into one output file, with the header on top (in the first row) and the data from the second row onwards.
Looking forward.
Cheers
Created on 09-13-2018 05:27 PM - edited 08-17-2019 11:07 PM
You can achieve this by using the MergeContent processor.
MergeContent configs:
With Minimum Number of Entries set to 2, the processor waits until it has 2 entries.
Flow that I tried:
But if two flowfiles arrive from Location1 alone, MergeContent will merge those two flowfiles into one.
This flow works only when exactly one flowfile arrives from each source. If no flowfile arrives from Location2, the processor waits indefinitely until it gets another flowfile.
To avoid this, set a reasonable Max Bin Age for your use case; once the bin reaches that age, the processor forces the binned flowfiles out to the merged relationship.
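The interplay between Minimum Number of Entries and Max Bin Age can be pictured with a small Python sketch. This is only a toy model, not NiFi's implementation: the `Bin` class and its `offer` and `try_merge` methods are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bin:
    """Toy stand-in for a merge bin (hypothetical, not NiFi code)."""
    min_entries: int
    max_age_seconds: float
    created_at: float
    entries: List[str] = field(default_factory=list)

    def offer(self, content: str) -> None:
        # Queue one incoming flowfile's content in the bin.
        self.entries.append(content)

    def try_merge(self, now: float) -> Optional[str]:
        """Return merged content once the bin is full or too old, else None."""
        full = len(self.entries) >= self.min_entries
        expired = (now - self.created_at) >= self.max_age_seconds
        if self.entries and (full or expired):
            # Entries are joined in arrival order; nothing here sorts them.
            return "\n".join(self.entries)
        return None
```

With `min_entries=2`, a bin holding one entry returns `None` until either a second entry arrives or the age limit passes. Note that the sketch joins entries in arrival order, so it does not by itself guarantee that the header ends up first.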
Please refer to this link for configuring MergeContent processor.
(or)
If your header is always the same:
1. With the record-oriented processors you can ignore the header coming from Location1 and configure the ConvertRecord processor to add the header to the incoming data.
2. Using the ReplaceText processor, we can add the header to the Location2 file.
Refer to this link for more details regarding this method.
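As a rough illustration of what the second option does to the content, here is a minimal Python sketch that prepends a fixed header line to the data. It only mimics the effect on the text; `prepend_header` is a hypothetical helper, and the sample header and rows are made up.

```python
def prepend_header(header: str, data: str) -> str:
    """Put a single header line in front of the data content."""
    # Drop any trailing line break so exactly one newline separates
    # the header from the first data row.
    header_line = header.rstrip("\r\n")
    return header_line + "\n" + data


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(prepend_header("id,name", "1,a\n2,b\n"), end="")
```

This only makes sense when the header is static, since the header text is supplied by the flow itself rather than read from the Location1 file.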
Created on 09-13-2018 09:01 PM - edited 08-17-2019 11:07 PM
For this case, we can produce the required file by using the EnforceOrder processor.
Flow:
In this flow, the EnforceOrder processor ensures that the header flowfile is released first and the data flowfile second, and the MergeContent processor then merges them into one.
Change the Wait Timeout property value in the EnforceOrder processor as per your requirement.
I have attached the template XML in this thread; you can keep it as a reference for your flow.
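The idea behind this flow can be sketched in Python: each flowfile carries a numeric order attribute, and the merge concatenates contents in that order. This is a simplified model under stated assumptions. The `FlowFile` type, the `merge_in_order` function, and the attribute name `order` are all hypothetical, and the sketch sorts a list that is already complete, whereas EnforceOrder holds flowfiles back until the next expected number arrives.

```python
from typing import Dict, Iterable, NamedTuple


class FlowFile(NamedTuple):
    """Toy flowfile: a bag of string attributes plus text content."""
    attributes: Dict[str, str]
    content: str


def merge_in_order(flowfiles: Iterable[FlowFile],
                   order_attribute: str = "order") -> str:
    """Concatenate contents by ascending value of the order attribute."""
    # The attribute is looked up by its plain name, then parsed as an integer.
    ordered = sorted(flowfiles, key=lambda ff: int(ff.attributes[order_attribute]))
    # Strip trailing line breaks so each piece starts on its own line.
    return "\n".join(ff.content.rstrip("\r\n") for ff in ordered) + "\n"


if __name__ == "__main__":
    # Hypothetical sample values: the data arrives before the header.
    data = FlowFile({"order": "2"}, "1,a\n2,b\n")
    header = FlowFile({"order": "1"}, "id,name\n")
    print(merge_in_order([data, header]), end="")
```

Giving the header the lower order value is what puts it in the first row, regardless of which flowfile arrives first.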
Created on 09-13-2018 09:36 PM - edited 08-17-2019 11:07 PM
I think the issue is with the Order Attribute property: it doesn't accept expression language, so use the attribute name without expression language.
Make sure your EnforceOrder configs are as follows:
Change the Wait Timeout property value in the EnforceOrder processor as per your requirement.
Created 09-13-2018 06:38 PM
Thanks, the merge worked, but the header came as the last row.
How can we prioritize the header FlowFile so it comes on top as the data header (column names)?
Created on 09-13-2018 09:14 PM - edited 08-17-2019 11:07 PM
Followed and implemented, but I am getting the error at the same place you show in your snapshot. Kindly advise how to fix it.
UpdateAttribute_Header
UpdateAttribute_Data
Created 09-13-2018 10:25 PM
Many thanks, it worked.