<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Merge files Based on file headers.? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-files-Based-on-file-headers/m-p/125036#M39100</link>
    <description>&lt;A rel="user" href="https://community.cloudera.com/users/11732/saikrishnatarapareddy.html" nodeid="11732"&gt;@Saikrishna Tarapareddy&lt;/A&gt;&lt;P&gt;Your regex above says the CSV file content must start with Tagname,Timestamp,Value,Quality,QualityDetail,PercentGood
&lt;/P&gt;&lt;P&gt;So it should not route to "Header" unless the CSV starts with that.  What is found later in the CSV file should not matter.  I tried this and it seems to work as expected. When I removed the '^', all files matched.&lt;/P&gt;&lt;P&gt;Your processor is also loading 1 MB of the CSV content for evaluation; however, the string you are searching for is far shorter.  If you only want to match against the first line, reduce the buffer size from '1 MB' to maybe '60 b'.  When I changed the buffer to '60 b' and removed the '^' from the regex above, only the files with the matching header were routed to "Header". 

Thanks,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
    <pubDate>Tue, 30 Aug 2016 04:01:19 GMT</pubDate>
    <dc:creator>MattWho</dc:creator>
    <dc:date>2016-08-30T04:01:19Z</dc:date>
    <item>
      <title>Merge files Based on file headers.?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-files-Based-on-file-headers/m-p/125033#M39097</link>
      <description>&lt;P&gt;Hi, I need to merge content based on CSV file headers. Say I have 10 files in a folder and 5 of them share the header Name,Age,Gender. I want to merge those 5 together and send the rest to failure. How can I do that?&lt;/P&gt;</description>
      <pubDate>Tue, 30 Aug 2016 01:36:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-files-Based-on-file-headers/m-p/125033#M39097</guid>
      <dc:creator>saikrishna_tara</dc:creator>
      <dc:date>2016-08-30T01:36:41Z</dc:date>
    </item>
    <item>
      <title>Re: Merge files Based on file headers.?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-files-Based-on-file-headers/m-p/125034#M39098</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/11732/saikrishnatarapareddy.html" nodeid="11732"&gt;@Saikrishna Tarapareddy&lt;/A&gt;&lt;P&gt;The MergeContent processor is not designed to look at the content of the NiFi FlowFiles it is merging.  What you will want to do first is use a RouteOnContent processor to route only those FlowFiles whose content contains the header you want to merge.  The 'unmatched' FlowFiles can then be routed elsewhere or auto-terminated.  
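&lt;/P&gt;&lt;P&gt;To make the route-then-merge idea concrete, here is a minimal sketch in plain Python rather than NiFi. It is only a stand-in for the flow described above: the function name merge_by_header and the sample file contents are hypothetical, and it compares the first line of each file to the expected header instead of configuring any processor.&lt;/P&gt;

```python
def merge_by_header(files, expected_header):
    """Merge CSV texts whose first line equals expected_header.

    files maps a file name to its full text (hypothetical in-memory input).
    Returns a pair: (merged_text, names_of_rejected_files).
    """
    merged_rows = []
    rejected = []
    for name, text in files.items():
        lines = text.splitlines()
        if lines and lines[0].strip() == expected_header:
            # Matching header: keep the data rows, drop the repeated header line.
            merged_rows.extend(lines[1:])
        else:
            # Plays the role of the 'unmatched' route.
            rejected.append(name)
    merged = "\n".join([expected_header] + merged_rows)
    return merged, rejected


files = {
    "a.csv": "Name,Age,Gender\nAnn,31,F",
    "b.csv": "Name,Age,Gender\nBob,40,M",
    "c.csv": "Id,Score\n7,0.5",
}
merged, rejected = merge_by_header(files, "Name,Age,Gender")
print(merged)
print(rejected)
```

&lt;P&gt;In this sketch the two files with the Name,Age,Gender header end up in one merged text with a single header line, and c.csv is reported as rejected.&lt;/P&gt;&lt;P&gt;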

Thanks,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Tue, 30 Aug 2016 01:47:36 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-files-Based-on-file-headers/m-p/125034#M39098</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2016-08-30T01:47:36Z</dc:date>
    </item>
    <item>
      <title>Re: Merge files Based on file headers.?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-files-Based-on-file-headers/m-p/125035#M39099</link>
      <description>&lt;P&gt;@mclark,&lt;/P&gt;&lt;P&gt;OK, but RouteOnContent checks for the string in the whole file, whereas I want to compare only the first line.&lt;/P&gt;&lt;P&gt;If I configure RouteOnContent as shown below, it routes files to "Header" even when it is the data, not the first line, that satisfies the regex.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="7063-roc.png" style="width: 950px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/22696i13F52FBF447A5B0D/image-size/medium?v=v2&amp;amp;px=400" role="button" title="7063-roc.png" alt="7063-roc.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Aug 2019 10:11:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-files-Based-on-file-headers/m-p/125035#M39099</guid>
      <dc:creator>saikrishna_tara</dc:creator>
      <dc:date>2019-08-19T10:11:28Z</dc:date>
    </item>
    <item>
      <title>Re: Merge files Based on file headers.?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-files-Based-on-file-headers/m-p/125036#M39100</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/11732/saikrishnatarapareddy.html" nodeid="11732"&gt;@Saikrishna Tarapareddy&lt;/A&gt;&lt;P&gt;Your regex above says the CSV file content must start with Tagname,Timestamp,Value,Quality,QualityDetail,PercentGood
&lt;/P&gt;&lt;P&gt;So it should not route to "Header" unless the CSV starts with that.  What is found later in the CSV file should not matter.  I tried this and it seems to work as expected. When I removed the '^', all files matched.&lt;/P&gt;&lt;P&gt;Your processor is also loading 1 MB of the CSV content for evaluation; however, the string you are searching for is far shorter.  If you only want to match against the first line, reduce the buffer size from '1 MB' to maybe '60 b'.  When I changed the buffer to '60 b' and removed the '^' from the regex above, only the files with the matching header were routed to "Header". 
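&lt;/P&gt;&lt;P&gt;The effect of the '^' anchor and of the buffer size can be approximated outside NiFi with a short Python sketch. This is not NiFi code: routes_to_header is a hypothetical helper that searches for a pattern in only the first N bytes of the content, and the sample rows are made up. It assumes the regex is searched for within the buffered content.&lt;/P&gt;

```python
import re

HEADER = "Tagname,Timestamp,Value,Quality,QualityDetail,PercentGood"


def routes_to_header(content, pattern, buffer_bytes):
    """Return True if pattern is found within the first buffer_bytes of content."""
    raw = content.encode("utf-8")[:buffer_bytes]
    # Ignore a multi-byte character that the cut may have split in half.
    window = raw.decode("utf-8", errors="ignore")
    return re.search(pattern, window) is not None


# A file that starts with the expected header.
good = HEADER + "\nT1,2016-08-30,1.0,Good,OK,100\n"
# A file with a different first line that contains the header text later on.
bad = "Other,Columns\n" + HEADER + "\nT1,2016-08-30,1.0,Good,OK,100\n"

anchored = "^" + re.escape(HEADER)
unanchored = re.escape(HEADER)
one_mb = 1024 * 1024

results = {
    "good, anchored, 1 MB": routes_to_header(good, anchored, one_mb),
    "bad, anchored, 1 MB": routes_to_header(bad, anchored, one_mb),
    "bad, unanchored, 1 MB": routes_to_header(bad, unanchored, one_mb),
    "bad, unanchored, 60 b": routes_to_header(bad, unanchored, 60),
    "good, unanchored, 60 b": routes_to_header(good, unanchored, 60),
}
for label, matched in results.items():
    print(label, matched)
```

&lt;P&gt;With the anchor, only the file that starts with the header matches. Without the anchor and with a 1 MB window, the second file matches as well because the header text appears later in it. The header is 57 bytes long, so a 60-byte window still covers it when it is the first line, but cuts off any copy that starts further into the file.&lt;/P&gt;&lt;P&gt;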

Thanks,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Tue, 30 Aug 2016 04:01:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Merge-files-Based-on-file-headers/m-p/125036#M39100</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2016-08-30T04:01:19Z</dc:date>
    </item>
  </channel>
</rss>

