<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Splitting a Nifi flowfile into multiple flowfiles in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Splitting-a-Nifi-flowfile-into-multiple-flowfiles/m-p/139934#M44001</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/14536/melvin-camendoza46.html" nodeid="14536"&gt;@mel mendoza&lt;/A&gt;, in my case I did further processing on the split flowfiles after splitting them. If you only need to store the split files, you can use PutFile to write them to the local file system or PutHDFS to write them to HDFS.&lt;/P&gt;</description>
    <pubDate>Wed, 09 Aug 2017 23:46:17 GMT</pubDate>
    <dc:creator>Raj_B</dc:creator>
    <dc:date>2017-08-09T23:46:17Z</dc:date>
    <item>
      <title>Splitting a Nifi flowfile into multiple flowfiles</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Splitting-a-Nifi-flowfile-into-multiple-flowfiles/m-p/139930#M43997</link>
      <description>&lt;P&gt;
	Hi All, &lt;/P&gt;&lt;P&gt;
	I have the following requirement: &lt;/P&gt;&lt;P&gt;
	Split a single NiFi flowfile into multiple flowfiles so that, eventually, the contents of each flowfile (once extracted) can be inserted as a separate row in a Hive table. &lt;/P&gt;&lt;P&gt;
	&lt;STRONG&gt;Sample input flowfile: &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;
	MESSAGE_HEADER | A | B | C &lt;/P&gt;&lt;P&gt;
LINE|1 | ABCD | 1234 &lt;/P&gt;&lt;P&gt;
LINE|2 | DEFG | 5678 &lt;/P&gt;&lt;P&gt;
LINE|3 | HIJK | 9012&lt;/P&gt;&lt;P&gt;
. &lt;/P&gt;&lt;P&gt;
	. &lt;/P&gt;&lt;P&gt;
	. &lt;/P&gt;&lt;P&gt;
	&lt;STRONG&gt;Desired output files:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;
	&lt;EM&gt;Flowfile 1: &lt;/EM&gt;&lt;/P&gt;&lt;P&gt;
	MESSAGE_HEADER | A | B | C &lt;/P&gt;&lt;P&gt;
LINE|1 | ABCD | 1234 &lt;/P&gt;&lt;P&gt;
	&lt;EM&gt;Flowfile 2:&lt;/EM&gt; &lt;/P&gt;&lt;P&gt;
	MESSAGE_HEADER | A | B | C &lt;/P&gt;&lt;P&gt;
LINE|2 | DEFG | 5678 &lt;/P&gt;&lt;P&gt;
	&lt;EM&gt;Flowfile 3: &lt;/EM&gt;&lt;/P&gt;&lt;P&gt;
	MESSAGE_HEADER | A | B | C &lt;/P&gt;&lt;P&gt;
LINE|3 | HIJK | 9012 &lt;/P&gt;&lt;P&gt;
. &lt;/P&gt;&lt;P&gt;
	. &lt;/P&gt;&lt;P&gt;
	. &lt;/P&gt;&lt;P&gt;
	The number of lines in the flowfile is not known ahead of time.&lt;/P&gt;&lt;P&gt;
	I would like to know the best way to accomplish this with the available NiFi processors.
The splitting can be done at the flowfile level, or after the contents are extracted from the flowfile but before the Hive insert statements are created. &lt;/P&gt;&lt;P&gt;
	Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Oct 2016 03:19:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Splitting-a-Nifi-flowfile-into-multiple-flowfiles/m-p/139930#M43997</guid>
      <dc:creator>Raj_B</dc:creator>
      <dc:date>2016-10-20T03:19:23Z</dc:date>
    </item>
    <item>
      <title>Re: Splitting a Nifi flowfile into multiple flowfiles</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Splitting-a-Nifi-flowfile-into-multiple-flowfiles/m-p/139931#M43998</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10100/rbolla.html" nodeid="10100"&gt;@Raj B&lt;/A&gt; The SplitText processor has a "Header Line Count" property. If you set it to 1 (with "Line Split Count" also set to 1 for one data line per split), you should get multiple flow files, each starting with the same header line. That said, if you intend to insert these into Hive, you could also use ConvertCSVToAvro with the delimiter set to '|'; the data would then stay in batches, which should give you better throughput.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Oct 2016 03:35:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Splitting-a-Nifi-flowfile-into-multiple-flowfiles/m-p/139931#M43998</guid>
      <dc:creator>jfrazee</dc:creator>
      <dc:date>2016-10-20T03:35:42Z</dc:date>
    </item>
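    <!--
    Editorial illustration (not part of the original thread): a minimal Python
    sketch of the split-with-header behavior described in the reply above. It
    does not call NiFi; `split_with_header` is a hypothetical helper whose
    parameters are named after the SplitText properties "Header Line Count"
    and "Line Split Count" only to mirror the idea. It assumes plain
    newline-separated text like the sample in the question.

```python
def split_with_header(text: str, header_line_count: int = 1,
                      line_split_count: int = 1) -> list[str]:
    """Return one string per split; every split repeats the header lines."""
    lines = text.splitlines()
    header = lines[:header_line_count]
    body = lines[header_line_count:]
    splits = []
    # Walk the data lines in chunks and prepend the header to each chunk.
    for start in range(0, len(body), line_split_count):
        chunk = body[start:start + line_split_count]
        splits.append("\n".join(header + chunk))
    return splits


# Sample input copied from the question.
sample = "\n".join([
    "MESSAGE_HEADER | A | B | C",
    "LINE|1 | ABCD | 1234",
    "LINE|2 | DEFG | 5678",
    "LINE|3 | HIJK | 9012",
])

for index, flowfile in enumerate(split_with_header(sample), start=1):
    print(f"Flowfile {index}:")
    print(flowfile)
```

    Because the data lines are chunked after the header is removed, the number
    of lines does not need to be known ahead of time.
    -->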
    <item>
      <title>Re: Splitting a Nifi flowfile into multiple flowfiles</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Splitting-a-Nifi-flowfile-into-multiple-flowfiles/m-p/139932#M43999</link>
      <description>&lt;P&gt;@jfrazee Thank you. I'm going the SplitText route for now, and it seems to work.&lt;/P&gt;&lt;P&gt;To save the split files for later reference, how do I assign different names to the child (split) flowfiles? I'm thinking of prepending or appending a UUID to the file name. When I looked, all of the child flowfiles had the same name as the parent flowfile, which causes them to overwrite one another.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Oct 2016 10:12:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Splitting-a-Nifi-flowfile-into-multiple-flowfiles/m-p/139932#M43999</guid>
      <dc:creator>Raj_B</dc:creator>
      <dc:date>2016-10-20T10:12:27Z</dc:date>
    </item>
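    <!--
    Editorial illustration (not part of the original thread, which leaves this
    question unanswered): a minimal Python sketch of the naming idea raised in
    the post above, where each child gets a name derived from the parent name
    plus a unique suffix. `child_filename` and `fragment_index` are
    hypothetical names introduced here. How the equivalent rename is
    configured in a NiFi flow (for example, by updating the `filename`
    attribute before the write step) is an assumption, not something the
    thread confirms.

```python
import os
import uuid


def child_filename(parent_name: str, fragment_index: int) -> str:
    """Build a unique child name: <stem>_<index>_<uuid><extension>."""
    stem, ext = os.path.splitext(parent_name)
    # A random UUID keeps names unique even if the same parent is split twice.
    return f"{stem}_{fragment_index}_{uuid.uuid4()}{ext}"


# Three children of the same parent receive three distinct names.
names = [child_filename("messages.txt", i) for i in range(1, 4)]
for name in names:
    print(name)
```

    The index alone would already separate siblings of one split; the UUID is
    added so that a later run over a parent with the same name does not
    overwrite earlier output.
    -->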
    <item>
      <title>Re: Splitting a Nifi flowfile into multiple flowfiles</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Splitting-a-Nifi-flowfile-into-multiple-flowfiles/m-p/139933#M44000</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2956/jfrazee.html" nodeid="2956"&gt;@jfrazee&lt;/A&gt; &lt;A rel="user" href="https://community.cloudera.com/users/10100/rbolla.html" nodeid="10100"&gt;@Raj B&lt;/A&gt;&lt;/P&gt;&lt;P&gt;How did you save the split flowfiles to files? Was it GetFile -&amp;gt; SplitText -&amp;gt; PutFile?&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jul 2017 14:15:50 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Splitting-a-Nifi-flowfile-into-multiple-flowfiles/m-p/139933#M44000</guid>
      <dc:creator>melvinmendoza</dc:creator>
      <dc:date>2017-07-12T14:15:50Z</dc:date>
    </item>
    <item>
      <title>Re: Splitting a Nifi flowfile into multiple flowfiles</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Splitting-a-Nifi-flowfile-into-multiple-flowfiles/m-p/139934#M44001</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/14536/melvin-camendoza46.html" nodeid="14536"&gt;@mel mendoza&lt;/A&gt;, in my case I did further processing on the split flowfiles after splitting them. If you only need to store the split files, you can use PutFile to write them to the local file system or PutHDFS to write them to HDFS.&lt;/P&gt;</description>
      <pubDate>Wed, 09 Aug 2017 23:46:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Splitting-a-Nifi-flowfile-into-multiple-flowfiles/m-p/139934#M44001</guid>
      <dc:creator>Raj_B</dc:creator>
      <dc:date>2017-08-09T23:46:17Z</dc:date>
    </item>
  </channel>
</rss>

