<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Consuming Kafka, each Json Messages  and write to HDFS as one file? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161751#M124130</link>
    <description>&lt;P&gt;The "defragment" merge strategy can only be used to Merge files that have very specific attributes assigned to them.  That strategy is typically used to reassemble a FlowFile that was previously split apart by NiFi.&lt;/P&gt;</description>
    <pubDate>Mon, 06 Feb 2017 21:07:10 GMT</pubDate>
    <dc:creator>MattWho</dc:creator>
    <dc:date>2017-02-06T21:07:10Z</dc:date>
    <item>
      <title>Consuming Kafka, each Json Messages  and write to HDFS as one file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161748#M124127</link>
      <description>&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/12151-mergecontent.png"&gt;mergecontent.png&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I'm new to NiFi. I'm trying to consume Kafka JSON messages and write them to HDFS as a single file, where the filename is the current date.&lt;/P&gt;&lt;P&gt;I've tried using MergeContent and writing to HDFS, but it created multiple files.&lt;/P&gt;&lt;P&gt;This is what my MergeContent configuration looks like:&lt;/P&gt;</description>
      <pubDate>Mon, 06 Feb 2017 12:05:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161748#M124127</guid>
      <dc:creator>korvi_nareshkum</dc:creator>
      <dc:date>2017-02-06T12:05:45Z</dc:date>
    </item>
    <item>
      <title>Re: Consuming Kafka, each Json Messages  and write to HDFS as one file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161749#M124128</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/15834/korvinareshkumar-1.html" nodeid="15834"&gt;@Naresh Kumar Korvi&lt;/A&gt;&lt;P&gt;Can you update your minimum number of entries to what you want at minimum, let's say 500 files. Also since it's all similar data going into one file, I am assuming flow file attributes are same. Can you change your merge strategy to defragment?&lt;/P&gt;&lt;P&gt;Finally I am your flow is something like this:&lt;/P&gt;&lt;P&gt;consumeKafka -&amp;gt; mergecontent -&amp;gt; putHDFS&lt;/P&gt;&lt;P&gt;Is that right?&lt;/P&gt;</description>
      <pubDate>Mon, 06 Feb 2017 13:16:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161749#M124128</guid>
      <dc:creator>mqureshi</dc:creator>
      <dc:date>2017-02-06T13:16:05Z</dc:date>
    </item>
    <item>
      <title>Re: Consuming Kafka, each Json Messages  and write to HDFS as one file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161750#M124129</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/15834/korvinareshkumar-1.html" nodeid="15834" target="_blank"&gt;@Naresh Kumar Korvi&lt;/A&gt; You want it to look a bit like this:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Note the header, footer, and demarcator - this will aggregate your json records into a properly formatted document for later reading&lt;/LI&gt;&lt;LI&gt;Set a max bin age so the final few messages will not get stuck in the queue&lt;/LI&gt;&lt;LI&gt;Set a min size and min number of entires to stop lots of little files being written&lt;/LI&gt;&lt;LI&gt;Set a max size and max entries that generate a file of the size you want to work with&lt;/LI&gt;&lt;LI&gt;Play with the values a bit using the GenerateFlowfile processor to create appropriately sized content to test with if your Kafka dataflow is a bit slow.&lt;/LI&gt;&lt;LI&gt;Your flow should be ConsumeKafka -&amp;gt; MergeContent -&amp;gt; UpdateAttribute (set filename, path) -&amp;gt; PutHDFS&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="12165-screen-shot-2017-02-06-at-100615.png" style="width: 1534px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/20650i3E382C3AD12F5F6C/image-size/medium?v=v2&amp;amp;px=400" role="button" title="12165-screen-shot-2017-02-06-at-100615.png" alt="12165-screen-shot-2017-02-06-at-100615.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 11:50:38 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161750#M124129</guid>
      <dc:creator>dchaffey</dc:creator>
      <dc:date>2019-08-18T11:50:38Z</dc:date>
    </item>
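<!--
  The MergeContent settings described in the reply above can be sketched as a property
  listing. This is a minimal illustration, not a configuration from the original thread:
  the specific counts, sizes, and bin age are placeholder assumptions to tune against
  your own throughput. The header, footer, and demarcator shown wrap the merged JSON
  records into a single valid JSON array, per the first bullet in the reply:

  ```
  MergeContent properties (sketch; values are assumptions):
    Merge Strategy            : Bin-Packing Algorithm
    Merge Format              : Binary Concatenation
    Minimum Number of Entries : 500
    Maximum Number of Entries : 10000
    Minimum Group Size        : 10 MB
    Maximum Group Size        : 1 GB
    Max Bin Age               : 5 min
    Header                    : [
    Footer                    : ]
    Demarcator                : ,
  ```
-->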
    <item>
      <title>Re: Consuming Kafka, each Json Messages  and write to HDFS as one file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161751#M124130</link>
      <description>&lt;P&gt;The "defragment" merge strategy can only be used to Merge files that have very specific attributes assigned to them.  That strategy is typically used to reassemble a FlowFile that was previously split apart by NiFi.&lt;/P&gt;</description>
      <pubDate>Mon, 06 Feb 2017 21:07:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161751#M124130</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2017-02-06T21:07:10Z</dc:date>
    </item>
    <item>
      <title>Re: Consuming Kafka, each Json Messages  and write to HDFS as one file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161752#M124131</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/15834/korvinareshkumar-1.html" nodeid="15834" target="_blank"&gt;@Naresh Kumar Korvi&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You will want to stick with the "Bin-Packing Algorithm" merge strategy in your case.  The reason you are ending up with single files is because of the way the MergeContent processor is designed to work.  There are several factors in play here:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="12166-screen-shot-2017-02-06-at-81825-am.png" style="width: 802px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/20648iE0F621EC21035F99/image-size/medium?v=v2&amp;amp;px=400" role="button" title="12166-screen-shot-2017-02-06-at-81825-am.png" alt="12166-screen-shot-2017-02-06-at-81825-am.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;As the MergeContent processor will start the content of each new FlowFile on a new line.  However, at times the incoming content of each FlowFile may be multiple lines itself.  So it may be desirable to put a user defined "Demarcator" between the content of each FlowFile should you need to differentiate the content of each merge at a later time.  If that is the case, the MergeContent processor provides a "Demarcator" property to accomplish this.&lt;/P&gt;&lt;P&gt;An UpdateAttribute processor can be used following the MergeContent processor to set a new "filename" on the resulting merged FlowFile.  
I am not sure the exact filename format you want to use, but here is an example config that produce a filename like "2017-02-06":&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="12167-screen-shot-2017-02-06-at-84448-am.png" style="width: 1578px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/20649i4319E4B13ABC8072/image-size/medium?v=v2&amp;amp;px=400" role="button" title="12167-screen-shot-2017-02-06-at-84448-am.png" alt="12167-screen-shot-2017-02-06-at-84448-am.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 11:50:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161752#M124131</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2019-08-18T11:50:31Z</dc:date>
    </item>
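<!--
  The UpdateAttribute step described in the reply above (whose screenshot is only
  linked) can be sketched as a pair of attribute/Expression Language entries. This is
  a minimal illustration: the yyyy-MM-dd pattern is inferred from the "2017-02-06"
  example filename in the reply, and the path value is purely hypothetical:

  ```
  UpdateAttribute properties (sketch):
    filename : ${now():format('yyyy-MM-dd')}
    path     : /data/kafka-json/        (hypothetical target directory)
  ```
-->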
    <item>
      <title>Re: Consuming Kafka, each Json Messages  and write to HDFS as one file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161753#M124132</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/525/mclark.html" nodeid="525"&gt;@Matt&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Thanks Matt, this is some kind of similar i'm looking at. but also how do i create dir based on date condition.&lt;/P&gt;&lt;P&gt;For Example: Based on date range it should create a dir dynamically.&lt;/P&gt;&lt;P&gt;this is what i'm expecting the dir structure to be:&lt;/P&gt;&lt;P&gt;period1-year/p1-week1/date/date.json&lt;/P&gt;&lt;P&gt;I'm not sure if i have the right condition in updateattribute.&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/12182-updateattrib-on-rules.png"&gt;updateattrib-on-rules.png&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 07 Feb 2017 06:23:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161753#M124132</guid>
      <dc:creator>korvi_nareshkum</dc:creator>
      <dc:date>2017-02-07T06:23:37Z</dc:date>
    </item>
    <item>
      <title>Re: Consuming Kafka, each Json Messages  and write to HDFS as one file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161754#M124133</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/2662/dchaffey.html" nodeid="2662"&gt;@Dan Chaffey&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Yes, my NIFI data flow has the same design.&lt;/P&gt;&lt;P&gt;Thanks Dan, i've the changes recommended. &lt;/P&gt;</description>
      <pubDate>Tue, 07 Feb 2017 06:27:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161754#M124133</guid>
      <dc:creator>korvi_nareshkum</dc:creator>
      <dc:date>2017-02-07T06:27:40Z</dc:date>
    </item>
    <item>
      <title>Re: Consuming Kafka, each Json Messages  and write to HDFS as one file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161755#M124134</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/15834/korvinareshkumar-1.html" nodeid="15834" target="_blank"&gt;@Naresh Kumar Korvi&lt;/A&gt; &lt;/P&gt;&lt;P&gt;The "Conditions" specified for your rule must result in a boolean "true" before the associated "Actions" will be applied against the incoming FlowFile.  Your condition you have in the screenshot will always resolve to true...&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="12196-screen-shot-2017-02-07-at-91722-am.png" style="width: 1164px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/20645i3DFA5B3A3E545CFA/image-size/medium?v=v2&amp;amp;px=400" role="button" title="12196-screen-shot-2017-02-07-at-91722-am.png" alt="12196-screen-shot-2017-02-07-at-91722-am.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Looking at your "dirname" attribute, it is not going to return your desired directory path of:&lt;/P&gt;&lt;PRE&gt;period1-year/p1-week1/date&lt;/PRE&gt;&lt;P&gt;and your "filename" attribute will be missing the .json extension you are looking for as well:&lt;/P&gt;&lt;PRE&gt;date.json&lt;/PRE&gt;&lt;P&gt;I believe what you are trying to do is better accomplished using the below "Condition" and "Action" configurations:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="12197-screen-shot-2017-02-07-at-92138-am.png" style="width: 1136px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/20646iE25BEA35A123619E/image-size/medium?v=v2&amp;amp;px=400" role="button" title="12197-screen-shot-2017-02-07-at-92138-am.png" alt="12197-screen-shot-2017-02-07-at-92138-am.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Condition:   &lt;/P&gt;&lt;PRE&gt;${now():format('MM'):le(2):and(${now():format('dd'):le(25)})}&lt;/PRE&gt;&lt;P&gt;dirname:     
&lt;/P&gt;&lt;PRE&gt;period1-${now():format('yyyy')}/p1-${now():format('ww')}/${now():format('MM-dd-yyyy')}&lt;/PRE&gt;&lt;P&gt;filename:    &lt;/P&gt;&lt;PRE&gt;${now():format('MM-dd-yyyy')}.json&lt;/PRE&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="12198-screen-shot-2017-02-07-at-92245-am.png" style="width: 335px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/20647i6C2F1056D4995C4A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="12198-screen-shot-2017-02-07-at-92245-am.png" alt="12198-screen-shot-2017-02-07-at-92245-am.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Matt&lt;/P&gt;</description>
      <pubDate>Sun, 18 Aug 2019 11:50:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161755#M124134</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2019-08-18T11:50:17Z</dc:date>
    </item>
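<!--
  The date logic in the reply above can be illustrated outside NiFi. This is a sketch
  in Python, not part of the original thread: condition_matches mirrors the Expression
  Language condition (true only when the current month is at most 2 and the day of
  month is at most 25), and build_paths is a hypothetical helper mirroring the dirname
  and filename expressions. Python's ISO week number stands in for NiFi's 'ww' format
  token, an assumption: the two week-numbering schemes can differ near year boundaries.

  ```python
  from datetime import date

  def condition_matches(d: date) -> bool:
      # Mirrors ${now():format('MM'):le(2):and(${now():format('dd'):le(25)})}
      return d.month <= 2 and d.day <= 25

  def build_paths(d: date) -> tuple[str, str]:
      # Mirrors the dirname and filename Expression Language shown above.
      # isocalendar()[1] approximates NiFi's 'ww' week-of-year token.
      dirname = f"period1-{d.year}/p1-{d.isocalendar()[1]}/{d.strftime('%m-%d-%Y')}"
      filename = d.strftime('%m-%d-%Y') + '.json'
      return dirname, filename
  ```
-->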
    <item>
      <title>Re: Consuming Kafka, each Json Messages  and write to HDFS as one file?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161756#M124135</link>
      <description>&lt;P&gt;@Matt&lt;/P&gt;&lt;P&gt;Thanks Matt, this is exactly what I was looking for.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Feb 2017 11:39:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Consuming-Kafka-each-Json-Messages-and-write-to-HDFS-as-one/m-p/161756#M124135</guid>
      <dc:creator>korvi_nareshkum</dc:creator>
      <dc:date>2017-02-10T11:39:52Z</dc:date>
    </item>
  </channel>
</rss>

