<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Split huge file, one file for each day - based on date column - tab delimited in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Split-huge-file-one-file-for-each-day-based-on-date-column/m-p/203145#M165148</link>
    <description>&lt;P&gt;Hi&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I have a huge file, more than 100 GB, containing &lt;EM&gt;&lt;STRONG&gt;tab-delimited&lt;/STRONG&gt;&lt;/EM&gt; values. Below is some sample data.&lt;/P&gt;&lt;P&gt;Location ID   Device ID   Timestamp   Date   Time   Day of Week&lt;BR /&gt;Germany|3345204   997271322a5f54baa57a29b96d04231b0b069b31   1533473417   &lt;STRONG&gt;2018-08-05&lt;/STRONG&gt;   14:50:17   Sun&lt;BR /&gt;Germany|3345204   997271322a5f54baa57a29b96d04231b0b069b31   1533473434   2018-08-05   14:50:34   Sun&lt;BR /&gt;Germany|3345204   ef7f1af6e29c8ad562e87b785685bfb2f79adb4a   1533427210   2018-08-05   02:00:10   Sun&lt;BR /&gt;Germany|3345204   64e1884666d73d30f3c8ed0f5ee9054ea6318121   1533508209   &lt;STRONG&gt;2018-08-06&lt;/STRONG&gt;   00:30:09   Mon&lt;BR /&gt;Germany|3345204   64e1884666d73d30f3c8ed0f5ee9054ea6318121   1533508272   2018-08-06   00:31:12   Mon&lt;BR /&gt;Germany|3345204   64e1884666d73d30f3c8ed0f5ee9054ea6318121   1533508273   2018-08-06   00:31:13   Mon&lt;/P&gt;&lt;P&gt;I am quite new to NiFi and am struggling to understand Expression Language, storing values in variables, handling the tab delimiter, etc.&lt;/P&gt;&lt;P&gt;I want to split the file into multiple files, one for each day. For example, from the data above, there would be one file for "2018-08-05" and one for "2018-08-06". Note that I don't know the dates in advance; the date values come from each line at runtime. So, when file processing starts, we pick up the first date that occurs and store it in memory, &lt;STRONG&gt;create a file for this date and add the line to the file&lt;/STRONG&gt;. Subsequently, whenever we encounter the same date, the line should be appended to the respective file. Although my explanation is long, I know this is a common need, but I am not able to create a flow for it because of my limited knowledge.&lt;/P&gt;&lt;P&gt;Can anybody help me with a sample flow or template? It would help me get started. Thanks.&lt;/P&gt;</description>
    <pubDate>Thu, 27 Sep 2018 04:48:48 GMT</pubDate>
    <dc:creator>ramgood</dc:creator>
    <dc:date>2018-09-27T04:48:48Z</dc:date>
  </channel>
</rss>

