
ramgood · 09-26-2018

Hi, I have a huge tab-delimited file (more than 100 GB). Below is some sample data:

Location ID	Device ID	Timestamp	Date	Time	Day of Week
Germany|3345204	997271322a5f54baa57a29b96d04231b0b069b31	1533473417	2018-08-05	14:50:17	Sun
Germany|3345204	997271322a5f54baa57a29b96d04231b0b069b31	1533473434	2018-08-05	14:50:34	Sun
Germany|3345204	ef7f1af6e29c8ad562e87b785685bfb2f79adb4a	1533427210	2018-08-05	02:00:10	Sun
Germany|3345204	64e1884666d73d30f3c8ed0f5ee9054ea6318121	1533508209	2018-08-06	00:30:09	Mon
Germany|3345204	64e1884666d73d30f3c8ed0f5ee9054ea6318121	1533508272	2018-08-06	00:31:12	Mon
Germany|3345204	64e1884666d73d30f3c8ed0f5ee9054ea6318121	1533508273	2018-08-06	00:31:13	Mon

I am quite new to NiFi and am struggling to understand the Expression Language, storing values in variables, handling the tab delimiter, and so on.

I want to split the file into multiple files, one file per day. For example, the data above should produce one file for "2018-08-05" and one for "2018-08-06". Note that I don't know the dates in advance; they only become known at runtime, from each line. So when processing starts, we pick up the first date that occurs, remember it, create a file for that date, and add the line to it. Whenever we encounter the same date again, the line should be appended to that date's file.

Although my explanation is long, I know this is a common need. I just can't build a flow for it with my limited knowledge. Could anybody help me with a sample flow or template to get started? Thanks
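This isn't a NiFi flow, but a minimal plain-Python sketch of the per-day logic described above, which may help to check what the split output should look like. It assumes the Date value is the fourth tab-separated column, that the first line is a header (copied into every output file), and that the date strings are safe to use as file names; the `split_by_day` function and the `<date>.tsv` naming are hypothetical choices for illustration only. The input is read one line at a time, so memory stays small even for a 100 GB file, and one output file is kept open per date seen so far.

```python
import os
import tempfile
from typing import Dict, TextIO


def split_by_day(input_path: str, output_dir: str, date_column: int = 3,
                 has_header: bool = True) -> Dict[str, int]:
    """Stream a tab-delimited file and write each line to <output_dir>/<date>.tsv.

    Assumptions (hypothetical, not from NiFi): the date is in `date_column`
    (0-based; 3 = "Date" in the sample), and its value is a valid file name.
    Returns a dict mapping each date to the number of data lines written.
    """
    os.makedirs(output_dir, exist_ok=True)
    handles: Dict[str, TextIO] = {}  # date -> output file, opened on first sight
    counts: Dict[str, int] = {}
    header = None
    try:
        with open(input_path, "r", encoding="utf-8", newline="") as src:
            for line_no, line in enumerate(src):
                if has_header and line_no == 0:
                    header = line  # repeated at the top of every per-day file
                    continue
                if not line.strip():
                    continue  # ignore blank lines
                fields = line.rstrip("\r\n").split("\t")
                day = fields[date_column]
                out = handles.get(day)
                if out is None:
                    # First occurrence of this date: create its file.
                    out = open(os.path.join(output_dir, day + ".tsv"), "w",
                               encoding="utf-8", newline="")
                    if header is not None:
                        out.write(header)
                    handles[day] = out
                # Later occurrences reuse the already-open file.
                out.write(line)
                counts[day] = counts.get(day, 0) + 1
    finally:
        for out in handles.values():
            out.close()
    return counts


if __name__ == "__main__":
    sample = [
        "Location ID\tDevice ID\tTimestamp\tDate\tTime\tDay of Week",
        "Germany|3345204\t997271322a5f54baa57a29b96d04231b0b069b31\t1533473417\t2018-08-05\t14:50:17\tSun",
        "Germany|3345204\tef7f1af6e29c8ad562e87b785685bfb2f79adb4a\t1533427210\t2018-08-05\t02:00:10\tSun",
        "Germany|3345204\t64e1884666d73d30f3c8ed0f5ee9054ea6318121\t1533508209\t2018-08-06\t00:30:09\tMon",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "input.tsv")
        with open(src_path, "w", encoding="utf-8") as f:
            f.write("\n".join(sample) + "\n")
        print(split_by_day(src_path, os.path.join(tmp, "by_day")))
        # → {'2018-08-05': 2, '2018-08-06': 1}
        print(sorted(os.listdir(os.path.join(tmp, "by_day"))))
        # → ['2018-08-05.tsv', '2018-08-06.tsv']
```

Reading the Date column as-is, rather than deriving the day from Timestamp, matters with this data: the Date/Time values are two hours ahead of UTC (1533508209 is 22:30:09 UTC on 2018-08-05 but is listed under 2018-08-06), so a UTC conversion would move some rows to the previous day. Keeping one open file per date is fine for a few hundred days; far more distinct dates could run into the operating system's open-file limit.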


Cloudera Community

Split huge file, one file for each day - based on ...