Created 07-23-2016 04:04 PM
I want to write data to a new file if the file does not exist, and append data to the existing file otherwise, using the Storm HDFS connector's HdfsBolt. How can I do this? I would appreciate any suggestions.
Created 07-23-2016 04:14 PM
Files on HDFS are immutable. The HDFS bolt lets you configure behavior such as: "After every 1,000 tuples it will sync filesystem, making that data visible to other HDFS clients. It will rotate files when they reach 5 megabytes in size."
So you can buffer events until a specified interval has elapsed. Take a look at my GitHub Storm code to see how that is done.
Created 07-23-2016 11:37 PM
Thanks for the advice. By the way, the files will be named something like ddmmyyyy-hh. I want to group the logs by hour, and the number of events per second can change, so the number of tuples and the file size cannot be determined in advance. How can I do this in that case?
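To make the hourly naming concrete, here is a minimal sketch of deriving a ddmmyyyy-hh file name from an event timestamp. It uses only the Java standard library; `HourlyBucket` and `fileNameFor` are hypothetical names, not part of storm-hdfs, and using UTC is an assumption.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; not part of storm-hdfs.
class HourlyBucket {
    // Pattern ddMMyyyy-HH gives day, month, year and 24-hour hour.
    // UTC is an assumption; use the zone your logs are reported in.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("ddMMyyyy-HH").withZone(ZoneOffset.UTC);

    // Maps an event timestamp (epoch milliseconds) to its hourly file name.
    static String fileNameFor(long epochMillis) {
        return FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }
}
```

Every event in the same hour maps to the same name, regardless of how many tuples arrive or how large the file grows.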
Created 07-23-2016 11:49 PM
So you're looking for windowing in Storm, i.e. doing something based on a specified time period. Until recently you had to build your own windowing logic in Storm by keeping track of time and using a disk cache to hold events until the window time had completed. Now the functionality comes out of the box. Take a look at this excellent article on how the new functionality works in Storm: https://community.hortonworks.com/articles/14171/windowing-and-state-checkpointing-in-apache-storm.h...
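For illustration, the hand-rolled approach (tracking time and holding events until the window completes) might look roughly like the sketch below. This is plain Java, not Storm's built-in windowing API; `HourlyWindowBuffer` is a hypothetical name, and the sketch assumes events arrive in timestamp order and fit in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory tumbling window of one hour; not Storm's windowing API.
class HourlyWindowBuffer {
    private static final long HOUR_MS = 3_600_000L;

    private long currentHour = -1;               // hour index of the open window, -1 if none yet
    private List<String> buffer = new ArrayList<>();

    // Adds an event. When the event belongs to a new hour, the events of the
    // previous hour are returned as a completed window; otherwise the result is empty.
    // Assumes events arrive in timestamp order.
    List<String> add(long epochMillis, String event) {
        long hour = epochMillis / HOUR_MS;
        List<String> completed = new ArrayList<>();
        if (currentHour != -1 && hour != currentHour) {
            completed = buffer;                  // hand over the finished window
            buffer = new ArrayList<>();          // start a fresh one
        }
        currentHour = hour;
        buffer.add(event);
        return completed;
    }
}
```

The caller would write each returned, non-empty list to the file for that hour.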
Created 07-24-2016 02:05 AM
Thank you for the advice. The new windowing feature seems able to solve my problem; my only concern is whether memory can hold one hour of data. Is there an example of how to configure or implement the disk cache?
By the way, I found an example of appending to a file in HDFS:
http://stackoverflow.com/questions/32339602/append-to-file-in-hdfs-cdh-5-4-5
Can the CDH platform do the append?
Created 07-24-2016 05:29 AM
If you are concerned about memory, you can persist the data to HDFS and, once the window period is over, recombine all the persisted data and push it to your hourly location.
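A rough sketch of the recombine step follows. To stay self-contained it uses `java.nio.file` on the local file system as a stand-in for HDFS; on a real cluster you would do the equivalent through the Hadoop file system API. `HourlyMerge`, `merge`, `demo`, and the `part-000N` file names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; local files stand in for HDFS paths.
class HourlyMerge {
    // Concatenates every regular file in partsDir, in name order, into target.
    // target must live outside partsDir.
    static void merge(Path partsDir, Path target) throws IOException {
        List<Path> parts;
        try (Stream<Path> files = Files.list(partsDir)) {
            parts = files.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out);           // append this part's bytes to the hourly file
            }
        }
    }

    // Small usage example: two persisted parts become one hourly file.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("parts");
            Files.write(dir.resolve("part-0001"), "a\n".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("part-0002"), "b\nc\n".getBytes(StandardCharsets.UTF_8));
            Path target = Files.createTempFile("hourly", ".log");
            merge(dir, target);
            return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because each part is written once and the hourly file is created in a single pass, this pattern does not rely on appending to an existing file.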
Created 07-24-2016 06:58 AM
Thank you for the answer