Thanks. If I don't use Window and instead stream the data onto HDFS, could you suggest how to store only one week's worth of data? Should I create a cron job to delete HDFS files older than a week? Please let me know if you have any other suggestions.
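For reference, here is a rough Python sketch of what such a cleanup job might look like. It assumes the stream writes into one directory per day named like `dt=YYYY-MM-DD`; that layout, the `/data/events` path, and the function names are all made up for illustration. The only real tool it relies on is the `hdfs dfs -rm -r -skipTrash` shell command, and the sketch only builds that command rather than running it.

```python
from datetime import date, timedelta

RETENTION_DAYS = 7  # keep one week of data


def expired_partitions(partition_dirs, today, retention_days=RETENTION_DAYS):
    """Return the directories whose dt=YYYY-MM-DD suffix is older than the cutoff."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for path in partition_dirs:
        # Assumed (hypothetical) layout: /some/base/dt=2024-01-15
        day = date.fromisoformat(path.rsplit("dt=", 1)[1])
        if day < cutoff:
            expired.append(path)
    return expired


def delete_command(path):
    """Build the HDFS shell command that removes one directory recursively."""
    # -skipTrash deletes immediately instead of moving the data to .Trash
    return ["hdfs", "dfs", "-rm", "-r", "-skipTrash", path]


if __name__ == "__main__":
    # Hypothetical listing; in practice it would come from listing the base directory.
    dirs = [
        "/data/events/dt=2024-01-01",
        "/data/events/dt=2024-01-09",
        "/data/events/dt=2024-01-10",
    ]
    for path in expired_partitions(dirs, today=date(2024, 1, 10)):
        print(" ".join(delete_command(path)))
```

A daily cron entry could run a script like this and pass each command to `subprocess.run`. Deleting whole day directories is usually simpler than checking the modification time of every file.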
Streaming data directly to HDFS doesn't seem like it will make the data very easy to find and aggregate at the end of each window. What about creating a key/value store (with Redis, HBase, or Elasticsearch, for example) and using it to look up all the keys associated with each window?
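To make the key/value idea concrete, here is a small Python sketch of a per-window index. A plain in-memory dict stands in for the real store, and the class name, the 60-second window length, and the method names are all assumptions for illustration. With Redis, for instance, each window could plausibly be a set filled with `SADD`, read with `SMEMBERS`, and given a one-week `EXPIRE`, but that mapping is a suggestion and is not shown here.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical window length


def window_id(timestamp, window_seconds=WINDOW_SECONDS):
    """Map an epoch timestamp (in seconds) to the number of its window."""
    return int(timestamp // window_seconds)


class WindowIndex:
    """In-memory stand-in for a key/value store: window id -> set of record keys."""

    def __init__(self):
        self._keys_by_window = defaultdict(set)

    def add(self, key, timestamp):
        # Register the record key under the window its timestamp falls into.
        self._keys_by_window[window_id(timestamp)].add(key)

    def keys_for(self, timestamp):
        # Return every key seen in the window containing this timestamp.
        return sorted(self._keys_by_window.get(window_id(timestamp), set()))


if __name__ == "__main__":
    index = WindowIndex()
    index.add("event-a", 5)
    index.add("event-b", 59)
    index.add("event-c", 60)
    print(index.keys_for(0))   # keys in the first window
    print(index.keys_for(61))  # keys in the second window
```

At the end of a window the job would ask the index for that window's keys and fetch only those records, instead of scanning everything written to HDFS.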