I have a stream of data coming in throughout the day, and I need to perform ETL on it. The ETL step requires looking up reference data in an HBase table. That table can hold anywhere from 4 to 10 billion records, each 100 bytes in size.
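For a sense of scale, a quick back-of-envelope calculation from the figures above shows that the raw reference data is roughly 400 GB to 1 TB. That is too large to copy to every worker (for example as a broadcast variable), so lookups by key against the table are needed. The sketch below counts only the 100-byte payload. It ignores HBase storage overhead such as row keys, column qualifiers, timestamps, compression and replication, so the real on-disk size will differ.

```python
# Back-of-envelope size of the HBase reference table.
RECORD_SIZE_BYTES = 100  # per-record size stated above


def table_size_bytes(num_records: int, record_size: int = RECORD_SIZE_BYTES) -> int:
    """Raw payload size only; ignores HBase overhead (keys, qualifiers,
    timestamps, compression, replication)."""
    return num_records * record_size


if __name__ == "__main__":
    for n in (4_000_000_000, 10_000_000_000):
        size = table_size_bytes(n)
        # Print both decimal GB and binary GiB for comparison.
        print(f"{n:,} records -> {size / 1e9:,.0f} GB ({size / 2**30:,.1f} GiB)")
```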
I am looking for suggestions (and examples) on which technologies I should use here. Would Spark Streaming, with a DataFrame over the HBase table for the lookups, be a good choice?
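To make the lookup step concrete, here is one common enrichment pattern that applies whichever streaming engine is used. Each partition of incoming records issues batched multi-get requests against the reference store instead of making one round trip per record. This is a minimal sketch under stated assumptions. `FakeReferenceStore`, `multi_get`, `enrich_partition` and `batch_size` are hypothetical names. The in-memory dict stands in for HBase, whose Java client offers a batched `Table.get(List<Get>)`.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, dict]  # (lookup key, event payload)


class FakeReferenceStore:
    """Hypothetical stand-in for an HBase table: row key -> reference value."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = data
        self.round_trips = 0  # number of batched requests issued

    def multi_get(self, keys: List[str]) -> List[Optional[str]]:
        # One "round trip" fetches every key in the batch; missing keys -> None.
        self.round_trips += 1
        return [self._data.get(k) for k in keys]


def _flush(batch: List[Record], store: FakeReferenceStore) -> Iterator[Tuple[str, dict, Optional[str]]]:
    values = store.multi_get([key for key, _ in batch])
    for (key, payload), ref in zip(batch, values):
        yield key, payload, ref


def enrich_partition(records: Iterable[Record], store: FakeReferenceStore,
                     batch_size: int = 1000) -> Iterator[Tuple[str, dict, Optional[str]]]:
    """Enrich one partition's records, looking keys up in batches of up to batch_size."""
    batch: List[Record] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == batch_size:
            yield from _flush(batch, store)
            batch = []
    if batch:  # flush the final partial batch
        yield from _flush(batch, store)


if __name__ == "__main__":
    store = FakeReferenceStore({"a": "ref-A", "c": "ref-C"})
    events = [("a", {"v": 1}), ("b", {"v": 2}), ("c", {"v": 3})]
    for row in enrich_partition(events, store, batch_size=2):
        print(row)
    print("round trips:", store.round_trips)
```

In Spark, a function shaped like `enrich_partition` would typically be passed to `mapPartitions` so that each partition opens one client connection and reuses it for all of its batches. The batch size trades lookup latency against request size.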