
Spark Streaming and HBase Lookup

I have a stream of data coming in throughout the day, and I need to perform ETL on it. The ETL step requires looking up reference data in an HBase table that holds 4 to 10 billion records, each 100 bytes in size.
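For scale, a quick back-of-the-envelope calculation of the raw payload helps frame the options. This sketch assumes exactly 100 bytes per record and ignores HBase storage overhead (row keys, column qualifiers, timestamps, replication, compression), so the real on-disk footprint will differ. At roughly 400 GB to 1 TB of raw data, the table is likely too large to broadcast to every executor, which suggests point lookups by key rather than a broadcast join.

```python
# Rough raw size of the HBase reference table described in the question.
# Assumption: exactly 100 bytes per record; HBase overhead is ignored.
RECORD_SIZE_BYTES = 100


def raw_table_size_gb(num_records: int, record_size: int = RECORD_SIZE_BYTES) -> float:
    """Return the raw payload size in decimal gigabytes (10**9 bytes)."""
    return num_records * record_size / 10**9


low = raw_table_size_gb(4_000_000_000)    # lower bound from the question
high = raw_table_size_gb(10_000_000_000)  # upper bound from the question
print(f"{low:.0f} GB to {high:.0f} GB")
```

Running it prints `400 GB to 1000 GB`, i.e. up to about 1 TB before any HBase overhead.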

I am looking for suggestions (and examples) on which technologies to use here. Would Spark Streaming, with a DataFrame over HBase for the lookups, be a good choice?
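One common pattern for this kind of enrichment, sketched here as an alternative to a DataFrame join rather than as a definitive answer, is to process each partition of a micro-batch and look up reference rows in batches of keys, so that each HBase round trip covers many records. The function below is a minimal, framework-free sketch: `batched_lookup` and `multi_get` are hypothetical names, and `multi_get` stands in for a real batched client call (for example HBase's Java `Table.get(List<Get>)`). In PySpark the function could be passed to `rdd.mapPartitions`, with the HBase connection opened once per partition.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Event = Tuple[str, dict]                      # (row key, incoming record)
Enriched = Tuple[str, dict, Optional[dict]]   # (row key, record, reference row or None)


def batched_lookup(
    records: Iterable[Event],
    multi_get: Callable[[List[str]], Dict[str, dict]],
    batch_size: int = 1000,
) -> Iterator[Enriched]:
    """Enrich records with reference data, issuing one multi_get per batch.

    multi_get is a hypothetical stand-in for a batched HBase read; it must
    return a dict containing only the keys that were found.
    """
    batch: List[Event] = []
    for rec in records:
        batch.append(rec)
        if len(batch) >= batch_size:
            yield from _enrich(batch, multi_get)
            batch = []
    if batch:  # flush the final, partially filled batch
        yield from _enrich(batch, multi_get)


def _enrich(
    batch: List[Event], multi_get: Callable[[List[str]], Dict[str, dict]]
) -> Iterator[Enriched]:
    found = multi_get([key for key, _ in batch])
    for key, event in batch:
        # Missing reference rows come back as None so the caller decides
        # whether to drop, default, or route them elsewhere.
        yield key, event, found.get(key)


# Usage with an in-memory dict standing in for the HBase table.
reference = {"k1": {"region": "EU"}, "k3": {"region": "US"}}
calls: List[int] = []


def fake_multi_get(keys: List[str]) -> Dict[str, dict]:
    calls.append(len(keys))  # record the size of each simulated round trip
    return {k: reference[k] for k in keys if k in reference}


events = [("k1", {"amt": 5}), ("k2", {"amt": 7}), ("k3", {"amt": 9})]
out = list(batched_lookup(events, fake_multi_get, batch_size=2))
```

With `batch_size=2`, the three events trigger two simulated round trips (sizes 2 and 1), and `k2` comes back with `None` because it has no reference row. The batch size is a tuning knob: larger batches mean fewer round trips but more memory held per partition.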