Looking for some advice/guidance on designing an architecture solution for storing data in HBase.
Our current flow is this. NiFi -> Kafka -> Storm -> HBase.
This is working as expected, but as we receive more requirements we need to be more flexible. Our HBase store is now going to hold a lot more information from different parts of the company, requiring more HBase tables as new requirements come in. I was looking into designing a generic Storm topology that would take the table name and other data from Kafka at run time, allowing us to dynamically pass in any data/table/column family. The topology's main responsibility would then simply be to parse the input and write to the table name it received as part of the Tuple message.

However, I believe this is not advised, as the HBase Bolt requires you to pass in the table name in the prepare() method, which rules out the flexible solution I am after. Does anyone have other tools/ideas for this?

Currently we would have to have one HBase topology, and any time we added a new table to HBase, update that topology with a new HBase Bolt. This is not the end of the world, and probably what we will go with if we don't find another way, but I'm just seeing what else is out there.
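To make the idea concrete: since the stock HBase Bolt binds one table at prepare() time, a custom bolt could instead keep a cache of per-table writer handles and pick one per tuple. This is only a sketch of that routing logic in Python; `TableWriter` is a stand-in for whatever real handle you would open per table (e.g. an HBase BufferedMutator), and the tuple field names are assumptions, not actual Storm/HBase API.

```python
class TableWriter:
    """Stub for a per-table client handle (e.g. a BufferedMutator in the
    real thing); here it just collects rows for illustration."""
    def __init__(self, table_name):
        self.table_name = table_name
        self.rows = []

    def write(self, row_key, column_family, qualifier, value):
        self.rows.append((row_key, column_family, qualifier, value))


class GenericHBaseBolt:
    """Routes each incoming message to a writer chosen at execute() time,
    instead of binding a single table in prepare()."""
    def __init__(self, writer_factory=TableWriter):
        self._writers = {}  # table name -> cached writer handle
        self._writer_factory = writer_factory

    def execute(self, tuple_):
        table = tuple_["table"]  # table name travels with the message
        writer = self._writers.get(table)
        if writer is None:  # open (and cache) a handle lazily per table
            writer = self._writer_factory(table)
            self._writers[table] = writer
        writer.write(tuple_["row_key"], tuple_["cf"],
                     tuple_["qualifier"], tuple_["value"])


# Two messages bound for different tables go through one bolt instance.
bolt = GenericHBaseBolt()
bolt.execute({"table": "audit", "row_key": "r1", "cf": "d",
              "qualifier": "q", "value": "v1"})
bolt.execute({"table": "sensor", "row_key": "r2", "cf": "d",
              "qualifier": "q", "value": "v2"})
```

The design choice worth noting is the lazy per-table cache: new tables need no topology change, only a new table name arriving in the message, at the cost of the bolt holding one open handle per table it has seen.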
Some requirements we are hoping to achieve:
1. A single point of entry to write to HBase, so only one component needs maintenance/updating when versions change. This also provides other benefits (easier authorization to write data, auditing, etc.)
2. Separating data into 2 streams:
a. Raw data that simply needs to be archived in HBase, with no processing required
b. Data that needs to go through some form of processing. We will be using Spark for a lot of this. The processed data would then be stored in HBase by the same archival solution.
I have looked into using NiFi, but I would prefer to use NiFi purely as our data ingestion/routing/model transformation tool and keep the HBase writing in a separate tool. NiFi could become unmanageable as we add more and more tables and process groups. Spark might do it, but it seems like overkill.
Any other guidance?