A Two-Phase Commit for HBase & NiFi

How would you perform a two-phase commit between HBase and NiFi? Think of a trading system in FinServ. Once a piece of data is transacted (i.e. committed) in HBase (assume Omid / Tephra here), how can a push mechanism get that data into NiFi, and how can NiFi then acknowledge that it received the data from HBase?

1 ACCEPTED SOLUTION

Master Guru

You would need something running in HBase that could push the data to a processor in NiFi and retry if an acknowledgement was never received from NiFi. The NiFi processor would send an acknowledgement only after writing the data to a flow file and calling session.commit(). I don't know enough about HBase to say whether or how you could implement that. The current integration between NiFi and HBase consists of the GetHBase processor, which scans for new cells based on a timestamp, and PutHBaseCell and PutHBaseJSON for inserting data.
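
A minimal sketch of the push/acknowledge contract described above, written in Python purely for illustration. `Receiver`, `deliver`, and `push_with_retry` are hypothetical names, and a plain dict stands in for NiFi's flow file repository; "store, then ack" plays the role of writing a flow file and calling session.commit(). Because the sender retries whenever an ack is lost, the same record can arrive more than once, so the receiver here is assumed to de-duplicate on a transaction id (the thread does not say how duplicates would be handled).

```python
class Receiver:
    """Hypothetical stand-in for a NiFi processor: it acknowledges a record
    only after storing it (the analogue of writing a flow file and calling
    session.commit())."""

    def __init__(self, fail_first_n_acks=0):
        self.committed = {}                      # txn_id -> payload ("durable" store)
        self._acks_to_lose = fail_first_n_acks   # simulate acks lost in transit

    def deliver(self, txn_id, payload):
        # Idempotent on txn_id: a retried delivery is not stored twice.
        if txn_id not in self.committed:
            self.committed[txn_id] = payload     # commit first ...
        if self._acks_to_lose > 0:
            self._acks_to_lose -= 1
            return False                         # ... ack lost, sender will retry
        return True                              # ... then acknowledge


def push_with_retry(receiver, txn_id, payload, max_attempts=5):
    """Hypothetical HBase-side pusher: re-send until acknowledged.
    Returns the number of attempts that were needed."""
    for attempt in range(1, max_attempts + 1):
        if receiver.deliver(txn_id, payload):
            return attempt
    raise RuntimeError(f"no ack for {txn_id} after {max_attempts} attempts")


r = Receiver(fail_first_n_acks=2)
print(push_with_retry(r, "txn-1", {"qty": 100}))  # 3
print(r.committed)                                 # {'txn-1': {'qty': 100}}
```

With two lost acks the sender transmits three times but the record is stored once. Note that this is at-least-once delivery plus de-duplication, not a true two-phase commit.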

7 REPLIES

Master Guru

@ccasano I'm trying to understand the use case. Are you looking for one ack when the data is received by NiFi and a second when the data is pushed into HBase? Sorry, I didn't follow your question completely.

Master Mentor

I'm thinking of a coprocessor on HBase that watches for data coming from NiFi? @Enis @vrodionov @wxu @carter @nmaillard this is a FinServ use case in NYC.
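
To make the coprocessor idea concrete, here is a toy Python model of a post-write hook. It is only loosely analogous to the `postPut` hook on HBase's Java `RegionObserver` interface; `ObservedTable` and `register` are hypothetical names, and nothing here uses the real HBase API.

```python
class ObservedTable:
    """Toy in-memory table that notifies observers after every put.
    Hypothetical; only loosely analogous to RegionObserver.postPut."""

    def __init__(self):
        self.rows = {}
        self.observers = []

    def register(self, observer):
        self.observers.append(observer)

    def put(self, row_key, value):
        self.rows[row_key] = value           # 1. apply the write
        for observer in self.observers:      # 2. then run the post-write hooks,
            observer(row_key, value)         #    synchronously, in the write path


changes = []
table = ObservedTable()
table.register(lambda key, value: changes.append((key, value)))
table.put("trade#1", {"qty": 100})
print(changes)  # [('trade#1', {'qty': 100})]
```

The observer runs synchronously inside `put`, so a slow or failing observer slows down or breaks the write itself. That is the trigger-like risk of coprocessors.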


@Bryan Bende @Artem Ervits this is helpful; I think we could be onto something. For a coprocessor, would it make sense to emit a REST call to get the transaction to NiFi, as opposed to having NiFi do constant Gets? I'm not too familiar with HBase, but coprocessors remind me of triggers, which can be useful but slippery. For the two-phase commit, I believe the NiFi processor that receives the "triggered" data would then have to ACK with HBase before transmitting it further down the flow.
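
For reference, the basic two-phase commit protocol can be sketched as follows, in Python for brevity. `Participant` and `two_phase_commit` are hypothetical; a real implementation also needs a durable coordinator log, timeouts, and recovery for participants that crash after voting, all of which this sketch omits.

```python
class Participant:
    """Hypothetical participant, e.g. the HBase side or the NiFi side."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: stage the change and vote yes (True) or no (False).
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: collect votes; all() stops at the first "no" vote.
    if all(p.prepare() for p in participants):
        for p in participants:   # Phase 2: everyone voted yes, so commit
            p.commit()
        return True
    for p in participants:       # Phase 2: at least one "no", so abort everywhere
        p.abort()
    return False


hbase_side, nifi_side = Participant("hbase"), Participant("nifi", can_commit=False)
print(two_phase_commit([hbase_side, nifi_side]))  # False
print(hbase_side.state, nifi_side.state)          # aborted aborted
```

In the scenario above, HBase (via Omid or Tephra) and the receiving NiFi processor would each be a participant. Participants sit in the "prepared" state until the coordinator decides, and that waiting is a large part of the overhead usually attributed to 2PC.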

New Contributor

The requirement is for HBase to notify (PUSH) NiFi when a piece of data has changed, rather than relying on the polling mechanism currently implemented in NiFi.

The client has implemented a custom in-house solution using an HBase coprocessor but would like to replace this code with NiFi. Polling does not work well in an environment where data can change rapidly or is updated at irregular intervals.

The ideal solution would be to "publish and forget" an update via Kafka (or another method) in a way that neither stalls HBase processing, as a trigger would, nor impacts HBase performance.
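
One way to get "publish and forget" without stalling the write path is a bounded in-memory hand-off with a background publisher. The Python sketch below assumes a `publish` callable standing in for a real client call (for example a Kafka producer's send); `AsyncPublisher` is a hypothetical name, not a NiFi, HBase, or Kafka API.

```python
import queue
import threading


class AsyncPublisher:
    """Hypothetical non-blocking hand-off: notify() never blocks the caller;
    a daemon thread drains a bounded queue and calls `publish` (an assumed
    stand-in for e.g. a Kafka producer send)."""

    _STOP = object()                          # sentinel that ends the worker

    def __init__(self, publish, capacity=1000):
        self._queue = queue.Queue(maxsize=capacity)
        self._publish = publish
        self.dropped = 0                      # events lost because the queue was full
        self.failed = 0                       # events whose publish call raised
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def notify(self, event):
        # Called on the write path: returns immediately even if the queue is full.
        try:
            self._queue.put_nowait(event)
        except queue.Full:
            self.dropped += 1                 # lose the event rather than stall the writer

    def _run(self):
        while True:
            event = self._queue.get()
            if event is self._STOP:
                return
            try:
                self._publish(event)
            except Exception:
                self.failed += 1              # keep the worker alive on publish errors

    def close(self):
        # Queue the sentinel behind any pending events, then wait for the worker.
        self._queue.put(self._STOP)
        self._worker.join()


published = []
pub = AsyncPublisher(published.append, capacity=10)
for i in range(3):
    pub.notify({"row": i})
pub.close()
print(published)  # [{'row': 0}, {'row': 1}, {'row': 2}]
```

The design choice here is to drop (and count) events when the queue is full rather than block the writer. That trades completeness for write latency, so a real deployment would need a durable fallback if lost notifications are unacceptable.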

Master Guru

There was some discussion a long time ago about using HBase's replication endpoint to push data to NiFi, but at the time it wasn't something that was needed. You can dig through the comment trail for more info: https://issues.apache.org/jira/browse/NIFI-817, starting with the comment nicolas maillard added on 21/Sep/15 13:15.

Master Guru

Option 1) You could write your own custom processor that does this using the Omid client library: http://omid.incubator.apache.org/quickstart.html

Option 2) If you don't want to add a custom processor, you could have a Spark, Flink, or Storm program make the Omid client call and push the data to NiFi via Site-to-Site or Kafka. You must check for failures and implement retries.

Option 3) Tephra is also used by Apache Phoenix to add cross-row and cross-table transaction support with full ACID semantics, so you could use a JDBC connection from NiFi to get the data.

Option 4) Use CQRS / Event Sourcing instead of an old-style two-phase commit, which has heavy overhead and limits scalability.

Option 5) Apache Trafodion (http://trafodion.apache.org/faq.html) with NiFi.

Option 6) Look at some of the HBase material here: http://www.slideshare.net/HBaseCon/operations-session-6-49043532
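
The "check for failures and implement retry" step in Option 2 could look roughly like this sketch: capped exponential backoff with full jitter. `retry_with_backoff` and the `send` callable are hypothetical; `send` would wrap whatever pushes one record to NiFi (Site-to-Site or Kafka) and raise on failure. The `sleep` parameter exists only so the delays can be replaced or inspected in tests.

```python
import random
import time


def retry_with_backoff(send, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call send() until it succeeds, waiting a random ("full jitter") delay of
    up to min(max_delay, base_delay * 2**attempt) seconds between attempts.
    Re-raises the last error once max_attempts is used up."""
    for attempt in range(max_attempts):
        try:
            return send()
        except retry_on:
            if attempt == max_attempts - 1:
                raise                                 # out of attempts
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))


attempts = []

def flaky_send():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("NiFi not reachable")  # simulated failure
    return "ack"

print(retry_with_backoff(flaky_send, sleep=lambda s: None))  # ack
```

Retries make delivery at-least-once, so the receiving side still has to tolerate duplicates.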
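
Option 4's event-sourcing alternative can be sketched as follows, assuming a single in-process log: writers only append events, and each consumer (for example a NiFi flow) keeps its own offset and builds its own read model, so no cross-system commit is needed. `EventLog`, `PositionProjection`, and the `{"symbol", "qty"}` event shape are hypothetical; in practice the log would more likely be something like a Kafka topic.

```python
class EventLog:
    """Append-only event log (the write side). Events are never updated."""

    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1          # offset of the new event

    def read_from(self, offset):
        return self.events[offset:]


class PositionProjection:
    """A read-side view (the "Q" in CQRS): net position per symbol,
    built by replaying events from this consumer's last offset."""

    def __init__(self):
        self.offset = 0
        self.positions = {}

    def catch_up(self, log):
        for event in log.read_from(self.offset):
            sym = event["symbol"]
            self.positions[sym] = self.positions.get(sym, 0) + event["qty"]
            self.offset += 1                 # advance only after applying the event


log = EventLog()
log.append({"symbol": "ABC", "qty": 100})
log.append({"symbol": "ABC", "qty": -40})
view = PositionProjection()
view.catch_up(log)
print(view.positions, view.offset)  # {'ABC': 60} 2
```

Because each consumer owns its offset, a slow or restarted consumer simply catches up from where it left off instead of holding a distributed lock.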