Row counts - data lake

Explorer

Is there a recommended way to ensure that the row counts of tables in the source (Oracle) are consistent with those of the target tables in HBase (data lake)? We are using NiFi, which receives the GoldenGate messages, and then we use different processors to store the transactions in HBase, so essentially the tables in HBase should be in sync with the tables in Oracle at all times. I am interested in knowing how teams ensure and prove this. Do they take row counts from source and target every day, match them, and say that it's synced? I used NiFi counters, which tracked the number of records received for each table, but I suspect that is not the best way to do it.

1 REPLY

Super Guru

I am sure there are many ways to skin this cat. Here are a few ideas:

- Use the RowCounter MapReduce job that ships with HBase

org.apache.hadoop.hbase.mapreduce.RowCounter

Usage: RowCounter [options] 
    <tablename> [          
        --starttime=[start] 
        --endtime=[end] 
        [--range=[startKey],[endKey]] 
        [<column1> <column2>...]
    ]

- Store the NiFi counters in an HBase table

- Use an HBase coprocessor. This gives you the most flexibility. Think of database triggers: logic that you want to execute per record. Be careful with performance. If you need very fast write times, the triggers may have side effects on them.
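
Whichever option you pick, the daily check from the question comes down to comparing two sets of per-table counts. Here is a minimal Python sketch of that comparison, not a definitive implementation. The helper names `parse_rowcounter_rows` and `compare_counts` are made up for illustration, and the parser assumes the RowCounter job prints a counter line of the form `ROWS=<n>` in its output, so check that against what your HBase version actually prints. The Oracle side is assumed to come from something like `SELECT COUNT(*)` per table.

```python
import re


def parse_rowcounter_rows(job_output: str) -> int:
    """Extract the row count from captured RowCounter job output.

    Assumption: the counter summary contains a line such as 'ROWS=12345'.
    """
    match = re.search(r"\bROWS=(\d+)", job_output)
    if match is None:
        raise ValueError("no ROWS=<n> counter found in RowCounter output")
    return int(match.group(1))


def compare_counts(source_counts: dict, target_counts: dict) -> dict:
    """Return {table: (source, target)} for every table whose counts differ.

    A table missing on one side is reported with None for that side.
    """
    mismatches = {}
    for table in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(table)
        tgt = target_counts.get(table)
        if src != tgt:
            mismatches[table] = (src, tgt)
    return mismatches


if __name__ == "__main__":
    # Hypothetical numbers, only to show the shape of the result.
    oracle = {"CUSTOMERS": 1000, "ORDERS": 5000}
    hbase = {
        "CUSTOMERS": parse_rowcounter_rows("...\n\t\tROWS=1000\n"),
        "ORDERS": 4990,
    }
    for table, (src, tgt) in compare_counts(oracle, hbase).items():
        print(f"{table}: source={src} target={tgt}")
```

Keep in mind that matching counts only show that the same number of rows exists on both sides, not that the contents match, and that transactions still in flight through NiFi will make the counts differ unless both sides are counted as of the same cutoff.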