03-04-2016 01:44 PM
A Hive SELECT query run with a snapshot set returns two rows with the same primary key for some records.
The same query without the snapshot set returns a single row.
A scan of the snapshot itself did not return duplicate records.
Step 1: Create a snapshot from the existing HBase table.
Step 2: Set the snapshot and query the external Hive table that maps to that HBase table.
The query returned two rows with the same primary key.
Step 3: Query the same external table without setting the snapshot. The query returned a single row.
Scanning the HBase snapshot shows only one entry with that key.
Issue: a query on the external Hive table with the snapshot set returns two rows with the same primary key.
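For anyone trying to reproduce this, the HBase side of the steps above might look roughly like the following in the HBase shell. This is only a sketch: 'my_table', 'my_snapshot', 'my_table_clone', 'row1', and 'cf:col' are placeholder names that do not come from the original post, and cloning the snapshot into a table is just one possible way to inspect its contents.

```
# Step 1: take a snapshot of the existing table (all names are placeholders)
snapshot 'my_table', 'my_snapshot'

# Materialize the snapshot as a separate table so it can be inspected
clone_snapshot 'my_snapshot', 'my_table_clone'

# Without VERSIONS, only the newest version of the cell is returned
get 'my_table_clone', 'row1', {COLUMN => 'cf:col'}

# Ask for up to 5 versions of the same cell to see whether older ones are stored
get 'my_table_clone', 'row1', {COLUMN => 'cf:col', VERSIONS => 5}
```

If the second get shows more than one timestamp for the same cell, the snapshot holds several versions of that row even though a default scan displays only one.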
11-23-2017 03:45 PM
Hi. I had the same problem, and here is what I found. Take a look at your HBase table's VERSIONS parameter.
hbase shell> describe '<your_table_name>'
If the VERSIONS property is anything other than 1, you can get duplicate rows when the data has changed.
This is not a bug but a feature =) It preserves earlier versions of your data.
Hive can eliminate duplicates only if you insert the rows through Hive.
But if you update HBase cells with Flume, for example, HBase saves each update with a different timestamp as a new version of your data, and you get duplicates in a Hive SELECT.
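To check whether this is what is happening, something like the following HBase shell session could help. It is a sketch under assumptions: 'my_table' and 'cf' are placeholder table and column family names, and lowering VERSIONS discards older cell versions, so it only makes sense if you do not need that history.

```
# Show the column family settings, including VERSIONS ('my_table' is a placeholder)
describe 'my_table'

# Show up to 10 stored versions of every cell; RAW also lists delete markers
# and cells that compaction has not removed yet
scan 'my_table', {RAW => true, VERSIONS => 10}

# Assumption: you only need the latest value. Keep a single version per cell
# in the (placeholder) column family 'cf'
alter 'my_table', {NAME => 'cf', VERSIONS => 1}

# Rewrite the store files so the extra versions are physically dropped
major_compact 'my_table'
```

A snapshot taken before these changes would presumably still reference the old store files with all their versions, so a fresh snapshot would be needed before re-running the Hive query.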
Hope it helps