New Contributor
Posts: 1
Registered: ‎03-04-2016

Hive returned duplicate records with HBase snapshot

Hi,

 

 

A Hive SELECT query returns two rows with the same primary key for some records when an HBase snapshot is set.

The same query without the snapshot set returns a single row.

A scan of the snapshot itself does not return duplicate records.

 

Step 1: Create a snapshot from the existing HBase table.

Step 2: Set the snapshot and query the external Hive table that maps to that HBase table.

Result: two rows with the same primary key.

Step 3: Query the same external table without setting the snapshot.

Result: a single row.

Scanning the HBase snapshot shows only one entry with that key.

Issue: a query on the external Hive table with the snapshot set returns two rows with the same primary key.
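The reproduction steps can be sketched as a small shell script. The post does not give the table, snapshot, restore directory, or key, so every argument below is a placeholder. `hive.hbase.snapshot.name` and `hive.hbase.snapshot.restoredir` are the Hive properties normally used to point the HBase storage handler at a snapshot; I am assuming that is what "set the snapshot" refers to. The functions only print the commands, so they can be reviewed before being run.

```shell
# Sketch only: all arguments are placeholders, not values from the post.

# Step 1: HBase shell command that snapshots an existing table.
# $1 = HBase table, $2 = snapshot name
snapshot_cmd() {
  printf "snapshot '%s', '%s'\n" "$1" "$2"
}

# Step 2: Hive statements that read the external table through the snapshot.
# $1 = snapshot name, $2 = restore dir, $3 = Hive table, $4 = key column, $5 = key
hive_snapshot_query() {
  printf "SET hive.hbase.snapshot.name=%s;\n" "$1"
  printf "SET hive.hbase.snapshot.restoredir=%s;\n" "$2"
  printf "SELECT * FROM %s WHERE %s = '%s';\n" "$3" "$4" "$5"
}

# Step 3: the same query against the live table (no snapshot properties set).
# $1 = Hive table, $2 = key column, $3 = key
hive_live_query() {
  printf "SELECT * FROM %s WHERE %s = '%s';\n" "$1" "$2" "$3"
}
```

For example, `snapshot_cmd my_table my_snap | hbase shell -n` would take the snapshot, and `hive -e "$(hive_snapshot_query my_snap /tmp/restore my_hive_table id 42)"` would run the snapshot-backed query. Comparing its row count with that of `hive_live_query` for the same key shows whether the duplicate appears only on the snapshot path.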


New Contributor
Posts: 5
Registered: ‎12-02-2015

Re: Hive returned duplicate records with HBase snapshot

Did you solve it? If so, can you share the solution?

New Contributor
Posts: 1
Registered: ‎11-23-2017

Re: Hive returned duplicate records with HBase snapshot

Hi. I had the same problem, and here is what I found. Take a look at the VERSIONS parameter of your HBase table.

hbase shell> describe '<your_table_name>'

 

If the VERSIONS property is not 1, you can get duplicate rows when the data has changed.

This is not a bug but a feature =) It preserves earlier versions of the data.
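To check whether a suspect row really holds more than one version, something like the following could help. It is a sketch: the table, row key, and column are placeholders, and the function only prints HBase shell commands (`describe`, and `get` with a `VERSIONS` option) so they can be piped into the shell.

```shell
# Print HBase shell commands that show the configured VERSIONS for a table
# and list up to N stored versions of one cell.
# $1 = table, $2 = row key, $3 = column as family:qualifier, $4 = max versions
# All values are placeholders supplied by the caller.
inspect_versions_cmds() {
  printf "describe '%s'\n" "$1"
  printf "get '%s', '%s', {COLUMN => '%s', VERSIONS => %s}\n" "$1" "$2" "$3" "$4"
}
```

For example, `inspect_versions_cmds my_table row1 cf:col 10 | hbase shell -n`. If the `get` lists several timestamped values for the same cell, the table is keeping multiple versions of that row's data.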

 

Hive can eliminate duplicates only if you insert the rows through Hive.

But if you update HBase cells with Flume, for example, HBase saves each update with a different timestamp as a new version of your data, and you get duplicates from a Hive SELECT.
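If the older versions are not needed, one possible follow-up (my assumption, not something confirmed in this thread) is to limit the column family to a single version and compact the table. Note that this discards the older cell versions, and a snapshot taken earlier would presumably still reference the old files, so a new snapshot would be needed. The table and column family names are placeholders; the function only prints the HBase shell commands.

```shell
# Print HBase shell commands that limit a column family to one version
# and trigger a major compaction so older versions are dropped.
# $1 = table, $2 = column family (both placeholders supplied by the caller)
# Warning: this removes older cell versions from the table.
keep_one_version_cmds() {
  printf "alter '%s', NAME => '%s', VERSIONS => 1\n" "$1" "$2"
  printf "major_compact '%s'\n" "$1"
}
```

For example, `keep_one_version_cmds my_table cf | hbase shell -n`, followed by taking a fresh snapshot and re-running the Hive query.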

 

Hope it helps 
