Support Questions

Find answers, ask questions, and share your expertise

HBASE only storing 3 versions

avatar
Super Collaborator

Hbase is not saving all the versions of imported data despite versions set to 40 ?

1- create table with cf versions 40

create 'PUR_ACCT_PHX','cf1'
alter 'PUR_ACCT_PHX',{NAME => 'cf1', VERSIONS => 40}
hbase(main):004:0> describe 'PUR_ACCT_PHX'
Table PUR_ACCT_PHX is ENABLED
PUR_ACCT_PHX
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '40', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPR
ESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}

2- sqoop import table data consisting of 38 records

SQL> desc tab4
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ACCT_NUM                                  NOT NULL NUMBER(8)
 PUR_ID                                             NUMBER(10)
 PUR_DET_ID                                NOT NULL NUMBER(10)
 PRODUCT_PUR_PRODUCT_CODE                           VARCHAR2(2)
 PROD_AMT                                           NUMBER(10,2)
 PUR_TRANS_DATE                                     DATE
 ACCTTYPE_ACCT_TYPE_CODE                            VARCHAR2(2)
 ACCTSTAT_ACCT_STATUS_CODE                          VARCHAR2(2)
 EMP_EMP_CODE                                       CHAR(4)
 PLAZA_PLAZA_ID                                     CHAR(6)
 PURSTAT_PUR_STATUS_CODE                            CHAR(2)
SQL> select count(*) from tab4;
  COUNT(*)
----------
        38
sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))" --username PATRON --table PATRON.TAB4 --hbase-table PUR_ACCT_PHX --column-family cf1 --hbase-row-key "ACCT_NUM" -m 4

3- scan hbase for all 38 versions but shows only 3 values/column

68568-capture.jpg

10 REPLIES 10

avatar
Super Collaborator

i am not sure if this is your issue, but i found the query syntax to ask for more versions to be:

scan 'PUR_ACCT_PHX',{VERSIONS >=40}

you might want to try it.

avatar
Super Collaborator

no this command is not correct

hbase(main):023:0> scan 'PUR_ACCT_PHX',{VERSIONS >=40}
SyntaxError: (hbase):23: odd number list for Hash.scan 'PUR_ACCT_PHX',{VERSIONS >=40}

avatar
Super Collaborator

sorry, my response contained a typo, should have been:

scan 'PUR_ACCT_PHX',{VERSIONS=>40}

avatar
Super Collaborator

hi Harald your command gives the same output as mine with three versions only. there is something else going on here than the command syntax I think.

btw if I create any other table locally I can store more than 3 versions , so it has something to do with how sqoop is loading data into hbase

avatar
Master Guru
@Sami Ahmad

Could you please create new Hbase table(PUR_ACCT_PHX1) with cf1 as column family having 40 versions

create 'PUR_ACCT_PHX1',{NAME =>'cf1', VERSIONS =>40}

Then do your sqoop import to this table(PUR_ACCT_PHX1) and column family(cf1) name

sqoop import--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))"--username PATRON --table PATRON.TAB4 --hbase-table PUR_ACCT_PHX1 --column-family cf1 --hbase-row-key "ACCT_NUM"-m 4

Then do scan on the hbase table with the version 40

scan 'PUR_ACCT_PHX1',{VERSIONS=>40}

(or)
once you altered the versions of the hbase table to 40 then run sqoop import again to the same table then do scan table.
let us know are you able to see all versions (or) not.

avatar
Super Collaborator

no still I see 3 version.

can someone try to reproduce this on their end ?

hbase(main):001:0> scan 'PUR_ACCT_PHX1',{VERSIONS=>40}
ROW                               COLUMN+CELL
 1001181                          column=cf1:ACCTSTAT_ACCT_STATUS_CODE, timestamp=1524501493799, value=01
 1001181                          column=cf1:ACCTSTAT_ACCT_STATUS_CODE, timestamp=1524501436206, value=01
 1001181                          column=cf1:ACCTSTAT_ACCT_STATUS_CODE, timestamp=1524501358930, value=01
 1001181                          column=cf1:ACCTTYPE_ACCT_TYPE_CODE, timestamp=1524501493799, value=01
 1001181                          column=cf1:ACCTTYPE_ACCT_TYPE_CODE, timestamp=1524501436206, value=01
 1001181                          column=cf1:ACCTTYPE_ACCT_TYPE_CODE, timestamp=1524501358930, value=01
 1001181                          column=cf1:EMP_EMP_CODE, timestamp=1524501493799, value=9999
 1001181                          column=cf1:EMP_EMP_CODE, timestamp=1524501436206, value=9999
 1001181                          column=cf1:EMP_EMP_CODE, timestamp=1524501358930, value=9999

avatar
Master Guru
@Sami Ahmad

Unfortunately, i'm not able to reproduce the scenario in my end.
i have created countries table in hbase with 40 versions

Then ran sqoop import to this table with 1 mapper

sqoop import --connect '**' --username root--password 'root' --query "SELECT  '1' as CountryID,CountryCode from countries where \$CONDITIONS" -m 1   --hbase-table countries --column-family c --hbase-row-key CountryID

i'm assigning value '1' for each countryid and i'm having 241 records in counties table once the import got finished even though i'm having table with 40 versions hbase only showing the last record(that means sqoop import to hbase table will not store all the versions but from hbase shell it will store all the versions)

hbase> scan 'countries',{ VERSIONS =>40}
ROW                                             COLUMN+CELL
 1                                              column=c:CountryCode, timestamp=1525221297408, value=ZW

I ran the same sqoop import again to this table then i'm able to see 2 versions of as shown below.

hbase>scan 'countries',{ VERSIONS =>40}
ROW                                             COLUMN+CELL
 1                                              column=c:CountryCode, timestamp=1525221408212, value=ZW
 1                                              column=c:CountryCode, timestamp=1525221297408, value=ZW

for third run i'm able to see 3 versions

hbase> scan 'countries',{ VERSIONS =>40}
ROW                                             COLUMN+CELL
 1                                              column=c:CountryCode, timestamp=1525221470733, value=ZW
 1                                              column=c:CountryCode, timestamp=1525221408212, value=ZW
 1                                              column=c:CountryCode, timestamp=1525221297408, value=ZW

if i run same sqoop import for 40 times then i'm having 40 versions with same value as zw in hbase table.
My hbase version is 1.1.2.2.5.3.0-37 and sqoop 1.4.6.2.5.3.0-37,

For your case even for first import you are able to see 3 versions(i think because of -m 4 argument is doing 3 versions instead of 1).

Try this

hbase> disable 'PUR_ACCT_PHX1'
hbase> drop 'PUR_ACCT_PHX1'
hbase> create 'PUR_ACCT_PHX1',{NAME =>'cf1', VERSIONS =>40}

run sqoop import with 1 mapper

sqoop import--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))"--username PATRON --table PATRON.TAB4 --hbase-table PUR_ACCT_PHX1 --column-family cf1 --hbase-row-key "ACCT_NUM"-m 1
hbase(main):001:0> scan 'PUR_ACCT_PHX1',{VERSIONS=>40}

I think with this scan command you have to see only 1 version instead of 3 versions.

avatar
Super Collaborator

hi Shu you wrote

and you are right ..its importing only one record with m1 on each run . I tried 4 sqoop loads and got 4 versions.

that means sqoop import to hbase table will not store all the versions but from hbase shell it will store all the versions)

is it a bug or feature ? is Hortonworks aware of this and what is their comment ?

avatar
Super Collaborator

any suggestions on what I am doing wrong ?