Created on 04-20-2018 02:35 PM - edited 08-17-2019 06:40 PM
Hbase is not saving all the versions of imported data despite versions set to 40 ?
1- create table with cf versions 40
create 'PUR_ACCT_PHX','cf1'
alter 'PUR_ACCT_PHX',{NAME => 'cf1', VERSIONS => 40}
hbase(main):004:0> describe 'PUR_ACCT_PHX' Table PUR_ACCT_PHX is ENABLED PUR_ACCT_PHX COLUMN FAMILIES DESCRIPTION {NAME => 'cf1', BLOOMFILTER => 'ROW', VERSIONS => '40', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPR ESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2- sqoop import table data consisting of 38 records
SQL> desc tab4 Name Null? Type ----------------------------------------- -------- ---------------------------- ACCT_NUM NOT NULL NUMBER(8) PUR_ID NUMBER(10) PUR_DET_ID NOT NULL NUMBER(10) PRODUCT_PUR_PRODUCT_CODE VARCHAR2(2) PROD_AMT NUMBER(10,2) PUR_TRANS_DATE DATE ACCTTYPE_ACCT_TYPE_CODE VARCHAR2(2) ACCTSTAT_ACCT_STATUS_CODE VARCHAR2(2) EMP_EMP_CODE CHAR(4) PLAZA_PLAZA_ID CHAR(6) PURSTAT_PUR_STATUS_CODE CHAR(2)
SQL> select count(*) from tab4; COUNT(*) ---------- 38
sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))" --username PATRON --table PATRON.TAB4 --hbase-table PUR_ACCT_PHX --column-family cf1 --hbase-row-key "ACCT_NUM" -m 4
3- scan hbase for all 38 versions but shows only 3 values/column
Created 04-20-2018 03:14 PM
i am not sure if this is your issue, but i found the query syntax to ask for more versions to be:
scan 'PUR_ACCT_PHX',{VERSIONS >=40}
you might want to try it.
Created 04-20-2018 03:20 PM
no this command is not correct
hbase(main):023:0> scan 'PUR_ACCT_PHX',{VERSIONS >=40} SyntaxError: (hbase):23: odd number list for Hash.scan 'PUR_ACCT_PHX',{VERSIONS >=40}
Created 04-20-2018 03:55 PM
sorry, my response contained a typo, should have been:
scan 'PUR_ACCT_PHX',{VERSIONS=>40}
Created 04-20-2018 07:49 PM
hi Harald your command gives the same output as mine with three versions only. there is something else going on here than the command syntax I think.
btw if I create any other table locally I can store more than 3 versions , so it has something to do with how sqoop is loading data into hbase
Created 04-22-2018 04:41 AM
Could you please create new Hbase table(PUR_ACCT_PHX1) with cf1 as column family having 40 versions
create 'PUR_ACCT_PHX1',{NAME =>'cf1', VERSIONS =>40}
Then do your sqoop import to this table(PUR_ACCT_PHX1) and column family(cf1) name
sqoop import--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))"--username PATRON --table PATRON.TAB4 --hbase-table PUR_ACCT_PHX1 --column-family cf1 --hbase-row-key "ACCT_NUM"-m 4
Then do scan on the hbase table with the version 40
scan 'PUR_ACCT_PHX1',{VERSIONS=>40}
(or)
once you altered the versions of the hbase table to 40 then run sqoop import again to the same table then do scan table.
let us know are you able to see all versions (or) not.
Created 04-23-2018 04:39 PM
no still I see 3 version.
can someone try to reproduce this on their end ?
hbase(main):001:0> scan 'PUR_ACCT_PHX1',{VERSIONS=>40} ROW COLUMN+CELL 1001181 column=cf1:ACCTSTAT_ACCT_STATUS_CODE, timestamp=1524501493799, value=01 1001181 column=cf1:ACCTSTAT_ACCT_STATUS_CODE, timestamp=1524501436206, value=01 1001181 column=cf1:ACCTSTAT_ACCT_STATUS_CODE, timestamp=1524501358930, value=01 1001181 column=cf1:ACCTTYPE_ACCT_TYPE_CODE, timestamp=1524501493799, value=01 1001181 column=cf1:ACCTTYPE_ACCT_TYPE_CODE, timestamp=1524501436206, value=01 1001181 column=cf1:ACCTTYPE_ACCT_TYPE_CODE, timestamp=1524501358930, value=01 1001181 column=cf1:EMP_EMP_CODE, timestamp=1524501493799, value=9999 1001181 column=cf1:EMP_EMP_CODE, timestamp=1524501436206, value=9999 1001181 column=cf1:EMP_EMP_CODE, timestamp=1524501358930, value=9999
Created 05-02-2018 01:10 AM
Unfortunately, i'm not able to reproduce the scenario in my end.
i have created countries table in hbase with 40 versions
Then ran sqoop import to this table with 1 mapper
sqoop import --connect '**' --username root--password 'root' --query "SELECT '1' as CountryID,CountryCode from countries where \$CONDITIONS" -m 1 --hbase-table countries --column-family c --hbase-row-key CountryID
i'm assigning value '1' for each countryid and i'm having 241 records in counties table once the import got finished even though i'm having table with 40 versions hbase only showing the last record(that means sqoop import to hbase table will not store all the versions but from hbase shell it will store all the versions)
hbase> scan 'countries',{ VERSIONS =>40} ROW COLUMN+CELL 1 column=c:CountryCode, timestamp=1525221297408, value=ZW
I ran the same sqoop import again to this table then i'm able to see 2 versions of as shown below.
hbase>scan 'countries',{ VERSIONS =>40} ROW COLUMN+CELL 1 column=c:CountryCode, timestamp=1525221408212, value=ZW 1 column=c:CountryCode, timestamp=1525221297408, value=ZW
for third run i'm able to see 3 versions
hbase> scan 'countries',{ VERSIONS =>40} ROW COLUMN+CELL 1 column=c:CountryCode, timestamp=1525221470733, value=ZW 1 column=c:CountryCode, timestamp=1525221408212, value=ZW 1 column=c:CountryCode, timestamp=1525221297408, value=ZW
if i run same sqoop import for 40 times then i'm having 40 versions with same value as zw in hbase table.
My hbase version is 1.1.2.2.5.3.0-37 and sqoop 1.4.6.2.5.3.0-37,
For your case even for first import you are able to see 3 versions(i think because of -m 4 argument is doing 3 versions instead of 1).
Try this
hbase> disable 'PUR_ACCT_PHX1'
hbase> drop 'PUR_ACCT_PHX1'
hbase> create 'PUR_ACCT_PHX1',{NAME =>'cf1', VERSIONS =>40}
run sqoop import with 1 mapper
sqoop import--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=patronQA)(port=1526))(connect_data=(service_name=patron)))"--username PATRON --table PATRON.TAB4 --hbase-table PUR_ACCT_PHX1 --column-family cf1 --hbase-row-key "ACCT_NUM"-m 1
hbase(main):001:0> scan 'PUR_ACCT_PHX1',{VERSIONS=>40}
I think with this scan command you have to see only 1 version instead of 3 versions.
Created 05-02-2018 01:28 PM
hi Shu you wrote
and you are right ..its importing only one record with m1 on each run . I tried 4 sqoop loads and got 4 versions.
that means sqoop import to hbase table will not store all the versions but from hbase shell it will store all the versions)
is it a bug or feature ? is Hortonworks aware of this and what is their comment ?
Created 04-26-2018 08:20 PM
any suggestions on what I am doing wrong ?