
HBase utilizing more storage space while loading data from Oracle using Sqoop

New Contributor

Hi,

 

I need your expertise to understand storage space utilization in HBase.

 

I am trying to load data from an Oracle table directly into HBase using Sqoop with the command below.

 

Source table size: 20 GB.

 

sqoop import \
  --connect 'jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=)(port=))(connect_data=(sid=ORA10G)))' \
  --username  --password  \
  --query "SELECT /*+ parallel(a,8) full(a) */ * FROM TEST a WHERE \$CONDITIONS" \
  -m 10 \
  --hbase-create-table --hbase-table TEST_HB --column-family cf1 --hbase-row-key IDNUM \
  --hive-drop-import-delims --split-by PARTITION_ID

 

I am facing the two issues below.

 

1) Sqoop starts 10 mappers, but only 3 are ever in the Running state; the rest stay scheduled, so effectively only 3 run at a time. Is there a parameter that limits the number of mappers when loading into HBase?

 

2) HDFS usage grows to more than 180 GB for roughly 20 GB of Oracle data, whereas it should be no more than about 60 GB (with a replication factor of 3). When I inspected the physical block files in HDFS for this data, every row is stored together with its column names (see the example scan below). How can I avoid this column-name overhead, or am I missing something in the Sqoop command above?
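
For illustration, scanning a single row from the HBase shell shows every cell repeating the column family and qualifier next to the value (the row key and column names here are made-up examples, not my real schema):

scan 'TEST_HB', { LIMIT => 1 }
ROW        COLUMN+CELL
 1001      column=cf1:FIRST_NAME, timestamp=1490000000000, value=JOHN
 1001      column=cf1:LAST_NAME, timestamp=1490000000000, value=DOE
1 row(s)

So every value on disk also carries its row key, column family, column name and timestamp, which seems to be where the extra space is going.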

 

 

1 ACCEPTED SOLUTION

Mentor
You can use the FAST_DIFF data block encoding, which is one way to reduce the per-cell serialisation cost in HBase: http://archive.cloudera.com/cdh5/cdh/5/hbase/book.html#data.block.encoding.enable

Also consider compressing your table - it will save a lot of space if you also make sure to use a proper HFile data block size (not the same as HDFS block size).
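
As a sketch, assuming the TEST_HB table and cf1 column family from your command, and assuming a codec such as Snappy is installed on your cluster (any available codec will do), both settings can be applied from the HBase shell and the existing data rewritten with a major compaction:

disable 'TEST_HB'
alter 'TEST_HB', { NAME => 'cf1',
  DATA_BLOCK_ENCODING => 'FAST_DIFF',   # stores only the diff against the previous cell's key
  COMPRESSION => 'SNAPPY',              # swap for whatever codec your cluster has
  BLOCKSIZE => '65536' }                # 64 KB HFile data block size, not the HDFS block size
enable 'TEST_HB'
major_compact 'TEST_HB'                 # rewrite existing HFiles with the new settings

FAST_DIFF avoids repeating the parts of the key that are identical to the previous cell, and compression then works on the encoded blocks, so the two combine well.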



New Contributor

Thanks. Block encoding and compression together helped reduce the storage utilization.