
HBase utilizing more storage space while loading data from Oracle using Sqoop

New Contributor

Hi,

 

I need your expertise to understand storage-space utilization in HBase.

 

I am trying to load data from an Oracle table directly into HBase using Sqoop with the command below.

 

Source table size: 20 GB.

 

sqoop import \
  --connect 'jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=)(port=))(connect_data=(sid=ORA10G)))' \
  --username  --password  \
  --query "SELECT /*+ parallel(a,8) full(a) */ * FROM TEST a WHERE \$CONDITIONS" \
  -m 10 \
  --hbase-create-table --hbase-table TEST_HB --column-family cf1 \
  --hbase-row-key IDNUM \
  --hive-drop-import-delims \
  --split-by PARTITION_ID

 

I am facing the two issues below.

 

1) Sqoop starts 10 mappers, but only 3 are in the Running state while the rest stay Scheduled, so effectively only 3 run at a time. Is there a parameter that limits the number of concurrent mappers when loading into HBase?

 

2) HDFS usage grows to more than 180 GB for 20 GB of Oracle data, when it should be at most about 60 GB (given a replication factor of 3). When I inspected the physical block files in HDFS for this data, every row is stored along with its column names. How do I avoid the overhead of the column names, or am I missing something in the Sqoop command above?
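For context on why the column names appear in the block files: HBase persists every cell as a complete KeyValue (row key, column family, column qualifier, timestamp, value), so each column name is written once per row rather than once per table, which alone can multiply the logical size of narrow relational data several times over. A one-row scan from the HBase shell makes this per-cell layout visible; TEST_HB and cf1 are the table and column family from the import command above:

hbase shell
scan 'TEST_HB', { LIMIT => 1 }

Each line of scan output is a separate cell, carrying column=cf1:<qualifier> and a timestamp alongside the value.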

 

 

1 ACCEPTED SOLUTION

Mentor
As one way to reduce the serialisation cost in HBase, you can enable the FAST_DIFF data block encoding: http://archive.cloudera.com/cdh5/cdh/5/hbase/book.html#data.block.encoding.enable

Also consider compressing your table; it will save a lot of space, provided you also use a proper HFile data block size (which is not the same as the HDFS block size).
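As a minimal sketch of how that could be applied from the HBase shell (SNAPPY and the 64 KB block size are illustrative choices rather than part of the reply above, and Snappy needs its native libraries on the cluster; TEST_HB and cf1 come from the import command):

disable 'TEST_HB'   # older HBase versions require the table offline for schema changes
alter 'TEST_HB', { NAME => 'cf1', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'SNAPPY', BLOCKSIZE => '65536' }
enable 'TEST_HB'
major_compact 'TEST_HB'   # rewrite existing HFiles with the new settings

The major compaction is what rewrites data that was already loaded; without it, the new encoding and compression only apply to newly flushed HFiles.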



New Contributor

Thanks. Block encoding and compression together helped reduce storage utilization.