Created on 07-28-2015 09:27 PM - edited 09-16-2022 02:36 AM
Hi,
I need your expertise to understand storage space utilization in HBase.
I am trying to load data from an Oracle table directly into HBase using the Sqoop command below.
Source table size: 20 GB.
sqoop import --connect 'jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=)(port=))(connect_data=(sid=ORA10G)))' --username --password --query "SELECT /*+ parallel(a,8) full(a) */ * FROM TEST a WHERE \$CONDITIONS" -m 10 --hbase-create-table --hbase-table TEST_HB --column-family cf1 --hbase-row-key IDNUM --hive-drop-import-delims --split-by PARTITION_ID
I am facing the two issues below.
1) Sqoop starts 10 mappers, but only 3 show as Running while the rest stay Scheduled, so effectively only 3 run at a time. Is there a parameter that limits the number of concurrent mappers when loading into HBase? (See the sketch just after this list for a quick way to check.)
2) HDFS usage grows to more than 180 GB for roughly 20 GB of Oracle data, when it should be no more than about 60 GB given a replication factor of 3. When I inspected the physical block files in HDFS, every row is stored together with its column names. How can I avoid the overhead of these repeated column names, or am I missing something in the Sqoop command above?
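For issue 1, a minimal way to check whether the cluster, rather than Sqoop, is capping the concurrent mappers (this assumes YARN is the resource manager; on an MR1 cluster the per-node equivalent is mapred.tasktracker.map.tasks.maximum):

# List the NodeManagers; if their combined memory/vcore capacity only fits
# about 3 containers plus the ApplicationMaster, only 3 mappers can run at once.
yarn node -list

# Show running applications and the queue each was submitted to; a small
# queue capacity in capacity-scheduler.xml produces the same symptom.
yarn application -list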
Created 08-26-2015 12:05 AM
Thanks. Block encoding and compression together helped reduce the storage utilization.
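For anyone who hits the same problem: HBase stores the full key (row key, column family, column qualifier, timestamp) with every single cell, which is why the column names repeat throughout the block files. A minimal sketch of the fix described above, run from the HBase shell (FAST_DIFF and SNAPPY are example choices here, not the only valid ones, and /hbase/data/default is the usual default table path):

# Enable data block encoding (deduplicates the repeated row key / column
# name prefixes inside each block) and compression on the column family.
alter 'TEST_HB', {NAME => 'cf1', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'SNAPPY'}

# Rewrite the existing HFiles so previously loaded data picks up the new settings.
major_compact 'TEST_HB'

# Verify the on-disk footprint afterwards.
hdfs dfs -du -s -h /hbase/data/default/TEST_HB

Keeping the column family and qualifier names short (e.g. 'cf' instead of a long descriptive name) also reduces the raw per-cell overhead before any encoding is applied.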