Created on 07-28-2015 09:27 PM - edited 09-16-2022 02:36 AM
Hi,
I need your expertise to understand storage space utilization in HBase.
I am trying to load data from an Oracle table directly into HBase using the Sqoop command below.
Source table size: 20 GB.
sqoop import --connect 'jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=)(port=))(connect_data=(sid=ORA10G)))' --username --password --query "SELECT /*+ parallel(a,8) full(a) */ * FROM TEST a WHERE \$CONDITIONS" -m 10 --hbase-create-table --hbase-table TEST_HB --column-family cf1 --hbase-row-key IDNUM --hive-drop-import-delims --split-by PARTITION_ID
I am facing the two issues below.
1) Sqoop starts 10 mappers, but only 3 show as Running while the rest stay Scheduled, so effectively only 3 run at a time. Is there a parameter that limits the number of concurrent mappers when loading into HBase? (See the sketch just after this list for a quick way to check.)
2) HDFS usage grows to more than 180 GB for roughly 20 GB of Oracle data, when it should be no more than about 60 GB given a replication factor of 3. When I inspected the physical block files in HDFS, every row is stored together with its column names. How can I avoid the overhead of these repeated column names, or am I missing something in the Sqoop command above?
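For issue 1, a minimal way to check whether the cluster, rather than Sqoop, is capping the concurrent mappers (this assumes YARN is the resource manager; on an MR1 cluster the per-node equivalent is mapred.tasktracker.map.tasks.maximum):

# List the NodeManagers; if their combined memory/vcore capacity only fits
# about 3 containers plus the ApplicationMaster, only 3 mappers can run at once.
yarn node -list

# Show running applications and the queue each was submitted to; a small
# queue capacity in capacity-scheduler.xml produces the same symptom.
yarn application -list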
Created 08-26-2015 12:05 AM
Thanks. Block encoding and compression together helped reduce the storage utilization.
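For anyone who hits the same problem: HBase stores the full key (row key, column family, column qualifier, timestamp) with every single cell, which is why the column names repeat throughout the block files. A minimal sketch of the fix described above, run from the HBase shell (FAST_DIFF and SNAPPY are example choices here, not the only valid ones, and /hbase/data/default is the usual default table path):

# Enable data block encoding (deduplicates the repeated row key / column
# name prefixes inside each block) and compression on the column family.
alter 'TEST_HB', {NAME => 'cf1', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'SNAPPY'}

# Rewrite the existing HFiles so previously loaded data picks up the new settings.
major_compact 'TEST_HB'

# Verify the on-disk footprint afterwards.
hdfs dfs -du -s -h /hbase/data/default/TEST_HB

Keeping the column family and qualifier names short (e.g. 'cf' instead of a long descriptive name) also reduces the raw per-cell overhead before any encoding is applied.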