New Contributor
Posts: 2
Registered: 09-20-2018

Sqoop import to Hive tables results in very large size

Hi All,

 

I am trying to import Oracle tables into Hadoop Hive using Sqoop import. The Oracle table size is 32 MB.

I run the Sqoop import with 54 GB free on the DataNode. While importing to Hive, the DataNode's 54 GB gets completely full every time.

 

How can a Sqoop import from Oracle to a Hive table take more than 54 GB for a 32 MB table?
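
For context, the import is run with a command along the lines below; the connection string, credentials, and table name are placeholders, not the exact values used:

# Placeholder connection details -- not the real host/credentials/table
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username SCOTT \
  --password-file /user/scott/.oracle_password \
  --table MY_TABLE \
  --hive-import \
  --hive-table default.my_table \
  -m 4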

 

Please suggest.

Master
Posts: 377
Registered: 07-01-2015

Re: Sqoop import to Hive tables results in very large size

Without knowing more details it is hard to tell what is happening, but here are a few ideas:
- 54 GB is very little space in Hadoop terms, and it is not used only by HDFS but also by YARN. So when you run your Sqoop job, some files are created and intermediate data is stored on the NodeManagers (probably the same hosts as the DataNodes).
- Also note that HDFS reserves some of the overall disk space for itself and never uses 100% of it.
- Regarding the table size: even if it looks small, I don't know the Oracle details, but the storage there can be columnar and compressed. During the Sqoop import every row is basically transferred and may be stored as CSV, and the deserialization also takes some memory/space (see the sketch after this list).
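
If the plain-text expansion is part of the problem, one thing worth trying is to compress what Sqoop writes out. A minimal sketch, again with placeholder connection details (by default --compress uses gzip):

# Placeholder connection details; --compress gzips the files Sqoop writes to HDFS,
# and a single mapper (-m 1) keeps the output in one file
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username SCOTT \
  --password-file /user/scott/.oracle_password \
  --table MY_TABLE \
  --hive-import \
  --hive-table default.my_table \
  --compress \
  -m 1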

You should monitor the DataNode and NodeManager directories during the import, and also check that nothing else is running on the cluster.
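
Something along these lines works for the monitoring; the local paths below are placeholders, so substitute the values of dfs.datanode.data.dir and yarn.nodemanager.local-dirs from your configuration:

# Local disk usage of the DataNode and NodeManager directories, refreshed every 30s
# (paths below are placeholders for your configured data directories)
watch -n 30 'du -sh /data/dfs/dn /data/yarn/nm'

# Overall HDFS capacity and per-DataNode usage
hdfs dfsadmin -report

# Size of what has landed in the default Hive warehouse directory so far
hdfs dfs -du -h /user/hive/warehouse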
New Contributor
Posts: 2
Registered: 09-20-2018

Re: Sqoop import to Hive tables results in very large size

Hi

The source table size is 32 MB in Oracle. When I try to import it to HDFS using Sqoop, the import fails because no space is available in any of the directories.

 

My DataNode data directory is /u01/dfs, which is 54 GB.

For a 32 MB table, the Sqoop import is taking more than 54 GB.

Even after giving --target-dir=/u02, Sqoop still writes to the DataNode directory /u01/dfs. I can see that /u01 gets 100% full.

After that I see the error.

 

The source Oracle tables contain BLOB objects.

 

Please suggest.
