Sqoop can be used to bring data from an RDBMS into HDFS, but a limitation of sqoop is that the imported data lands in a single folder. If a partitioned table is needed in Hive for further queries, users must write a Hive script to distribute the data to the appropriate partitions; sqoop has no direct option for creating a Hive partitioned table.

However, we can use sqoop's ability to put output in a specific directory to simulate a partitioned table structure in HDFS. Since every partitioned table has an HDFS layout of <table name>/<partition column>=<value>, we can use the following sqoop invocations to select the appropriate data for each partition and place it in the correct HDFS location.

sqoop import --connect <jdbc url> --table <table1> --where <where clause for pt=0> --target-dir /home/user1/table1/pt=0
sqoop import --connect <jdbc url> --table <table1> --where <where clause for pt=1> --target-dir /home/user1/table1/pt=1
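Filled out with concrete flags, the per-partition imports might look like the sketch below. The JDBC URL, credentials, table name, and the partition column `pt` are all placeholder assumptions; the script only collects and prints the commands so they can be reviewed first (pipe the output to `sh` to actually run them).

```shell
#!/bin/sh
# Build one "sqoop import" per partition value. The connection
# string and user are assumptions; echo the commands instead of
# executing them so they can be inspected before running.
JDBC_URL="jdbc:mysql://dbhost:3306/sourcedb"   # assumed source database
CMDS=""
for PT in 0 1; do
  CMD="sqoop import --connect $JDBC_URL --username sqoop_user -P \
--table table1 --where \"pt=$PT\" --target-dir /home/user1/table1/pt=$PT"
  CMDS="$CMDS$CMD
"
done
echo "$CMDS"
```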

Now, an external Hive table can be created pointing to the /home/user1/table1 directory, with pt as the partition column.

CREATE EXTERNAL TABLE <hive_table_name> (
  -- column definitions
)
PARTITIONED BY (pt STRING)
LOCATION '/home/user1/table1';
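One caveat: creating the external table does not by itself make Hive see the pt=0 and pt=1 directories — partitions must be registered in the metastore. A sketch of the two common ways to do that (the table name is a placeholder); the statements are echoed for review here and would be run through `hive -e` or beeline on a real cluster:

```shell
#!/bin/sh
# MSCK REPAIR TABLE scans the table's LOCATION and adds any partition
# directories the metastore does not yet know about; ALTER TABLE ...
# ADD PARTITION registers a single partition explicitly. "my_table"
# is an assumed name.
HQL="MSCK REPAIR TABLE my_table;
ALTER TABLE my_table ADD IF NOT EXISTS PARTITION (pt='0') LOCATION '/home/user1/table1/pt=0';"
echo "$HQL"
```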

This approach gets the data into HDFS in a structure that is appropriate for creating a partitioned Hive table.

Some advantages of this approach:

  • Independent of source table partition structure. Source table may not even be partitioned.
  • Can be extended to use cases where Hive partitioning is based on multiple columns.
  • Hive table partitioning scheme can be different from source table partitioning scheme.
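As the second bullet notes, the same trick extends to multiple partition columns by nesting one directory level per column. A small sketch of the target directories such imports would use (the column names year/month and their values are assumptions):

```shell
#!/bin/sh
# Generate the nested target directories for a two-column partition
# scheme: <table dir>/<col1>=<val1>/<col2>=<val2>. Each directory
# would become the --target-dir of one sqoop import.
DIRS=""
for YEAR in 2015 2016; do
  for MONTH in 01 02; do
    DIRS="$DIRS/home/user1/table1/year=$YEAR/month=$MONTH
"
  done
done
echo "$DIRS"
```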

The multiple sqoop invocations and the Hive table creation script can be combined into a single script, allowing any partitioned Hive table to be created from an RDBMS.
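As a sketch of such a combined script: run each partition's import, then create the external table and register its partitions. All names, the connection string, and the single stand-in column are assumptions; the script collects the commands into a plan and prints it rather than executing anything.

```shell
#!/bin/sh
# End-to-end sketch: per-partition sqoop imports followed by the Hive
# DDL and partition registration. "id INT" stands in for the real
# column list; dbhost/sourcedb/table1 are placeholder names.
BASE="/home/user1/table1"
PLAN=""
for PT in 0 1; do
  PLAN="${PLAN}sqoop import --connect jdbc:mysql://dbhost/sourcedb --table table1 --where \"pt=$PT\" --target-dir $BASE/pt=$PT
"
done
PLAN="${PLAN}hive -e \"CREATE EXTERNAL TABLE table1_part (id INT) PARTITIONED BY (pt STRING) LOCATION '$BASE'; MSCK REPAIR TABLE table1_part;\"
"
echo "$PLAN"
```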

Thanks for the info.

It seems I am unable to load the data after implementing the partitioning as suggested.

I have to extract the same table data from different servers and databases, so I made the partitions like servername.databasename.tablename (this is my sqoop target dir).

In the Hive external table I added the path in the partition column and set the location.