
Sqoop can be used to bring data in from an RDBMS, but one limitation is that the imported data lands in a single HDFS directory. If a partitioned table needs to be created in Hive for further queries, users have to write a Hive script to distribute the data to the appropriate partitions; Sqoop has no direct option for creating a partitioned Hive table.

However, we can use Sqoop's ability to write output to a specific directory to simulate a partitioned table structure in HDFS. Since a partitioned Hive table uses an HDFS layout of <table name>/<partition column>=<value>, we can use the following Sqoop invocations to select the appropriate data for each partition and place it in the correct HDFS location.

sqoop import --connect <jdbc-url> --table <table1> --where <where clause for pt=0> --target-dir /home/user1/table1/pt=0
sqoop import --connect <jdbc-url> --table <table1> --where <where clause for pt=1> --target-dir /home/user1/table1/pt=1
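As a concrete sketch, the per-partition imports can be generated in a loop. The JDBC URL, credentials, table name, and partition values below are placeholders for illustration; the script echoes the commands (a dry run) so you can inspect them before executing.

```shell
#!/bin/sh
# Sketch: emit one "sqoop import" per partition value so that each
# partition's rows land in their own pt=<value> target directory.
# All connection details below are hypothetical placeholders.
gen_sqoop_cmds() {
  jdbc_url="jdbc:mysql://dbhost:3306/sales"   # placeholder JDBC URL
  table="table1"                              # placeholder source table
  base_dir="/home/user1/table1"               # HDFS base directory

  for pt in 0 1; do
    # echo = dry run; remove the quotes around the whole command and
    # drop the echo to actually run each import.
    echo "sqoop import --connect $jdbc_url --username user1 -P" \
         "--table $table --where \"pt = $pt\"" \
         "--target-dir $base_dir/pt=$pt"
  done
}

gen_sqoop_cmds
```

Each emitted command writes only the rows matching its `--where` clause into the matching `pt=<value>` directory, producing the layout Hive expects.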

Now, an external Hive table can be created that points to the /home/user1/table1 directory, with pt as the partition column.

CREATE EXTERNAL TABLE <hive_table_name> (
  -- column definitions --
)
PARTITIONED BY (pt STRING)
LOCATION '/home/user1/table1';
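One detail the DDL above leaves implicit: creating the external table does not by itself make Hive see the pt=0 and pt=1 directories. Each partition must also be registered in the metastore, for example:

```sql
-- Register each partition directory written by Sqoop.
ALTER TABLE <hive_table_name> ADD PARTITION (pt='0');
ALTER TABLE <hive_table_name> ADD PARTITION (pt='1');

-- Alternatively, discover all partitions under the table location in bulk:
MSCK REPAIR TABLE <hive_table_name>;
```

After this, queries filtering on pt will prune to the matching directories.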

This approach gets the data into HDFS in a structure that is appropriate for a partitioned Hive table.

Some advantages of this approach are:

  • Independent of source table partition structure. Source table may not even be partitioned.
  • Can be extended to use cases where Hive partitioning is based on multiple columns.
  • Hive table partitioning scheme can be different from source table partitioning scheme.

The multiple Sqoop invocations and the Hive table creation script can be combined into one script, allowing any partitioned Hive table to be created from an RDBMS.
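A minimal sketch of such a combined driver script is below. Everything here (JDBC URL, table and column names, partition values) is a hypothetical placeholder, and the commands are echoed rather than executed so the pipeline can be reviewed first.

```shell
#!/bin/sh
# Sketch: chain the per-partition Sqoop imports with the Hive DDL that
# overlays an external partitioned table on top of the imported data.
# All names and URLs are placeholders; commands are echoed (dry run).
build_pipeline() {
  base_dir="/home/user1/table1"

  # Step 1: one import per partition value (placeholder connection info).
  for pt in 0 1; do
    echo "sqoop import --connect jdbc:mysql://dbhost:3306/sales" \
         "--table table1 --where \"pt = $pt\"" \
         "--target-dir $base_dir/pt=$pt"
  done

  # Step 2: create the external table and discover the partitions.
  echo "hive -e \"CREATE EXTERNAL TABLE t1 (...) PARTITIONED BY (pt STRING)" \
       "LOCATION '$base_dir'; MSCK REPAIR TABLE t1;\""
}

build_pipeline
```

To execute for real, the echoes would be replaced by the commands themselves (or the output piped to `sh`), and the loop's partition list driven by the values present in the source table.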

Comments

Hi,

Thanks for the info.

It seems I am unable to load the data after implementing the partitioning as suggested.

I have to extract the same table data from different servers and databases.

So I have made the partitions like servername.databasename.tablename (this is my Sqoop target dir).

In the Hive external table I have added the path in the partition column and set the location.

Regards

Last update: ‎02-17-2016 03:39 AM