Created on 02-17-2016 03:39 AM
Sqoop can be used to bring data from an RDBMS into HDFS, but a limitation of Sqoop is that the data lands in a single folder. If a partitioned table is needed in Hive for further queries, users have to write a Hive script to distribute the data to the appropriate partitions. There is no direct option in Sqoop to create a partitioned Hive table.
However, we can use Sqoop's ability to write output to a specific directory to simulate a partitioned table structure in HDFS. Since a partitioned table has an HDFS layout of <table name>/<partition column name=value>, we can use the following Sqoop invocations to select the appropriate data for each partition and write it to the correct HDFS directory.
sqoop import --table <table1> --where "<where clause for pt=0>" --target-dir /home/user1/table1/pt=0
sqoop import --table <table1> --where "<where clause for pt=1>" --target-dir /home/user1/table1/pt=1
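A complete invocation also needs the usual connection options. As an illustration only, with a hypothetical JDBC URL, credentials, and WHERE clause (none of these appear in the original commands):

# Hypothetical example: import rows belonging to partition pt=0 from a MySQL source
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user --password-file /user/etl/.db_password \
  --table table1 \
  --where "pt = 0" \
  --target-dir /home/user1/table1/pt=0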
Now, an external Hive table can be created that points to the /home/user1/table1 directory, with pt as the partition column.
CREATE EXTERNAL TABLE <hive_table_name> (
  -- column definitions --
)
PARTITIONED BY (pt string)
LOCATION '/home/user1/table1';
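Note that an external table does not automatically pick up existing partition directories; each partition must be registered in the metastore before queries return data. A minimal sketch for the two partitions imported above (the table name is a placeholder):

ALTER TABLE <hive_table_name> ADD PARTITION (pt='0') LOCATION '/home/user1/table1/pt=0';
ALTER TABLE <hive_table_name> ADD PARTITION (pt='1') LOCATION '/home/user1/table1/pt=1';
-- or, because the directories follow the pt=<value> naming convention, let Hive discover them:
MSCK REPAIR TABLE <hive_table_name>;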
This approach gets the data into HDFS in a structure that is suitable for creating a partitioned Hive table.
Some advantages of this approach are:
The multiple Sqoop imports and the Hive table creation statements can be combined into a single script, allowing any partitioned Hive table to be created from an RDBMS (a sketch of such a script is shown below).
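As a rough illustration of that idea, here is a minimal shell sketch that loops over partition values, runs one Sqoop import per partition directory, and then creates the external table and registers the partitions in Hive. The connection string, credentials, table, columns, and partition values are hypothetical and would need to be adapted:

#!/bin/bash
# Hypothetical connection and table details; replace with real values.
CONNECT="jdbc:mysql://dbhost:3306/sales"
TABLE="table1"
BASE_DIR="/home/user1/table1"

# One Sqoop import per partition value, each writing to its own pt=<value> directory.
for PT in 0 1 2; do
  sqoop import \
    --connect "$CONNECT" \
    --username etl_user --password-file /user/etl/.db_password \
    --table "$TABLE" \
    --where "pt = $PT" \
    --target-dir "$BASE_DIR/pt=$PT"
done

# Create the external table (column list is illustrative) and let Hive discover the partitions.
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS table1_partitioned (
  id INT,
  amount DOUBLE
)
PARTITIONED BY (pt STRING)
LOCATION '$BASE_DIR';
MSCK REPAIR TABLE table1_partitioned;
"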
Created on 06-22-2016 01:05 AM
Hi,
Thanks for the info..
It seems I am unable to load the data after implementing the partitioning as suggested.
I have to extract the same table data from different servers and databases.
So I have made the partition like servername.databasename.tablename (this is my Sqoop target dir).
In the Hive external table I have added the path in the partition column and set the location.
Regards