Dynamic partitioning with Sqoop?
- Labels: Apache HCatalog, Apache Hive, Apache Sqoop
Created 01-17-2018 07:20 AM
I read this in the Sqoop documentation:
http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_hcatalog_background
23.6. Support for Partitioning
The Sqoop HCatalog feature supports the following table types:
- Unpartitioned tables
- Partitioned tables with a static partitioning key specified
- Partitioned tables with dynamic partition keys from the database result set
- Partitioned tables with a combination of a static key and additional dynamic partitioning keys
But I don't understand how to use dynamic partitioning with Sqoop. Here is my command:
sqoop import --connect jdbc:mysql://localhost:3306/test --username root \
  --password-file file:///home/hdfs/.password --split-by id --table test \
  --delete-target-dir --fields-terminated-by "," \
  --hcatalog-table test --hcatalog-storage-stanza "stored as orcfile" \
  --create-hcatalog-table --hcatalog-partition-keys created_at \
  --map-column-hive created_at=date,updated_at=date
I get this error:
Either both --hcatalog-partition-keys and --hcatalog-partition-values should be provided or both of these options should be omitted.
And if I do supply a partition value, it is treated as a static partition:
sqoop import --connect jdbc:mysql://localhost:3306/test --username root \
  --password-file file:///home/hdfs/.password --split-by id --table test \
  --delete-target-dir --fields-terminated-by "," \
  --hcatalog-table test --hcatalog-storage-stanza "stored as orcfile" \
  --create-hcatalog-table --hcatalog-partition-keys created_at \
  --hcatalog-partition-values created_at \
  --map-column-hive created_at=date,updated_at=date
After passing created_at as the partition value, it created the directory apps/hive/warehouse/test/created_at=created_at. It should instead have used the created_at column to create dynamic partitions based on the data.
Please help me fix this issue.
Created 09-30-2019 07:00 PM
Hi,
You can try something like the example below.
Data in source
###########
[root@host-10-17-103-77 ~]# sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table test_partition(id int,name varchar(30),par varchar(20))"
[root@host-10-17-103-77 ~]# sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into test_partition values(1,'nitish','par1')"
[root@host-10-17-103-77 ~]# sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into test_partition values(2,'mohit','par2')"
[root@host-10-17-103-77 ~]# sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into test_partition values(3,'mohit123','par3')"
Table in Hive:-
############
0: jdbc:hive2://host-10-17-103-79.coe.clouder> create external table test_sqoop_par(id int,name string) partitioned by (par string) row format delimited fields terminated by '\t' location '/user/systest/test_sqoop_par';
NOTE: only string is supported as the partition column type.
Sqoop command:-
##############
[root@host-10-17-103-77 ~]# sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --hcatalog-database default --hcatalog-table test_sqoop_par --table TEST_PARTITION -m1 --hive-partition-value par
Data in Hive:-
##########
0: jdbc:hive2://host-10-17-103-79.coe.clouder> show partitions test_sqoop_par;
+------------+
| partition |
+------------+
| par=par1 |
| par=par2 |
| par=par3 |
+------------+
[root@host-10-17-103-77 ~]# hadoop fs -ls /user/systest/test_sqoop_par
Found 3 items
drwxr-xr-x - systest supergroup 0 2019-09-30 18:49 /user/systest/test_sqoop_par/par=par1
drwxr-xr-x - systest supergroup 0 2019-09-30 18:49 /user/systest/test_sqoop_par/par=par2
drwxr-xr-x - systest supergroup 0 2019-09-30 18:49 /user/systest/test_sqoop_par/par=par3
Hope the above helps.
Regards
Nitish
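For the pure-HCatalog route the original question asked about, here is a minimal sketch (assuming the same test_sqoop_par table as above, with placeholder MySQL connection details). Per the error message quoted in the question, --hcatalog-partition-keys and --hcatalog-partition-values must be supplied together or omitted together; omitting both lets Sqoop take partition values dynamically from the par column of the result set. The script only builds and prints the command, so it can be inspected before running it on a host with Sqoop configured:

```shell
# Placeholder connection details -- substitute your own JDBC URL and credentials.
MYCONN="jdbc:mysql://localhost:3306/test"
MYUSER="root"
MYPSWD="secret"

# Dynamic partitioning: the partition column 'par' is NOT named in any
# --hcatalog-partition-* option; with both omitted, Sqoop derives the
# partition values from the 'par' column in the imported rows.
CMD="sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD \
 --table TEST_PARTITION --hcatalog-database default \
 --hcatalog-table test_sqoop_par -m 1"

# Print the command for review; paste it into a shell on the cluster to run it.
echo "$CMD"
```

Remember the note above: the dynamic partition column must be a string type in the Hive table.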
Created 06-12-2020 10:24 AM
Hi all,
I'm looking for a way to import data into ORC Hive tables and partition it dynamically.
I've seen that Sqoop can import and partition data dynamically by column value, but this seems to work only on the first run: subsequent runs of the job fail with 'org.apache.hive.hcatalog.common.HCatException : 2002 : Partition already present with given partition key values', because Sqoop attempts to append data to existing partitions.
Any idea how to resolve this issue, which makes dynamic partitioning in Sqoop unusable for repeated imports?
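One possible workaround sketch for the re-run failure (table and partition names are hypothetical, matching the earlier test_sqoop_par example): drop the partitions created by the previous run with ALTER TABLE ... DROP IF EXISTS PARTITION before re-importing. The loop below only generates the HiveQL statements; pipe its output to beeline (with your own JDBC URL) to actually execute them:

```shell
# Hypothetical target table and the partitions a previous import created.
TABLE="default.test_sqoop_par"
PARTS="par1 par2 par3"

# Emit one DROP PARTITION statement per partition. IF EXISTS makes the
# statements safe to run even if a partition was already removed.
for p in $PARTS; do
  echo "ALTER TABLE $TABLE DROP IF EXISTS PARTITION (par='$p');"
done
# Apply with e.g.:  <this script> | beeline -u jdbc:hive2://<host>:10000 -n <user> -f -
```

Dropping and re-importing replaces the partition contents wholesale; if you need to append new rows into existing partitions instead, this approach will not help.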
