
Dynamic partition with Sqoop?

New Contributor

I read the following in the Sqoop documentation:

http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_hcatalog_background

23.6. Support for Partitioning

The Sqoop HCatalog feature supports the following table types:

  • Unpartitioned tables
  • Partitioned tables with a static partitioning key specified
  • Partitioned tables with dynamic partition keys from the database result set
  • Partitioned tables with a combination of a static key and additional dynamic partitioning keys

But I cannot figure out how to use dynamic partitioning with Sqoop.

sqoop import --connect jdbc:mysql://localhost:3306/test \
--username root \
--password-file file:///home/hdfs/.password \
--split-by id \
--table test \
--delete-target-dir \
--fields-terminated-by "," \
--hcatalog-table test \
--hcatalog-storage-stanza "stored as orcfile" \
--create-hcatalog-table \
--hcatalog-partition-keys created_at \
--map-column-hive created_at=date,updated_at=date

I get this error:

Either both --hcatalog-partition-keys and --hcatalog-partition-values should be provided or both of these options should be omitted.

And if I supply a partition value, it is treated as a static partition:

sqoop import --connect jdbc:mysql://localhost:3306/test \
--username root \
--password-file file:///home/hdfs/.password \
--split-by id \
--table test \
--delete-target-dir \
--fields-terminated-by "," \
--hcatalog-table test \
--hcatalog-storage-stanza "stored as orcfile" \
--create-hcatalog-table \
--hcatalog-partition-keys created_at \
--hcatalog-partition-values created_at \
--map-column-hive created_at=date,updated_at=date

After giving created_at as the --hcatalog-partition-values value, it created a directory like apps/hive/warehouse/test/created_at=created_at.

It should have used the created_at column to create dynamic partitions based on the data.

Please help me fix this issue.
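
If I understand the error message correctly, the dynamic case is meant to be invoked by omitting both --hcatalog-partition-keys and --hcatalog-partition-values, so that Sqoop picks up created_at from the result set itself. A sketch of what I would expect to work (untested; it assumes the partitioned ORC table already exists in Hive with created_at as a string partition column, so --create-hcatalog-table and the created_at=date mapping are dropped):

sqoop import --connect jdbc:mysql://localhost:3306/test \
--username root \
--password-file file:///home/hdfs/.password \
--split-by id \
--table test \
--hcatalog-database default \
--hcatalog-table test

Is that the intended usage?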

2 Replies

Expert Contributor

Hi,

You can try something like the example below.

Data in the source database:

###########

 

[root@host-10-17-103-77 ~]# sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table test_partition(id int,name varchar(30),par varchar(20))"

[root@host-10-17-103-77 ~]# sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into test_partition values(1,'nitish','par1')"

[root@host-10-17-103-77 ~]# sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into test_partition values(2,'mohit','par2')"

[root@host-10-17-103-77 ~]# sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into test_partition values(3,'mohit123','par3')"

 

Table in Hive:-

############

0: jdbc:hive2://host-10-17-103-79.coe.clouder> create external table test_sqoop_par(id int,name string) partitioned by (par string) row format delimited fields terminated by '\t' location '/user/systest/test_sqoop_par';

 

NOTE:- only string partition columns are supported.

 

Sqoop command:-

##############

[root@host-10-17-103-77 ~]# sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD --hcatalog-database default --hcatalog-table test_sqoop_par --table TEST_PARTITION -m1 --hive-partition-value par

 

Data in Hive:-

##########

0: jdbc:hive2://host-10-17-103-79.coe.clouder> show partitions test_sqoop_par;

+------------+

| partition  |

+------------+

| par=par1   |

| par=par2   |

| par=par3   |

+------------+

 

[root@host-10-17-103-77 ~]# hadoop fs -ls /user/systest/test_sqoop_par

Found 3 items

drwxr-xr-x   - systest supergroup          0 2019-09-30 18:49 /user/systest/test_sqoop_par/par=par1

drwxr-xr-x   - systest supergroup          0 2019-09-30 18:49 /user/systest/test_sqoop_par/par=par2

drwxr-xr-x   - systest supergroup          0 2019-09-30 18:49 /user/systest/test_sqoop_par/par=par3
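
For your ORC case, the same pattern should work if you pre-create the table as ORC instead of delimited text. A sketch along the same lines (not tested against your data; the table name, location, and non-partition columns below are placeholders to adjust to your schema, and note that the created_at partition column has to stay string, so the created_at=date mapping is dropped):

create external table test_orc_par(id int, updated_at date)
partitioned by (created_at string)
stored as orc
location '/user/systest/test_orc_par';

sqoop import --connect jdbc:mysql://localhost:3306/test \
--username root \
--password-file file:///home/hdfs/.password \
--table test \
--split-by id \
--hcatalog-database default \
--hcatalog-table test_orc_par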

 

Hope the above helps.

 

Regards

Nitish

 

New Contributor

Hi all,

 

I'm looking for a way to import data into ORC Hive tables and partition it dynamically.

 

I've seen that Sqoop can import and dynamically partition data by column value, but this feature seems to work only on the first run: subsequent launches of the job fail with an 'org.apache.hive.hcatalog.common.HCatException : 2002 : Partition already present with given partition key values' message, because Sqoop attempts to append data to existing partitions.

 

Any idea how to resolve this issue, which makes it impossible to use dynamic partitioning with Sqoop across repeated runs?
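
One workaround might be to drop the affected partitions in Hive before each run, for example (the table name and partition value here are just placeholders):

alter table my_orc_table drop if exists partition (created_at='2019-09-30');

but that is easy to get wrong, so any cleaner approach would be appreciated.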