How to pass date into shell script for a sqoop command dynamically?

Contributor

I'm working on a Sqoop import using a shell script:

 

#!/bin/bash
DATE=$(date +"%d-%m-%y")   # no space after '=' -- 'DATE= $(...)' would leave DATE empty
while IFS=":" read -r server dbname tablename; do
  sqoop import --connect "jdbc:mysql://${server}/${dbname}" --table "${tablename}" \
    --username root --password cloudera \
    --hive-import --hive-table dynpart \
    --hive-partition-key 'thisday' --hive-partition-value "${DATE}" \
    --target-dir '/user/hive/newimp5' -m 1
done < tables.txt

 

tables.txt:

127.0.0.1:mydb:dpart

 

My requirement is to create a partition for each day. Hive table:

 

create table dynpart(id int, name char(30), city char(30))
  partitioned by (thisday string)
  row format delimited
  fields terminated by ','
  stored as textfile
  location '/hive/mytables'
  tblproperties("comment"="partition column: thisday structure is dd/mm/yyyy");

I have one record in mysql table 'dpart':
1 Sidhartha Noida

 

The problem I am facing is, the sqoop job runs fine but when I query the hive table it shows:

NULL NULL NULL 28/03/2017

The data for all three columns is coming back as NULL, but the partition column shows the current date.

Can anyone explain what mistake I am making here? Any help is appreciated. Thanks.

3 REPLIES 3

Re: How to pass date into shell script for a sqoop command dynamically?

Rising Star

Have you looked at the output of your Sqoop job and at the underlying folder/file structure? Is it what you'd expect, with the partition as a folder name and a file inside that partition folder holding your other fields?

 

It seems to me that you are defining a "thisday" column that doesn't exist in the RDBMS table. You are also defining a value for the "thisday" column ($DATE). Since that column/key doesn't exist in the table, Sqoop is returning null values for the other fields because nothing matches the column/key that you define.

 

If my suspicion is correct, I think you could:

  • Define a dynamic target directory where you use your $DATE variable as the rightmost directory (e.g. --target-dir "/user/hive/newimp5/${DATE}" — note the double quotes, so the shell expands the variable). If your Hive table is external and partitioned, I think something like this would work, though I can't vouch for the exact syntax.
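A minimal sketch of that suggestion (connection details, table, and paths taken from the question; the commands are echoed rather than executed, so nothing is imported here):

```shell
#!/bin/bash
# Build a date-stamped target directory so each day's import lands in its own folder.
DATE=$(date +"%d-%m-%y")                       # note: no space after '='
TARGET_DIR="/user/hive/newimp5/thisday=${DATE}"

# Import straight to the date-stamped HDFS directory, with no --hive-partition-* flags:
SQOOP_CMD="sqoop import --connect jdbc:mysql://127.0.0.1/mydb --table dpart --username root --password cloudera --target-dir ${TARGET_DIR} -m 1"
echo "${SQOOP_CMD}"

# For an external, partitioned Hive table, register the new directory as a partition:
HIVE_CMD="hive -e \"ALTER TABLE dynpart ADD IF NOT EXISTS PARTITION (thisday='${DATE}') LOCATION '${TARGET_DIR}'\""
echo "${HIVE_CMD}"
```

With the data landing under a `thisday=...` folder, the ALTER TABLE step makes Hive aware of each day's partition without Sqoop's hive-partition options.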

 

Hope this helps or if it doesn't work, perhaps it will spark some additional thoughts.

Re: How to pass date into shell script for a sqoop command dynamically?

Contributor

This is my table structure:

 

hive> describe dynpart;
OK
id int
name char(30)
city char(30)
thisday string
# Partition Information
# col_name data_type comment
thisday string

I have the partitioned column 'thisday' if you see the syntax above.

I like the idea of passing the date dynamically as you suggested. But I'd like to implement the idea of dynamic partitioning. This might not be the perfect way, but I am going to build on this.
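One common pattern for building toward dynamic partitioning (a sketch only; `staging_dpart` and the staging path are hypothetical names, not from this thread) is to Sqoop into a plain staging table and let Hive assign the partition per row. The commands below are echoed rather than executed:

```shell
#!/bin/bash
# Sketch: import to a staging area first, then let Hive do the dynamic partitioning.
# 'staging_dpart' and the staging path are hypothetical.

STAGING_DIR="/user/hive/staging/dpart"
SQOOP_CMD="sqoop import --connect jdbc:mysql://127.0.0.1/mydb --table dpart --username root --password cloudera --target-dir ${STAGING_DIR} -m 1"
echo "${SQOOP_CMD}"

# Hive then copies rows into dynpart, deriving the partition value for each row:
HQL=$(cat <<'EOF'
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE dynpart PARTITION (thisday)
SELECT id, name, city, from_unixtime(unix_timestamp(), 'dd/MM/yyyy') AS thisday
FROM staging_dpart;
EOF
)
echo "${HQL}"
```

Because the partition column is the last column of the SELECT, Hive routes each row to its partition automatically once the two dynamic-partition settings are enabled.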

Re: How to pass date into shell script for a sqoop command dynamically?

Rising Star

What I'm trying to get at is that Sqoop is using hive-partition-value that you set to whatever $DATE is, but your table doesn't have that value in it, so Sqoop doesn't actually return anything and so your thisday=${DATE} HDFS folder will contain null values.

 

You have this one record in your dpart table: 1 Sidhartha Noida. So when Sqoop executes, it doesn't fail, because there is no syntactical failure. However, you are filtering out every record in your table, because you are partitioning on a column that doesn't seem to exist, and your value certainly doesn't exist. The result of this, I believe, is no populated values for your Hive table. The only reason you see "thisday" is that you hard-coded that value and it's the partition, so it shows up as data via the folder name rather than from the file contents.

 

Try getting rid of the hive-partition-* options in your Sqoop command, and temporarily change your target-dir to one of your already-created partition folders, and see if you get a record that way. That would help confirm whether Sqoop itself is working and the partition options are what's filtering out your data.

 

I get that you want to do dynamic Hive partitioning, but it seems like you are trying to do your partitioning through Sqoop, and from what I can tell, Sqoop doesn't support dynamic partitioning in the way you want (see SQOOP-312).