
How to pass date into shell script for a sqoop command dynamically?


I'm working on a Sqoop import using a shell script:


DATE=$(date +"%d-%m-%y")   # no space around "=" in shell assignments
while IFS=":" read -r server dbname tablename; do
  sqoop import --connect jdbc:mysql://$server/$dbname --table $tablename --username root --password cloudera --hive-import --hive-table dynpart --hive-partition-key 'thisday' --hive-partition-value $DATE --target-dir '/user/hive/newimp5' -m 1
done < servers.txt   # input file name is illustrative
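For reference, the loop assumes each line of the input file is colon-separated as server:dbname:tablename. A minimal sketch of just that parsing step (the server, database, and table values here are illustrative, not from the thread):

```shell
#!/usr/bin/env sh
# Illustrative: parse a colon-separated "server:dbname:tablename" line,
# exactly as the while-read loop in the script above expects.
printf '%s\n' 'localhost:retail_db:dpart' |
while IFS=":" read -r server dbname tablename; do
  echo "server=$server db=$dbname table=$tablename"
done
# prints: server=localhost db=retail_db table=dpart
```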




My requirement is to create a new partition every day. Hive table:


create table dynpart(id int, name char(30), city char(30))
  partitioned by (thisday string)
  row format delimited
  fields terminated by ','
  stored as textfile
  location '/hive/mytables'
  tblproperties("comment"="partition column: thisday structure is dd/mm/yyyy");

I have one record in the MySQL table 'dpart':
1 Sidhartha Noida


The problem I am facing is that the Sqoop job runs fine, but when I query the Hive table it shows:

NULL NULL NULL 28/03/2017

The data from the three columns comes back as NULL, but the partition column shows the current date.

Can anyone explain what mistake I am making here? Any help is appreciated. Thanks.


Rising Star

Have you looked at the output of your Sqoop job and at the underlying folder/file structure? Is it what you'd expect, with the partition as a folder name and a file inside that partition folder holding your other fields?


It seems to me that you are defining a "thisday" column that doesn't exist in the RDBMS table. You are also defining a value for that "thisday" column ($DATE). Since that column/key doesn't exist in the table, Sqoop returns null values for the other fields because nothing matches the column/key you defined.


If my suspicion is correct, I think you could:

  • Define a dynamic target directory where you use your $DATE variable as the far-right directory (e.g. --target-dir '/user/hive/newimp5/${DATE}'). If your Hive table is external and partitioned, I think something like this would work, though I can't vouch for the exact syntax.
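A sketch of that idea, assuming an external, partitioned Hive table (the directory layout and database name are illustrative; the commands are echoed rather than executed, since Sqoop and Hive aren't available here):

```shell
#!/usr/bin/env sh
# Illustrative sketch: import each day's data into a date-suffixed HDFS
# directory, then register that directory as a partition of the external
# Hive table. Paths, db name, and credentials follow the thread's example.
DATE=$(date +"%d-%m-%y")
TARGET="/user/hive/newimp5/thisday=${DATE}"   # hypothetical partition layout

# Echoed for illustration; in a real run, drop the leading "echo".
echo sqoop import --connect jdbc:mysql://localhost/mydb --table dpart \
  --username root --password cloudera \
  --target-dir "$TARGET" -m 1

echo hive -e "ALTER TABLE dynpart ADD IF NOT EXISTS PARTITION (thisday='${DATE}') LOCATION '${TARGET}'"
```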


Hope this helps or if it doesn't work, perhaps it will spark some additional thoughts.


This is my table structure:


hive> describe dynpart;
id int
name char(30)
city char(30)
thisday string
# Partition Information
# col_name data_type comment
thisday string

I have the partitioned column 'thisday' if you see the syntax above.

I like the idea of passing the date dynamically as you suggested. But I'd like to implement the idea of dynamic partitioning. This might not be the perfect way, but I am going to build on it.

Rising Star

What I'm trying to get at is that Sqoop uses the hive-partition-value you set to whatever $DATE is, but your table doesn't have that value in it, so Sqoop doesn't actually return anything, and your thisday=${DATE} HDFS folder will contain null values.


You have this one record in your dpart table: 1 Sidhartha Noida. So when Sqoop executes, it doesn't fail, because there is no syntactic failure. However, you are filtering out every record in your table, because you are partitioning on a column that doesn't seem to exist, and your value certainly doesn't exist. The result, I believe, is no populated values for your Hive table. The only reason you see "thisday" is that you hard-coded that value and it's the partition: it exists as data via the folder name, not in the file.


Try getting rid of your hive-partition-* options in your Sqoop command, and temporarily change your target-dir to one of your already-created partition folders to see if you get a record that way. That would help confirm that Sqoop is working but filtering out data you don't want.


I get that you want to do dynamic Hive partitioning, but it seems like you are trying to do your partitioning through Sqoop and from what I can tell, Sqoop doesn't support dynamic partitioning as you wish (SQOOP-312).
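Since Sqoop itself can't do the dynamic partitioning, one common workaround is to Sqoop into an unpartitioned staging table and let Hive distribute the rows with a dynamic-partition insert. A sketch under assumed names (the staging table dynpart_stage is hypothetical; the HQL is echoed rather than run):

```shell
#!/usr/bin/env sh
# Illustrative: after Sqooping raw rows (including a thisday column) into
# a staging table, let Hive's dynamic partitioning fan them out into
# dynpart's partitions. "dynpart_stage" is a hypothetical staging table.
HQL="
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE dynpart PARTITION (thisday)
SELECT id, name, city, thisday FROM dynpart_stage;
"
# Echoed for illustration; a real run would be: hive -e "$HQL"
echo hive -e "$HQL"
```

The partition column goes last in the SELECT list, since Hive assigns dynamic partition values from the trailing columns.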


