Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Sqoop import - null values in HDFS files replaced by 'N' instead of '\N'

avatar
Master Collaborator

Hi,

When I use sqoop command to import and load data from postgresql DB, even if I set the to params to:

 

--null-string '\\N' \
--null-non-string '\\N' \

but I find that the null values in the HDFS file are replaced by 'N' instead of '\N' :

#sudo -u hdfs hdfs dfs -cat /user/hive/warehouse/db/categories/part-m-00000 | head -3
371,2017-01-10 18:43:25,Cars,N,CA,N 372,2017-01-10 18:43:26,Motos,N,CA,N 373,2017-01-16 11:01:12,Tracks,N,CA,1

Which results to unrecognized NULL values mostly for the STRING type (because for the INT type the 'N' is considered like NULL..).

# select * from categories limit 3;
+-----+----------------------+--------+--------+------+----------+ | id | date | name | status | cc | valid | +-----+----------------------+--------+--------+------+----------+ | 371 | 2017-01-10 18:43:25 | Cars | N | CA | NULL | | 372 | 2017-01-10 18:43:26 | Motos | N | CA | NULL | | 373 | 2017-01-16 11:01:12 | Tracks | N | CA | 1 | +-----+----------------------+--------+--------+------+----------+
#select count(*) from categories where status is null;
Error converting column: 5 to INT
Error parsing row: file: hdfs://XXX:8020/user/hive/warehouse/...., before offset: 18698
Error parsing row: file: hdfs://XXX:8020/user/hive/warehouse/...., before offset: 18698
+----------+
| count(*) |
+----------+
| 0        |
+----------+
Fetched 1 row(s) in 0.19s


But when I tested to import and load manually with a direct hdfs put a categories.csv file that have a '\N' instead of NULL value I don't face any problem.

#sudo -u hdfs hdfs dfs -put categories.csv /user/hive/warehouse/db
#sudo -u hdfs hdfs dfs -cat /user/hive/warehouse/db/categories/categories.csv | head -3
371,2017-01-10 18:43:25,Cars,\N,CA,\N
372,2017-01-10 18:43:26,Motos,\N,CA,\N
373,2017-01-16 11:01:12,Tracks,\N,CA,1
# select * from categories limit 3;
+-----+----------------------+--------+--------+------+----------+
| id  | date                 | name   | status | cc   | valid    |
+-----+----------------------+--------+--------+------+----------+
| 371 | 2017-01-10 18:43:25  | Cars   | NULL   | CA   | NULL     |
| 372 | 2017-01-10 18:43:26  | Motos  | NULL   | CA   | NULL     |
| 373 | 2017-01-16 11:01:12  | Tracks | NULL   | CA   | 1        |
+-----+----------------------+--------+--------+------+----------+
#select count(*) from categories where status is null;
+----------+
| count(*) |
+----------+
| 3        |
+----------+
Fetched 1 row(s) in 0.20s

 

 

Thanks for your help.








 

 

1 ACCEPTED SOLUTION

avatar
Master Collaborator

Hi

I found the solution by adding two antislashs before the old two.

--null-string '\\\\N'
--null-non-string '\\\\N'


Thanks @wsch for your replay.

View solution in original post

2 REPLIES 2

avatar
Explorer

Looks as though sqoop isn't evaluating the argument properly. I've had similar things happen to me with wonky shell evaluation; which shell are you using?

 

Have you tried throwing a few more backslashes in there for good measure?

avatar
Master Collaborator

Hi

I found the solution by adding two antislashs before the old two.

--null-string '\\\\N'
--null-non-string '\\\\N'


Thanks @wsch for your replay.