Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Sqoop import - null values in HDFS files replaced by 'N' instead of '\N'

avatar
Master Collaborator

Hi,

When I use sqoop command to import and load data from postgresql DB, even if I set the to params to:

 

--null-string '\\N' \
--null-non-string '\\N' \

but I find that the null values in the HDFS file are replaced by 'N' instead of '\N' :

#sudo -u hdfs hdfs dfs -cat /user/hive/warehouse/db/categories/part-m-00000 | head -3
371,2017-01-10 18:43:25,Cars,N,CA,N 372,2017-01-10 18:43:26,Motos,N,CA,N 373,2017-01-16 11:01:12,Tracks,N,CA,1

Which results to unrecognized NULL values mostly for the STRING type (because for the INT type the 'N' is considered like NULL..).

# select * from categories limit 3;
+-----+----------------------+--------+--------+------+----------+ | id | date | name | status | cc | valid | +-----+----------------------+--------+--------+------+----------+ | 371 | 2017-01-10 18:43:25 | Cars | N | CA | NULL | | 372 | 2017-01-10 18:43:26 | Motos | N | CA | NULL | | 373 | 2017-01-16 11:01:12 | Tracks | N | CA | 1 | +-----+----------------------+--------+--------+------+----------+
#select count(*) from categories where status is null;
Error converting column: 5 to INT
Error parsing row: file: hdfs://XXX:8020/user/hive/warehouse/...., before offset: 18698
Error parsing row: file: hdfs://XXX:8020/user/hive/warehouse/...., before offset: 18698
+----------+
| count(*) |
+----------+
| 0        |
+----------+
Fetched 1 row(s) in 0.19s


But when I tested to import and load manually with a direct hdfs put a categories.csv file that have a '\N' instead of NULL value I don't face any problem.

#sudo -u hdfs hdfs dfs -put categories.csv /user/hive/warehouse/db
#sudo -u hdfs hdfs dfs -cat /user/hive/warehouse/db/categories/categories.csv | head -3
371,2017-01-10 18:43:25,Cars,\N,CA,\N
372,2017-01-10 18:43:26,Motos,\N,CA,\N
373,2017-01-16 11:01:12,Tracks,\N,CA,1
# select * from categories limit 3;
+-----+----------------------+--------+--------+------+----------+
| id  | date                 | name   | status | cc   | valid    |
+-----+----------------------+--------+--------+------+----------+
| 371 | 2017-01-10 18:43:25  | Cars   | NULL   | CA   | NULL     |
| 372 | 2017-01-10 18:43:26  | Motos  | NULL   | CA   | NULL     |
| 373 | 2017-01-16 11:01:12  | Tracks | NULL   | CA   | 1        |
+-----+----------------------+--------+--------+------+----------+
#select count(*) from categories where status is null;
+----------+
| count(*) |
+----------+
| 3        |
+----------+
Fetched 1 row(s) in 0.20s

 

 

Thanks for your help.








 

 

1 ACCEPTED SOLUTION

avatar
Master Collaborator

Hi

I found the solution by adding two antislashs before the old two.

--null-string '\\\\N'
--null-non-string '\\\\N'


Thanks @wsch for your replay.

View solution in original post

2 REPLIES 2

avatar
Frequent Visitor

Looks as though sqoop isn't evaluating the argument properly. I've had similar things happen to me with wonky shell evaluation; which shell are you using?

 

Have you tried throwing a few more backslashes in there for good measure?

avatar
Master Collaborator

Hi

I found the solution by adding two antislashs before the old two.

--null-string '\\\\N'
--null-non-string '\\\\N'


Thanks @wsch for your replay.