Created on 08-22-2017 08:49 AM - edited 09-16-2022 05:07 AM
Hi,
When I use sqoop command to import and load data from postgresql DB, even if I set the to params to:
--null-string '\\N' \ --null-non-string '\\N' \
but I find that the null values in the HDFS file are replaced by 'N' instead of '\N' :
#sudo -u hdfs hdfs dfs -cat /user/hive/warehouse/db/categories/part-m-00000 | head -3
371,2017-01-10 18:43:25,Cars,N,CA,N 372,2017-01-10 18:43:26,Motos,N,CA,N 373,2017-01-16 11:01:12,Tracks,N,CA,1
Which results to unrecognized NULL values mostly for the STRING type (because for the INT type the 'N' is considered like NULL..).
# select * from categories limit 3;
+-----+----------------------+--------+--------+------+----------+ | id | date | name | status | cc | valid | +-----+----------------------+--------+--------+------+----------+ | 371 | 2017-01-10 18:43:25 | Cars | N | CA | NULL | | 372 | 2017-01-10 18:43:26 | Motos | N | CA | NULL | | 373 | 2017-01-16 11:01:12 | Tracks | N | CA | 1 | +-----+----------------------+--------+--------+------+----------+
#select count(*) from categories where status is null;
Error converting column: 5 to INT
Error parsing row: file: hdfs://XXX:8020/user/hive/warehouse/...., before offset: 18698
Error parsing row: file: hdfs://XXX:8020/user/hive/warehouse/...., before offset: 18698
+----------+
| count(*) |
+----------+
| 0 |
+----------+
Fetched 1 row(s) in 0.19s
But when I tested to import and load manually with a direct hdfs put a categories.csv file that have a '\N' instead of NULL value I don't face any problem.
#sudo -u hdfs hdfs dfs -put categories.csv /user/hive/warehouse/db
#sudo -u hdfs hdfs dfs -cat /user/hive/warehouse/db/categories/categories.csv | head -3
371,2017-01-10 18:43:25,Cars,\N,CA,\N
372,2017-01-10 18:43:26,Motos,\N,CA,\N
373,2017-01-16 11:01:12,Tracks,\N,CA,1
# select * from categories limit 3;
+-----+----------------------+--------+--------+------+----------+
| id | date | name | status | cc | valid |
+-----+----------------------+--------+--------+------+----------+
| 371 | 2017-01-10 18:43:25 | Cars | NULL | CA | NULL |
| 372 | 2017-01-10 18:43:26 | Motos | NULL | CA | NULL |
| 373 | 2017-01-16 11:01:12 | Tracks | NULL | CA | 1 |
+-----+----------------------+--------+--------+------+----------+
#select count(*) from categories where status is null;
+----------+
| count(*) |
+----------+
| 3 |
+----------+
Fetched 1 row(s) in 0.20s
Thanks for your help.
Created on 08-28-2017 03:23 AM - edited 08-28-2017 03:24 AM
Hi
I found the solution by adding two antislashs before the old two.
--null-string '\\\\N' --null-non-string '\\\\N'
Thanks @wsch for your replay.
Created 08-24-2017 05:13 AM
Looks as though sqoop isn't evaluating the argument properly. I've had similar things happen to me with wonky shell evaluation; which shell are you using?
Have you tried throwing a few more backslashes in there for good measure?
Created on 08-28-2017 03:23 AM - edited 08-28-2017 03:24 AM
Hi
I found the solution by adding two antislashs before the old two.
--null-string '\\\\N' --null-non-string '\\\\N'
Thanks @wsch for your replay.