
Sqoop import - null values in HDFS files replaced by 'N' instead of '\N'

Solved


Expert Contributor

Hi,

When I use a Sqoop command to import and load data from a PostgreSQL DB, even though I set the two null-handling parameters to:

--null-string '\\N' \
--null-non-string '\\N' \

I find that the null values in the HDFS file are replaced by 'N' instead of '\N':

#sudo -u hdfs hdfs dfs -cat /user/hive/warehouse/db/categories/part-m-00000 | head -3
371,2017-01-10 18:43:25,Cars,N,CA,N
372,2017-01-10 18:43:26,Motos,N,CA,N
373,2017-01-16 11:01:12,Tracks,N,CA,1

This results in NULL values going unrecognized, mostly for the STRING type (for the INT type the 'N' fails to convert and ends up treated as NULL anyway):

# select * from categories limit 3;
+-----+----------------------+--------+--------+------+----------+
| id  | date                 | name   | status | cc   | valid    |
+-----+----------------------+--------+--------+------+----------+
| 371 | 2017-01-10 18:43:25  | Cars   | N      | CA   | NULL     |
| 372 | 2017-01-10 18:43:26  | Motos  | N      | CA   | NULL     |
| 373 | 2017-01-16 11:01:12  | Tracks | N      | CA   | 1        |
+-----+----------------------+--------+--------+------+----------+
#select count(*) from categories where status is null;
Error converting column: 5 to INT
Error parsing row: file: hdfs://XXX:8020/user/hive/warehouse/...., before offset: 18698
Error parsing row: file: hdfs://XXX:8020/user/hive/warehouse/...., before offset: 18698
+----------+
| count(*) |
+----------+
| 0        |
+----------+
Fetched 1 row(s) in 0.19s


But when I tested importing and loading manually, with a direct hdfs put of a categories.csv file that has '\N' in place of the NULL values, I don't face any problem.

#sudo -u hdfs hdfs dfs -put categories.csv /user/hive/warehouse/db
#sudo -u hdfs hdfs dfs -cat /user/hive/warehouse/db/categories/categories.csv | head -3
371,2017-01-10 18:43:25,Cars,\N,CA,\N
372,2017-01-10 18:43:26,Motos,\N,CA,\N
373,2017-01-16 11:01:12,Tracks,\N,CA,1
# select * from categories limit 3;
+-----+----------------------+--------+--------+------+----------+
| id  | date                 | name   | status | cc   | valid    |
+-----+----------------------+--------+--------+------+----------+
| 371 | 2017-01-10 18:43:25  | Cars   | NULL   | CA   | NULL     |
| 372 | 2017-01-10 18:43:26  | Motos  | NULL   | CA   | NULL     |
| 373 | 2017-01-16 11:01:12  | Tracks | NULL   | CA   | 1        |
+-----+----------------------+--------+--------+------+----------+
#select count(*) from categories where status is null;
+----------+
| count(*) |
+----------+
| 3        |
+----------+
Fetched 1 row(s) in 0.20s

Thanks for your help.

1 ACCEPTED SOLUTION

Re: Sqoop import - null values in HDFS files replaced by 'N' instead of '\N'

Expert Contributor

Hi,

I found the solution: adding two more backslashes before the existing two.

--null-string '\\\\N'
--null-non-string '\\\\N'

Thanks @wsch for your reply.
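For what it's worth, my guess at why four backslashes are needed (an assumption on my part; the thread doesn't pin down where the extra unescaping happens): the value seems to pass through two backslash-unescaping layers, e.g. the invoking wrapper plus Sqoop's own Java-style string handling, and each layer halves the backslashes. A minimal sh sketch, using printf '%b' to stand in for one unescape pass:

```shell
#!/bin/sh
# Hypothetical model: each parsing layer halves the backslashes.
# printf '%b' stands in for one unescaping pass; it is NOT Sqoop itself.
stage0='\\\\N'                    # what the command line carries
stage1=$(printf '%b' "$stage0")   # after the first pass: \\N
stage2=$(printf '%b' "$stage1")   # after the second pass: \N  (what Hive expects)
printf '%s\n' "$stage1" "$stage2"
```

Under this model, with only two backslashes on the command line the second pass would consume the remaining backslash as well, which would leave the bare 'N' seen in the part files.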

2 REPLIES

Re: Sqoop import - null values in HDFS files replaced by 'N' instead of '\N'

New Contributor

Looks as though Sqoop isn't evaluating the argument properly. I've had similar things happen to me with wonky shell evaluation; which shell are you using?

 

Have you tried throwing a few more backslashes in there for good measure?
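Regarding the shell question: single and double quotes already differ in how many backslashes reach the program, before Sqoop does anything. A quick generic POSIX-shell illustration (not Sqoop-specific):

```shell
#!/bin/sh
# Single quotes: the shell passes both backslashes through untouched.
single=$(printf '%s' '\\N')   # program receives \\N
# Double quotes: the shell collapses \\ to a single backslash.
double=$(printf '%s' "\\N")   # program receives \N
printf '%s %s\n' "$single" "$double"
```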

