
Sqoop import - null values in HDFS files replaced by 'N' instead of '\N'

Solved


Expert Contributor

Hi,

When I use a Sqoop command to import and load data from a PostgreSQL DB, even though I set the two null-handling parameters to:

--null-string '\\N' \
--null-non-string '\\N' \

I find that the null values in the HDFS file are replaced by 'N' instead of '\N':

#sudo -u hdfs hdfs dfs -cat /user/hive/warehouse/db/categories/part-m-00000 | head -3
371,2017-01-10 18:43:25,Cars,N,CA,N
372,2017-01-10 18:43:26,Motos,N,CA,N
373,2017-01-16 11:01:12,Tracks,N,CA,1

This results in NULL values going unrecognized, mostly for the STRING type (for the INT type the 'N' fails to convert and ends up treated as NULL anyway):

# select * from categories limit 3;
+-----+----------------------+--------+--------+------+----------+
| id  | date                 | name   | status | cc   | valid    |
+-----+----------------------+--------+--------+------+----------+
| 371 | 2017-01-10 18:43:25  | Cars   | N      | CA   | NULL     |
| 372 | 2017-01-10 18:43:26  | Motos  | N      | CA   | NULL     |
| 373 | 2017-01-16 11:01:12  | Tracks | N      | CA   | 1        |
+-----+----------------------+--------+--------+------+----------+
#select count(*) from categories where status is null;
Error converting column: 5 to INT
Error parsing row: file: hdfs://XXX:8020/user/hive/warehouse/...., before offset: 18698
Error parsing row: file: hdfs://XXX:8020/user/hive/warehouse/...., before offset: 18698
+----------+
| count(*) |
+----------+
| 0        |
+----------+
Fetched 1 row(s) in 0.19s


But when I tested importing and loading manually, with a direct hdfs put of a categories.csv file that has '\N' in place of the NULL values, I don't face any problem.

#sudo -u hdfs hdfs dfs -put categories.csv /user/hive/warehouse/db
#sudo -u hdfs hdfs dfs -cat /user/hive/warehouse/db/categories/categories.csv | head -3
371,2017-01-10 18:43:25,Cars,\N,CA,\N
372,2017-01-10 18:43:26,Motos,\N,CA,\N
373,2017-01-16 11:01:12,Tracks,\N,CA,1
# select * from categories limit 3;
+-----+----------------------+--------+--------+------+----------+
| id  | date                 | name   | status | cc   | valid    |
+-----+----------------------+--------+--------+------+----------+
| 371 | 2017-01-10 18:43:25  | Cars   | NULL   | CA   | NULL     |
| 372 | 2017-01-10 18:43:26  | Motos  | NULL   | CA   | NULL     |
| 373 | 2017-01-16 11:01:12  | Tracks | NULL   | CA   | 1        |
+-----+----------------------+--------+--------+------+----------+
#select count(*) from categories where status is null;
+----------+
| count(*) |
+----------+
| 3        |
+----------+
Fetched 1 row(s) in 0.20s

Thanks for your help.

1 ACCEPTED SOLUTION

Re: Sqoop import - null values in HDFS files replaced by 'N' instead of '\N'

Expert Contributor

Hi,

I found the solution: adding two more backslashes before the existing two.

--null-string '\\\\N'
--null-non-string '\\\\N'

Thanks @wsch for your reply.
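For what it's worth, my guess at why four backslashes are needed (an assumption on my part; the thread doesn't pin down where the extra unescaping happens): the value seems to pass through two backslash-unescaping layers, e.g. the invoking wrapper plus Sqoop's own Java-style string handling, and each layer halves the backslashes. A minimal sh sketch, using printf '%b' to stand in for one unescape pass:

```shell
#!/bin/sh
# Hypothetical model: each parsing layer halves the backslashes.
# printf '%b' stands in for one unescaping pass; it is NOT Sqoop itself.
stage0='\\\\N'                    # what the command line carries
stage1=$(printf '%b' "$stage0")   # after the first pass: \\N
stage2=$(printf '%b' "$stage1")   # after the second pass: \N  (what Hive expects)
printf '%s\n' "$stage1" "$stage2"
```

Under this model, with only two backslashes on the command line the second pass would consume the remaining backslash as well, which would leave the bare 'N' seen in the part files.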

2 REPLIES

Re: Sqoop import - null values in HDFS files replaced by 'N' instead of '\N'

New Contributor

Looks as though Sqoop isn't evaluating the argument properly. I've had similar things happen to me with wonky shell evaluation; which shell are you using?

 

Have you tried throwing a few more backslashes in there for good measure?
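Regarding the shell question: single and double quotes already differ in how many backslashes reach the program, before Sqoop does anything. A quick generic POSIX-shell illustration (not Sqoop-specific):

```shell
#!/bin/sh
# Single quotes: the shell passes both backslashes through untouched.
single=$(printf '%s' '\\N')   # program receives \\N
# Double quotes: the shell collapses \\ to a single backslash.
double=$(printf '%s' "\\N")   # program receives \N
printf '%s %s\n' "$single" "$double"
```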

