Member since
03-21-2018
3
Posts
0
Kudos Received
0
Solutions
05-03-2018
12:58 AM
Thanks for your reply, Eric If I understand correctly: --fields-terminated-by is the option used during Sqoop Import (ie. they are the Output Formatting arguments) which describe how the data will be written to HDFS. --input-fields-terminated-by is the option used during Sqoop Export (ie. they are Input Formatting arguments) which describe how the input data is present in HDFS before exporting to RDBMS. And also as I mentioned before, without --direct option sqoop command has expected behavior If I'm wrong correct me, please
... View more
03-21-2018
04:33 AM
Hi Cloudera team. I'm currently using Sqoop 1.4.6-cdh5.12.0. I've found strange behavior using sqoop import/export with --direct option. While trying to import table into hdfs using below sqoop import command, it works as expected:
sqoop import \
--connect jdbc:mysql://quickstart.cloudera/retail_db \
--username retail_dba \
--password cloudera \
--table categories \
--target-dir export_problem/categories \
--fields-terminated-by '\t' \
--direct
Result:
[cloudera@quickstart ~]$ hadoop fs -cat export_problem/categories/p*
1 2 Football
2 2 Soccer
3 2 Baseball & Softball
4 2 Basketball
...
But when I try to export it back to mysql with the same --direct option it doesn't work as expected:
mysql> create table categories_export as select * from categories limit 0;
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
sqoop export \
--connect jdbc:mysql://quickstart.cloudera/retail_db \
--username retail_dba \
--password cloudera \
--export-dir export_problem/categories \
--table categories_export \
--input-fields-terminated-by "\t" \
--direct
Result:
mysql> select * from categories_export limit 5;
+-------------+------------------------+---------------+
| category_id | category_department_id | category_name |
+-------------+------------------------+---------------+
| 1 | NULL | |
| 2 | NULL | |
| 3 | NULL | |
| 4 | NULL | |
| 5 | NULL | |
+-------------+------------------------+---------------+
5 rows in set (0.00 sec)
But without --direct option all works as expected:
sqoop export \
--connect jdbc:mysql://quickstart.cloudera/retail_db \
--username retail_dba \
--password cloudera \
--export-dir export_problem/categories \
--table categories_export \
--input-fields-terminated-by "\t"
Result:
mysql> select * from categories_export limit 5;
+-------------+------------------------+---------------------+
| category_id | category_department_id | category_name |
+-------------+------------------------+---------------------+
| 1 | 2 | Football |
| 2 | 2 | Soccer |
| 3 | 2 | Baseball & Softball |
| 4 | 2 | Basketball |
| 5 | 2 | Lacrosse |
+-------------+------------------------+---------------------+
5 rows in set (0.00 sec)
Looks like in sqoop export --direct overwrites previous options but in sqoop import not. Is it an expected behavior?
... View more
Labels:
- Labels:
-
Apache Sqoop
-
HDFS
-
Quickstart VM