Created on 07-11-2016 06:59 AM - edited 09-16-2022 03:29 AM
Can I import data in JSON format from Oracle tables using Sqoop?
I am using the following version:
Sqoop 1.4.5-cdh5.3.3
Thanks,
Vinod
Created 07-14-2016 06:57 PM
Sqoop doesn't directly support the JSON file format. The only way I can think of to import data as JSON is to use the HCatalog import function. This can be done with the following steps:
1) Create a table in hcatalog with JSON serde. Below is just an example:
hcat -e "create table json_test (id int, value string) row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'"
2) Use Sqoop to import data into that table:
sqoop import \
--connect 'jdbc:mysql://nightly57-unsecure-1.gce.cloudera.com:3306/sqoop1' \
--username sqoop1 --password cloudera --table test -m 1 \
--hcatalog-database default --hcatalog-table json_test
3) Check the imported data:
[root@nightly57-unsecure-1 ~]# hadoop fs -ls /user/hive/warehouse/json_test
Found 1 items
-rw-rw-rw-   3 root hive         46 2016-07-14 18:35 /user/hive/warehouse/json_test/part-m-00000
[root@nightly57-unsecure-1 ~]# hadoop fs -cat /user/hive/warehouse/json_test/part-m-00000
{"id":1,"value":"abcd"}
{"id":2,"value":"cd"}
The downside of this method is that you have to create a Hive/HCatalog table before the import, and you will have to clean it up if it is not needed in the future.
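For example, reusing the json_test table from step 1 (the HDFS path below is only a placeholder): dropping a managed table removes both the metadata and the imported files, while declaring the staging table EXTERNAL with an explicit location means a later drop removes only the metadata and keeps the JSON files:
# drop the staging table once the imported JSON is no longer needed
hcat -e "drop table json_test"
# alternative: create the staging table as EXTERNAL so dropping it later keeps the data files
hcat -e "create external table json_test (id int, value string) row format serde 'org.apache.hive.hcatalog.data.JsonSerDe' location '<hdfs path>'"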
Please let me know if you have any further questions.
Created 08-18-2016 10:56 AM
Data can be read directly from the Oracle DB and written in JSON format using Spark, with no need for Sqoop in between.
val DF1 = sqlContext.read.format("jdbc")
  .option("url", "<connection string>")
  .option("dbtable", "<table name>")
  .option("user", "<user name>")
  .option("password", "<password>")
  .load()
DF1.write.format("org.apache.spark.sql.json").save("<path>")
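For an Oracle source, the same pattern would look roughly like the sketch below; the connection string, table name, and ojdbc jar path are placeholders, and the Oracle JDBC driver has to be available on the classpath (for example via spark-shell --jars).
// launch with the Oracle JDBC driver on the classpath, e.g.:
//   spark-shell --jars /path/to/ojdbc6.jar
val oracleDF = sqlContext.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  // placeholder host/service
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("dbtable", "SCHEMA.TABLE_NAME")                 // placeholder table
  .option("user", "<user name>")
  .option("password", "<password>")
  .load()
// write out as JSON files under the given HDFS path
oracleDF.write.format("org.apache.spark.sql.json").save("<path>")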
This post is under the wrong section, but I wanted to show how data can be loaded directly from an Oracle DB and saved in JSON format without Sqoop.
Created 08-21-2016 10:36 PM
This Spark code looks awesome!
I have some questions:
1) Does Spark read data from Oracle in the driver or in the executors?
2) If Spark reads Oracle in the executors, how is the source data split among the different executors? That is, how does each executor know which part of the data it should read?
Sqoop does a lot of optimization in such areas. Its Oracle connectors, especially the direct connector, can read metadata from Oracle and use it to split the import among different mappers; its performance is quite good. Does Spark have a similar optimization or not?
Created 08-22-2016 09:26 AM
Created 08-23-2016 06:35 AM
Thanks for the insight, @Harsh J!
Having had a quick look at the document you pasted here, it looks like Spark needs extra information (low/high bounds) to do the partitioning, while Sqoop can figure this out by itself. Yes, I agree with you that Spark is very convenient if only an ad-hoc import is needed in a coding environment, but other than that, Sqoop is a better tool for importing/exporting data in an RDBMS.
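For reference, a minimal sketch of the partitioned JDBC read being discussed; the split column and bounds are illustrative and must be supplied by the caller, whereas Sqoop derives its split boundaries itself:
// partitioned JDBC read: Spark splits the read into numPartitions tasks
// using ranges of partitionColumn derived from lowerBound/upperBound
val partitionedDF = sqlContext.read.format("jdbc")
  .option("url", "<connection string>")
  .option("dbtable", "<table name>")
  .option("user", "<user name>")
  .option("password", "<password>")
  .option("partitionColumn", "ID")   // assumed numeric column to split on
  .option("lowerBound", "1")         // illustrative bounds, supplied by the caller
  .option("upperBound", "1000000")
  .option("numPartitions", "8")
  .load()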
Created 08-16-2017 03:07 AM
Great Suggestion!
Thanks for that!
But I would like to ask one question: if I want to have struct or array fields in the target, how should I transform the MySQL data so that it fits the HCatalog schema? The need here is just to have nested data from the other collection instead of a foreign-key representation. Currently we are using sqoop import only, and we are trying to modify the query so that it will be accepted by the HCatalog schema.
Thanks & Regards,
Mahendra