Created 02-17-2017 11:13 AM
I have 5 tables belonging to different schemas but have common fields. I need to perform a complex join to fetch data for analysis. I want to store the results in HIVE. Can I do this using sqoop?
Created 02-17-2017 01:03 PM
To add the @Artem Ervits's solution - consider saving your complex query in the source RDBMS as a view. Then just call Sqoop specifying the view name instead of a table name. Makes the code nice and clean.
Created 02-17-2017 12:49 PM
Have you looked at free-form query option in sqoop?
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_free_form_query_imports
I would suggest to save result of a complex query in a database and sqoop that as one dataset.
There is however a note on free-form queries.
Note |
---|
The facility of using free-form query in the current version of Sqoop is limited to simple queries where there are no ambiguous projections and no OR conditions in the WHERE clause. Use of complex queries such as queries that have sub-queries or joins leading to ambiguous projections can lead to unexpected results.
|
Created 02-20-2017 07:58 AM
Hi @Artem Ervits. Thanks for the reply. I had read on free from query imports but decided against it because we cannot use ANDs and ORs. I will however try and see what happens. I cant work on the second option you and @bpreachuk suggested as i dont have access to create tables/views in the RDBMS. So the way i'll go is try using the free form query. If that doesnt work i'll import the tables as is from the RDBMS and run complex queries using HIVE. I'll update you on how it goes.
Created 02-17-2017 01:03 PM
To add the @Artem Ervits's solution - consider saving your complex query in the source RDBMS as a view. Then just call Sqoop specifying the view name instead of a table name. Makes the code nice and clean.
Created 02-27-2017 08:02 AM
I was able to import all tables in the following format:
sqoop import -connect jdbc:oracle:thin:@<fdqn>/<server> -username <username> -P -table CUST_NAV -columns "<column names separated by commas" -hive-import -hive-table databasenameinhive.New_CUST_NAV -target-dir 'location in hdfs'
@bpreachuk I understood the workaround in my problem using your suggestion. I'll import all tables as is from the oracle db and create different views which i can then use in my select statements.
Thanks guys.