Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

I have multiple tables which i need to join and fetch results into HIVE. Can I do this using Sqoop?

avatar
Contributor

I have 5 tables belonging to different schemas but have common fields. I need to perform a complex join to fetch data for analysis. I want to store the results in HIVE. Can I do this using sqoop?

1 ACCEPTED SOLUTION

avatar

To add the @Artem Ervits's solution - consider saving your complex query in the source RDBMS as a view. Then just call Sqoop specifying the view name instead of a table name. Makes the code nice and clean.

View solution in original post

4 REPLIES 4

avatar
Master Mentor

Have you looked at free-form query option in sqoop?

https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_free_form_query_imports

I would suggest to save result of a complex query in a database and sqoop that as one dataset.

There is however a note on free-form queries.

Note
The facility of using free-form query in the current version of Sqoop is limited to simple queries where there are no ambiguous projections and no OR conditions in the WHERE clause. Use of complex queries such as queries that have sub-queries or joins leading to ambiguous projections can lead to unexpected results.

avatar
Contributor

Hi @Artem Ervits. Thanks for the reply. I had read on free from query imports but decided against it because we cannot use ANDs and ORs. I will however try and see what happens. I cant work on the second option you and @bpreachuk suggested as i dont have access to create tables/views in the RDBMS. So the way i'll go is try using the free form query. If that doesnt work i'll import the tables as is from the RDBMS and run complex queries using HIVE. I'll update you on how it goes.

avatar

To add the @Artem Ervits's solution - consider saving your complex query in the source RDBMS as a view. Then just call Sqoop specifying the view name instead of a table name. Makes the code nice and clean.

avatar
Contributor

I was able to import all tables in the following format:

sqoop import -connect jdbc:oracle:thin:@<fdqn>/<server> -username <username> -P -table CUST_NAV -columns "<column names separated by commas" -hive-import -hive-table databasenameinhive.New_CUST_NAV -target-dir 'location in hdfs'

@bpreachuk I understood the workaround in my problem using your suggestion. I'll import all tables as is from the oracle db and create different views which i can then use in my select statements.

Thanks guys.