- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
I have multiple tables which i need to join and fetch results into HIVE. Can I do this using Sqoop?
- Labels:
-
Apache Sqoop
Created ‎02-17-2017 11:13 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have 5 tables belonging to different schemas but have common fields. I need to perform a complex join to fetch data for analysis. I want to store the results in HIVE. Can I do this using sqoop?
Created ‎02-17-2017 01:03 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
To add the @Artem Ervits's solution - consider saving your complex query in the source RDBMS as a view. Then just call Sqoop specifying the view name instead of a table name. Makes the code nice and clean.
Created ‎02-17-2017 12:49 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Have you looked at free-form query option in sqoop?
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_free_form_query_imports
I would suggest to save result of a complex query in a database and sqoop that as one dataset.
There is however a note on free-form queries.
Note |
---|
The facility of using free-form query in the current version of Sqoop is limited to simple queries where there are no ambiguous projections and no OR conditions in the WHERE clause. Use of complex queries such as queries that have sub-queries or joins leading to ambiguous projections can lead to unexpected results.
|
Created ‎02-20-2017 07:58 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Artem Ervits. Thanks for the reply. I had read on free from query imports but decided against it because we cannot use ANDs and ORs. I will however try and see what happens. I cant work on the second option you and @bpreachuk suggested as i dont have access to create tables/views in the RDBMS. So the way i'll go is try using the free form query. If that doesnt work i'll import the tables as is from the RDBMS and run complex queries using HIVE. I'll update you on how it goes.
Created ‎02-17-2017 01:03 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
To add the @Artem Ervits's solution - consider saving your complex query in the source RDBMS as a view. Then just call Sqoop specifying the view name instead of a table name. Makes the code nice and clean.
Created ‎02-27-2017 08:02 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I was able to import all tables in the following format:
sqoop import -connect jdbc:oracle:thin:@<fdqn>/<server> -username <username> -P -table CUST_NAV -columns "<column names separated by commas" -hive-import -hive-table databasenameinhive.New_CUST_NAV -target-dir 'location in hdfs'
@bpreachuk I understood the workaround in my problem using your suggestion. I'll import all tables as is from the oracle db and create different views which i can then use in my select statements.
Thanks guys.
