Explorer
Posts: 26
Registered: ‎12-08-2016

sqooping data from RDBMS tables to HDFS in parquet file format through oozie

Hi,

I am using Oozie and Sqoop to import data from an RDBMS on a daily basis. I am researching the following use case.

1. Importing data from an RDBMS table into HDFS in Parquet format. The import runs at a specific time every day through Oozie. Sqoop incremental import (append, last-value) is not working for me. From what I have read, incremental imports do not work with the Parquet file format.

Now I want to use --query (filtering on a date column of the RDBMS table) in the Sqoop script to fetch the data daily, without using the incremental or append modes.

How can we change the dates in --query dynamically each time the Sqoop job starts on a new day? Can one Sqoop script serve every day's import job, or do we need to create 10 jobs for 10 different dates?

Is there any other way of doing this?

Any help is highly appreciated.

Explorer
Posts: 22
Registered: ‎01-08-2016

Re: sqooping data from RDBMS tables to HDFS in parquet file format through oozie

I think you should use a WHERE clause with date ranges in the Sqoop import. The date ranges should be read from a control file that lists the range for each Sqoop run, along with the status of each run for tracking.
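
To make the control-file idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the CSV layout (start_date, end_date, status), the SUCCESS/FAILED labels, and the helper names next_range and record_run are hypothetical, and the file is kept on the local filesystem for simplicity. On a real cluster the control data might live in HDFS or a database table instead.

```python
import csv
from datetime import date, timedelta
from pathlib import Path

# Hypothetical control-file layout: one row per Sqoop run.
FIELDS = ["start_date", "end_date", "status"]


def next_range(control_file: Path, first_day: date):
    """Return the next (start, end) range to import; end is exclusive.

    The next range starts where the latest SUCCESS row ended, so a
    failed day is retried instead of being skipped.
    """
    start = first_day
    if control_file.exists():
        with control_file.open(newline="") as f:
            for row in csv.DictReader(f):
                if row["status"] == "SUCCESS":
                    start = max(start, date.fromisoformat(row["end_date"]))
    return start, start + timedelta(days=1)


def record_run(control_file: Path, start: date, end: date, status: str) -> None:
    """Append the outcome of one run to the control file."""
    new_file = not control_file.exists()
    with control_file.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "start_date": start.isoformat(),
            "end_date": end.isoformat(),
            "status": status,
        })
```

A wrapper script would call next_range before the import, put the two dates into the WHERE clause, and call record_run with the outcome afterwards.
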
Explorer
Posts: 26
Registered: ‎12-08-2016

Re: sqooping data from RDBMS tables to HDFS in parquet file format through oozie

Hi Vini,

Thank you for your response.

After some R&D work, the issue is solved using the Sqoop --query parameter and an Oozie coordinator. Things are now working as expected.
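
For anyone landing here later, a rough sketch of how a single parameterized import can serve every day. The exact setup used in this thread was not posted, so this is only an assumed illustration: the table orders, the column order_date, the JDBC URL, the user, and the paths are all hypothetical. The run date would come from the scheduler; with an Oozie coordinator, one option is to pass `${coord:formatTime(coord:nominalTime(), 'yyyy-MM-dd')}` to the workflow as a property.

```python
from datetime import date, timedelta
from typing import List


def build_sqoop_import_args(run_date: date) -> List[str]:
    """Build the argument list for one Sqoop import covering exactly run_date."""
    next_day = run_date + timedelta(days=1)
    # Hypothetical table and column names. $CONDITIONS is the token Sqoop
    # requires in the WHERE clause of every free-form --query. The date
    # literal syntax may need adjusting for your RDBMS.
    query = (
        "SELECT * FROM orders "
        f"WHERE order_date >= '{run_date.isoformat()}' "
        f"AND order_date < '{next_day.isoformat()}' "
        "AND $CONDITIONS"
    )
    return [
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost/sales",     # hypothetical URL
        "--username", "etl_user",                     # hypothetical user
        "--password-file", "/user/etl/.db_password",  # hypothetical path
        "--query", query,
        # --query requires an explicit target dir; one dir per day here.
        "--target-dir", f"/data/orders/dt={run_date.isoformat()}",
        "--as-parquetfile",
        "-m", "1",  # single mapper, so no --split-by is needed
    ]
```

Because each day is written to its own target directory, the job never appends to existing Parquet files, which avoids the incremental mode entirely.
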

 
