
sqooping data from RDBMS tables to HDFS in parquet file format through oozie


Contributor

Hi,

I use Oozie to run Sqoop imports from an RDBMS on a daily basis, and I am researching the following use case.

1. Import data from an RDBMS table into HDFS in Parquet format. The import runs at a specific time every day through Oozie. Sqoop's incremental import options (incremental update, append, last-value) are not working for me. After some research, I understand that incremental update and the related options do not work with the Parquet file format.

Now I want to use --query (filtering on a date column of the RDBMS table) in the Sqoop script to fetch the data daily, without using the incremental or append methods.

How can the dates in --query be changed dynamically each time the Sqoop job starts (every other day)? Can one Sqoop script serve every daily import, or do we need to create 10 jobs for 10 different dates?

Is there any other way to do this?

Any help is highly appreciated.

2 REPLIES

Re: sqooping data from RDBMS tables to HDFS in parquet file format through oozie

Explorer
I think you should use a WHERE clause with date ranges in the Sqoop import. Take the date ranges from a control file that lists the range for each Sqoop run, along with the status of each run for tracking.
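The reply does not say what the control file looks like, so here is only a minimal Python sketch of the idea. The CSV layout, the column names (start_date, end_date, status), the status values PENDING and DONE, and all function names are assumptions made for illustration. In practice the file would live somewhere the Oozie workflow can read and update.

```python
import csv
import io
from typing import Dict, List, Optional

# Assumed control-file columns; the thread does not define a format.
FIELDS = ["start_date", "end_date", "status"]


def load_runs(text: str) -> List[Dict[str, str]]:
    """Parse the control file text into one dict per planned Sqoop run."""
    return list(csv.DictReader(io.StringIO(text)))


def next_pending(runs: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the first run that has not completed yet, or None."""
    for row in runs:
        if row["status"] != "DONE":
            return row
    return None


def mark_status(runs: List[Dict[str, str]], start_date: str, status: str) -> None:
    """Record the outcome of the run that starts on start_date."""
    for row in runs:
        if row["start_date"] == start_date:
            row["status"] = status


def dump_runs(runs: List[Dict[str, str]]) -> str:
    """Serialize the runs back to control-file text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(runs)
    return out.getvalue()
```

Each scheduled run would call next_pending to get its date range, use that range in the WHERE clause of the import, and then call mark_status so the next run moves on to the following range.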

Re: sqooping data from RDBMS tables to HDFS in parquet file format through oozie

Contributor

Hi Vini,

Thank you for your response.

After some R&D work, the issue was solved using the sqoop --query parameter and an Oozie coordinator. Things are now working as expected.
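The thread does not show the final setup, so the following Python sketch only illustrates one plausible shape of it: a single import command whose --query is rebuilt from the run date, so one job can serve every day. The table orders, the columns order_date and order_id, the JDBC URL, the user, and the HDFS paths are all hypothetical. In an Oozie coordinator, the run date could be passed in as a workflow property, for example from the coordinator's nominal time via `${coord:formatTime(coord:nominalTime(), 'yyyy-MM-dd')}`.

```python
from datetime import date, timedelta
from typing import List


def build_sqoop_import(run_date: date) -> List[str]:
    """Build a sqoop import argv for rows whose date column falls on run_date."""
    next_day = run_date + timedelta(days=1)
    # Hypothetical table and column names. Sqoop requires the literal
    # $CONDITIONS token in the WHERE clause of a free-form --query.
    query = (
        "SELECT * FROM orders "
        f"WHERE order_date >= '{run_date.isoformat()}' "
        f"AND order_date < '{next_day.isoformat()}' "
        "AND $CONDITIONS"
    )
    return [
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost/sales",      # hypothetical database
        "--username", "etl_user",                      # hypothetical user
        "--password-file", "/user/etl/.db_password",   # hypothetical path
        "--query", query,
        "--split-by", "order_id",                      # needed for parallel mappers
        # One directory per day, so no incremental or append mode is involved.
        "--target-dir", f"/data/orders/dt={run_date.isoformat()}",
        "--as-parquetfile",
    ]


if __name__ == "__main__":
    # Print the argument list for one example day.
    for arg in build_sqoop_import(date(2019, 6, 1)):
        print(arg)
```

Writing each day to its own target directory sidesteps the incremental modes that the question reports as not working with Parquet.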