I am using Oozie and Sqoop to import data from an RDBMS on a daily basis. I am researching the following use case.
1. Importing data from an RDBMS table into HDFS in Parquet format. The import will run at a specific time every day through Oozie. Sqoop's incremental import options (incremental update, append, last-value) are not working for me. After some research, I understood that incremental imports do not work with the Parquet file format.
Now I want to use --query (filtering on a date column of the RDBMS table) in the Sqoop script and fetch the data daily without using the incremental or append methods.
How can we change the date in --query dynamically each time the Sqoop job starts (every day)? Can one Sqoop script work for every daily import, or do we need to create 10 jobs for 10 different dates?
Is there any other way of doing this?
Any help is highly appreciated.
Thank you for your response.
After some R&D work, the issue was solved using the Sqoop --query parameter and an Oozie coordinator. Now things are working as expected.
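
For anyone with the same problem, here is a rough sketch of how a single script can cover every daily run, assuming the date is passed in as an argument. It is not my exact job: the table name (orders), date column (order_date), JDBC URL, user, password file, and target path are all hypothetical placeholders, and the date literal format depends on your database. Setting DRY_RUN=1 prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: table, column, host, user, and path names are hypothetical.

# Build the free-form query for one day's rows.
# $1 is the date as YYYY-MM-DD. The token $CONDITIONS must stay literal,
# because Sqoop replaces it with its own split predicate at run time.
build_query() {
  printf "SELECT * FROM orders WHERE order_date = '%s' AND \$CONDITIONS" "$1"
}

# Build the HDFS target directory, one folder per day
# (--query requires an explicit --target-dir).
build_target_dir() {
  printf '/data/orders/dt=%s' "$1"
}

# Import one day's rows as Parquet.
# With DRY_RUN=1 the command line is printed instead of executed.
import_day() {
  run_date="$1"
  set -- sqoop import \
    --connect "jdbc:mysql://dbhost/sales" \
    --username etl_user \
    --password-file /user/etl/.db_password \
    --query "$(build_query "$run_date")" \
    --target-dir "$(build_target_dir "$run_date")" \
    --as-parquetfile \
    -m 1
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

To run it by hand for today, call import_day "$(date +%F)". From an Oozie coordinator, the date can instead be passed in as a workflow property built from the nominal time, for example ${coord:formatTime(coord:nominalTime(), 'yyyy-MM-dd')}, so the same script serves every day and there is no need for one job per date.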