Automate the process of Pig, Hive, Sqoop.

Explorer

I have data in HDFS (Azure HDInsight) in CSV format, and I am using Pig to process it. After processing, the summarised data is stored in Hive, and the Hive table is then exported to an RDBMS using Sqoop. Now I need to automate this whole process. Is it possible to write a separate method for each of these three tasks in a MapReduce job, then run that job so that the tasks execute one after another?

To create the MapReduce job, I want to use the .NET SDK. Is this possible? If yes, please suggest some steps and reference links. Thank you.

1 ACCEPTED SOLUTION

@Ishvari Dhimmar

Have you evaluated Oozie? I assume you need to run these steps repeatedly at some interval. Oozie supports all of the components you mention (Pig, Hive, and Sqoop), and each one can be defined as a separate action in an Oozie workflow.

You do not need to create a separate MR job (using the .NET SDK) if you go this route.
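
As a rough illustration of chaining the three steps as separate Oozie actions, here is a minimal sketch that generates a workflow definition with Python's standard library. The action names, the script file names (summarise.pig, load_summary.hql), the ${...} property names, and the schema versions are all assumptions chosen for illustration, not taken from your cluster; adjust them to your own setup.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Schema versions are assumptions; use the ones your Oozie release supports.
WF_NS = "uri:oozie:workflow:0.4"
HIVE_NS = "uri:oozie:hive-action:0.2"
SQOOP_NS = "uri:oozie:sqoop-action:0.2"


def add_action(root, name, next_node, tag, namespace, body):
    """Append one <action> that goes to next_node on success and to 'fail' on error."""
    action = ET.SubElement(root, "action", name=name)
    attrs = {"xmlns": namespace} if namespace else {}
    inner = ET.SubElement(action, tag, attrs)
    ET.SubElement(inner, "job-tracker").text = "${jobTracker}"
    ET.SubElement(inner, "name-node").text = "${nameNode}"
    for child_tag, text in body:
        ET.SubElement(inner, child_tag).text = text
    ET.SubElement(action, "ok", to=next_node)
    ET.SubElement(action, "error", to="fail")


def build_workflow():
    """Return workflow XML that runs Pig, then Hive, then Sqoop in sequence."""
    root = ET.Element("workflow-app", {"xmlns": WF_NS, "name": "pig-hive-sqoop"})
    ET.SubElement(root, "start", to="pig-summarise")

    # Step 1: the Pig script processes the CSV data (hypothetical script name).
    add_action(root, "pig-summarise", "hive-load", "pig", None,
               [("script", "summarise.pig")])
    # Step 2: a Hive script loads the summarised output (hypothetical script name).
    add_action(root, "hive-load", "sqoop-export", "hive", HIVE_NS,
               [("script", "load_summary.hql")])
    # Step 3: Sqoop exports the Hive table's files to the RDBMS.
    # ${jdbcUrl}, ${targetTable} and ${exportDir} are placeholder properties.
    add_action(root, "sqoop-export", "end", "sqoop", SQOOP_NS,
               [("command",
                 "export --connect ${jdbcUrl} --table ${targetTable} "
                 "--export-dir ${exportDir}")])

    kill = ET.SubElement(root, "kill", name="fail")
    ET.SubElement(kill, "message").text = (
        "Failed: ${wf:errorMessage(wf:lastErrorNode())}")
    ET.SubElement(root, "end", name="end")

    raw = ET.tostring(root, encoding="unicode")
    return minidom.parseString(raw).toprettyxml(indent="  ")


if __name__ == "__main__":
    print(build_workflow())
```

The printed XML would be saved as workflow.xml in HDFS next to the two scripts and submitted with a job.properties file that defines the placeholder properties. To run it at a fixed interval, you would wrap the workflow in an Oozie coordinator.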

3 REPLIES

Explorer

Thanks for the reply, it really helps. I wrote MapReduce job by mistake; I should have said HiveJob, PigJob, and SqoopJob. I have just gone through Oozie, but I couldn't find a precise reference for it. Suppose I write a Pig script, then want to load its output into Hive, and then export that data to SQL Server using Sqoop. How do I connect all of these steps using Oozie? Can you provide some reference links?