Created 04-26-2017 07:39 AM
I have data in HDFS (Azure HDInsight) in CSV format, and I am using Pig to process it. After processing, the summarised data is stored in Hive, and the Hive table is then exported to an RDBMS using Sqoop. Now I need to automate this whole process. Is it possible to write a separate method for each of these three tasks in a MapReduce job, then run that job so that the tasks execute one after another?
I want to use the .NET SDK to create the MapReduce job. Is this possible? If yes, please suggest some steps and reference links. Thank you.
Created 04-26-2017 08:29 AM
Have you evaluated Oozie? I assume you will need to run these steps repeatedly at some interval. Oozie supports all of the components you mention (Pig, Hive, and Sqoop), and each one can be defined as a separate action in an Oozie workflow.
If you go this route, you do not need to create a separate MR job using the .NET SDK.
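As a rough illustration of what "separate actions" could look like, here is a minimal sketch of a workflow definition that chains a Pig action, a Hive action, and a Sqoop action. It is wrapped in a small Python script only so the XML can be checked for well-formedness and the action order printed. Everything named here is an assumption for illustration: the script names (`summarise.pig`, `load_summary.hql`), the action names, and the `${...}` properties, which would come from your own `job.properties`. Database credentials and HDInsight-specific storage paths are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow.xml: Pig -> Hive -> Sqoop, each as its own Oozie action.
# Script names and ${...} properties are placeholders, not values from this thread.
WORKFLOW_XML = r"""<workflow-app xmlns="uri:oozie:workflow:0.5" name="csv-summary-export">
  <start to="pig-summarise"/>
  <action name="pig-summarise">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>summarise.pig</script>
      <param>INPUT=${inputDir}</param>
      <param>OUTPUT=${summaryDir}</param>
    </pig>
    <ok to="hive-load"/>
    <error to="fail"/>
  </action>
  <action name="hive-load">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_summary.hql</script>
      <param>SUMMARY_DIR=${summaryDir}</param>
    </hive>
    <ok to="sqoop-export"/>
    <error to="fail"/>
  </action>
  <action name="sqoop-export">
    <sqoop xmlns="uri:oozie:sqoop-action:0.4">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <arg>export</arg>
      <arg>--connect</arg>
      <arg>${sqlJdbcUrl}</arg>
      <arg>--table</arg>
      <arg>${sqlTable}</arg>
      <arg>--export-dir</arg>
      <arg>${hiveTableDir}</arg>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Failed at [${wf:lastErrorNode()}]</message>
  </kill>
  <end name="end"/>
</workflow-app>
"""

NS = {"wf": "uri:oozie:workflow:0.5"}


def action_chain(xml_text):
    """Follow start -> ok transitions and return the action names in run order."""
    root = ET.fromstring(xml_text)
    actions = {a.get("name"): a for a in root.findall("wf:action", NS)}
    chain = []
    node = root.find("wf:start", NS).get("to")
    # Stop at the first non-action node (end/kill) or if a name repeats.
    while node in actions and node not in chain:
        chain.append(node)
        node = actions[node].find("wf:ok", NS).get("to")
    return chain


if __name__ == "__main__":
    print(" -> ".join(action_chain(WORKFLOW_XML)))
```

Each action's `ok` transition points at the next step and its `error` transition points at the `kill` node, so a failure in Pig or Hive stops the run before Sqoop exports anything. To run the workflow at an interval, a coordinator definition would sit on top of it.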
Created 04-26-2017 10:19 AM
Thanks for the reply; it really helps. I wrote "MapReduce job" by mistake; I should have said HiveJob, PigJob, and SqoopJob. Thanks again. I have just gone through Oozie, but I did not find a link that covers exactly this case. Suppose I write a Pig script, then want to load its output into Hive, and then use Sqoop to export that data to SQL Server. How do I connect all of these steps using Oozie? Can you provide some reference links?
Created 04-27-2017 11:23 AM