I have a question regarding Oozie and stabilizing my development process. First of all, to write safe code I found this article about test-driven development in Hadoop. As far as I understand, this means that I have to provide developer tests separately for each tool (e.g. Sqoop, Hive, Spark, ...).
So what would a typical development process look like in your opinion? Should all code for Hive and Spark be written first and then be tested by unit tests which were defined before developing the actual code? This means using Beeline or HiveRunner as well as Spark-Testing-Base. And only then test the Oozie workflow with MiniOozie?
In addition, I would like to know how I can handle errors in Oozie appropriately. I had the feeling that sometimes an error occurred (maybe in Hive or somewhere else) and the complete workflow was stopped at that point. The action was stopped and did not even reach the point where Oozie decides whether to take the OK or ERROR branch, so all my error handling in Oozie was useless. When and how can that happen? Is it a type of error which has to be tested beforehand in the tool itself and not in Oozie? Maybe I do not really understand how Oozie delegates the action to YARN and where those errors arise.
Any help on this topic is really appreciated. Browsing the web, I didn't find much material that addresses precisely this topic.
Thanks in advance!
I suggest you develop and test all the jobs locally first, such as the Sqoop jobs and Hive scripts. For the Spark application, please make sure you compile your jar against the Hortonworks dependencies to prevent any dependency issues.
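As a rough illustration of what "test locally first" can look like for the Spark side, here is a minimal sketch. It assumes the per-record logic of the job can be kept in a plain function, so it can be checked on a developer machine without a cluster. The function name `normalize_record` and the `id,amount` input format are made up for this example; the same idea applies to a Scala or Java job.

```python
def normalize_record(line):
    """Hypothetical per-record logic: parse an 'id,amount' line.

    Returns an (id, amount) tuple, or None for a malformed line so the
    calling job can filter it out.
    """
    parts = line.strip().split(",")
    if len(parts) != 2:
        return None
    ident, amount = parts
    try:
        return (ident.strip(), float(amount))
    except ValueError:
        # Amount is not numeric: treat the record as malformed.
        return None


# Tests written against the expected behaviour; a runner such as pytest
# would pick these up by their test_ prefix.
def test_valid_line_is_parsed():
    assert normalize_record(" a1 , 3.5\n") == ("a1", 3.5)


def test_wrong_column_count_is_rejected():
    assert normalize_record("a1,3.5,extra") is None


def test_non_numeric_amount_is_rejected():
    assert normalize_record("a1,abc") is None
```

Only once checks like these pass locally would the function be wired into the actual job and the workflow tested end to end.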
To troubleshoot an Oozie workflow, the first thing is to check the Oozie log and search by workflow ID. Then check the YARN application logs for the Oozie launcher and the child job.
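The two steps above can be sketched as a small helper that only builds the command lines, so you can inspect them before running anything. This assumes the `oozie` and `yarn` command-line clients are installed on the machine; the server URL and the IDs in the usage note are placeholders, not values from your cluster. It also relies on a Hadoop job ID of the form `job_<timestamp>_<sequence>` corresponding to the YARN application ID `application_<timestamp>_<sequence>`.

```python
import re

# Placeholder; replace with the URL of your own Oozie server.
OOZIE_URL = "http://localhost:11000/oozie"


def oozie_info_cmd(workflow_id, oozie_url=OOZIE_URL):
    """Command that lists the workflow's actions with their status and external IDs."""
    return ["oozie", "job", "-oozie", oozie_url, "-info", workflow_id]


def oozie_log_cmd(workflow_id, oozie_url=OOZIE_URL):
    """Command that prints the Oozie server log for one workflow ID."""
    return ["oozie", "job", "-oozie", oozie_url, "-log", workflow_id]


def to_application_id(external_id):
    """Map job_<ts>_<seq> (or an existing application ID) to application_<ts>_<seq>."""
    match = re.fullmatch(r"(?:job|application)_(\d+)_(\d+)", external_id)
    if match is None:
        raise ValueError("not a job/application ID: %r" % external_id)
    return "application_%s_%s" % (match.group(1), match.group(2))


def yarn_logs_cmd(external_id):
    """Command that fetches the aggregated YARN logs for a launcher or child job."""
    return ["yarn", "logs", "-applicationId", to_application_id(external_id)]
```

For example, `oozie_log_cmd("0000001-170101000000000-oozie-oozi-W")` gives the command for the Oozie log of that (made-up) workflow, and `yarn_logs_cmd("job_1500000000000_0042")` gives the command for the launcher's YARN log. The ID of the child job typically has to be taken from the launcher log or the Oozie web console, after which the same function applies. Each list can be passed to `subprocess.run` once it looks right.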