Created 03-13-2017 07:54 PM
I am using NiFi for my data flow, and then I kick off an ETL script which runs many (Hive/Pig) MR/Tez jobs. Is there an easy way to detect (i.e., trigger on) when a job has finished? Creating a trigger manually per job is not scalable since there are many jobs. Going into each job and having it create a trigger is off the table.
Created 03-13-2017 08:17 PM
You can leverage WebHCat for this as one idea: https://cwiki.apache.org/confluence/display/Hive/WebHCat+UsingWebHCat#WebHCatUsingWebHCat-ErrorCodes...
# this will execute a Hive query and save the result to an HDFS directory called "output" under your home directory
curl -s -d execute="select+*+from+sample_08;" \
  -d statusdir="output" \
  'http://localhost:50111/templeton/v1/hive?user.name=root'
# if you ls on the directory, it will have two files, stderr and stdout
hdfs dfs -ls output
# if the job succeeded, you can cat the stdout file and view the results
hdfs dfs -cat output/stdout
When you invoke the job, you will get a response containing the job ID. You can then also check with the WebHDFS API whether the output directory exists and there is no error in the stderr log; in that case the job succeeded.
curl -i "http://sandbox.hortonworks.com:50070/webhdfs/v1/user/root/output/?op=LISTSTATUS"
Another idea is to leverage Oozie to wire the jobs together. Once a job completes, you can use Oozie's SLA monitoring features to check whether it completed, or have it send an email (SLA is not needed for that). Whichever way you go, you can have NiFi watch these events, either from a JMS topic in ActiveMQ if you intend to use SLA, or from the email alert. https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-...
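If you do go the Oozie route, NiFi could also poll Oozie's REST API for the workflow status rather than relying only on SLA events or email. A rough sketch, with the Oozie host and workflow job ID as placeholders:
# query the workflow job; the JSON response contains a status field such as SUCCEEDED, FAILED or KILLED
curl -s 'http://oozie-host:11000/oozie/v1/job/0000001-170313000000000-oozie-oozi-W?show=info'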
Probably an even better idea is to query ATS (the YARN Timeline Server) via its REST API: https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/TimelineServer.html I think this is the most sane approach: you can query ATS for the finished job and get its status. Once you know the job ID (there are ways to get it, one of them being my first example), a second processor can query ATS for the completion state.
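As a sketch of what that second processor would call (the timeline host, ResourceManager host and application ID here are assumptions, adjust them to your cluster):
# ask the timeline server for a finished application; the response includes appState and finalAppStatus
curl -s 'http://timeline-host:8188/ws/v1/applicationhistory/apps/application_1489000000000_0001'
# the ResourceManager REST API returns similar state/finalStatus fields while it still tracks the app
curl -s 'http://rm-host:8088/ws/v1/cluster/apps/application_1489000000000_0001'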
Created 03-14-2017 06:33 PM
Hi @Sunile Manjee,
Since you are using NiFi to launch jobs, why don't you use NiFi itself to monitor them? 😛
I tried to flex NiFi to monitor YARN jobs by querying the ResourceManager, and have documented it; my flow XML is attached in the comments. Check it out.
https://community.hortonworks.com/content/kbentry/42995/yarn-application-monitoring-with-nifi.html
In the demo I used it to monitor Failed and Killed jobs only; you can change the query to ask for all the jobs a given user (say smanjee) submitted and alert you as soon as they are completed/failed/killed. The kind of ResourceManager query involved is sketched below.
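For reference, the ResourceManager query behind that approach looks roughly like this (host/port and user are placeholders; in NiFi you would issue the same request from a processor such as InvokeHTTP):
# list a user's applications in terminal states; the JSON response contains an "apps" object
# with each application's state and finalStatus
curl -s 'http://rm-host:8088/ws/v1/cluster/apps?user=smanjee&states=FINISHED,FAILED,KILLED'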
Thanks
Jobin