Is there a way to easily detect when a MR/Tez job has completed?

Master Guru

I am using NiFi for my data flow, and then I kick off an ETL script which runs many (Hive/Pig) MR/Tez jobs. Is there an easy way to detect (i.e. trigger on) when a job has finished? Creating a trigger manually per job is not scalable since there are many jobs. Going into each job and having it create a trigger is off the table.

1 ACCEPTED SOLUTION

Master Mentor

@Sunile Manjee

One idea is to leverage WebHCat for this: https://cwiki.apache.org/confluence/display/Hive/WebHCat+UsingWebHCat#WebHCatUsingWebHCat-ErrorCodes...

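# this will execute a hive query and write its status files to a directory called output in your HDFS home directory
# (note: nothing may follow the trailing backslashes, not even spaces, or the line continuation breaks)

curl -s -d execute="select+*+from+sample_08;" \
 -d statusdir="output" \
 'http://localhost:50111/templeton/v1/hive?user.name=root'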

# if you ls on the directory, it will have two files, stderr and stdout

hdfs dfs -ls output

# if the job succeeded, you can cat the stdout file and view the results

hdfs dfs -cat output/stdout

When you submit the job, you get a response containing the job ID. You can also use the WebHDFS API to check whether the output directory exists and contains no error log; in that case the job succeeded.

 curl -i "http://sandbox.hortonworks.com:50070/webhdfs/v1/user/root/output/?op=LISTSTATUS"
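The WebHDFS check above can be scripted so that a flow step only proceeds once the status directory is populated. The following is a minimal sketch, not a tested recipe: `listing_has_file` and `statusdir_ready` are hypothetical helper names, and the JSON is matched with grep rather than a real parser, which assumes the usual LISTSTATUS layout where each entry carries a `pathSuffix` field.

```shell
# Sketch: inspect a WebHDFS LISTSTATUS response for a given file name.

# listing_has_file NAME  (hypothetical helper)
# Reads a LISTSTATUS JSON body on stdin; exit status 0 if some entry
# has pathSuffix equal to NAME, non-zero otherwise.
listing_has_file() {
  grep -Eq "\"pathSuffix\"[[:space:]]*:[[:space:]]*\"$1\""
}

# statusdir_ready NAMENODE_URL HDFS_DIR  (hypothetical helper)
# Fetches the listing of HDFS_DIR and succeeds once a stdout file is there.
# A missing directory yields an error body without pathSuffix, so this fails.
statusdir_ready() {
  curl -s "$1/webhdfs/v1$2?op=LISTSTATUS" | listing_has_file stdout
}

# Example call against the sandbox from the post (needs a live cluster):
# statusdir_ready http://sandbox.hortonworks.com:50070 /user/root/output \
#   && echo "status directory is populated"
```

The presence of stdout alone does not prove success, so you would still inspect stderr as described above before treating the job as successful.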

Another idea is to leverage Oozie to wire the jobs together. Once a job completes, you can use Oozie's SLA monitoring features to check whether it completed, or have the workflow send an email (SLA is not needed for that). Whichever way you go, NiFi can watch for these events, either from a JMS topic in ActiveMQ if you intend to use SLA, or from the email alert. https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-...

Probably an even better idea is to query ATS (the YARN Timeline Server) via its REST API: https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/TimelineServer.html I think this is the most sane approach, because you can query ATS for a finished job and get its status. First you need the job ID (there are several ways to get it, one being the response from my first example); then, in a second processor, you can query ATS for the completion state.
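As a rough illustration of the ATS approach, the sketch below polls the Timeline Server's generic application history endpoint until the application reaches a terminal state. It rests on several assumptions: the helper names are hypothetical, the ATS address and default port 8188 depend on your cluster, the `job_` to `application_` mapping is the usual MapReduce convention, and the ID returned by WebHCat may belong to a launcher job rather than the Hive job you actually care about. JSON fields are pulled out with grep and sed, which is good enough for flat string fields but is not a real parser.

```shell
# Sketch: poll ATS for the completion state of one YARN application.

# json_string_field NAME  (hypothetical helper)
# Prints the first string value of field NAME from JSON on stdin.
json_string_field() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" \
    | head -n 1 \
    | sed -E 's/^"[^"]*"[[:space:]]*:[[:space:]]*"(.*)"$/\1/'
}

# job_to_app_id JOB_ID  (hypothetical helper)
# Rewrites job_<cluster>_<seq> as application_<cluster>_<seq>.
job_to_app_id() {
  printf '%s\n' "$1" | sed 's/^job_/application_/'
}

# wait_for_app ATS_URL APP_ID [INTERVAL_SECONDS]  (hypothetical helper)
# Loops until appState is FINISHED, FAILED or KILLED, then prints
# finalAppStatus (for example SUCCEEDED).
wait_for_app() {
  local body state
  while true; do
    body=$(curl -s "$1/ws/v1/applicationhistory/apps/$2")
    state=$(printf '%s' "$body" | json_string_field appState)
    case "$state" in
      FINISHED|FAILED|KILLED)
        printf '%s' "$body" | json_string_field finalAppStatus
        return 0 ;;
    esac
    sleep "${3:-30}"
  done
}

# Example (needs a live cluster; ats-host is a placeholder):
# job=$(curl -s -d execute="select+*+from+sample_08;" -d statusdir="output" \
#   'http://localhost:50111/templeton/v1/hive?user.name=root' | json_string_field id)
# wait_for_app http://ats-host:8188 "$(job_to_app_id "$job")" 15
```

In a NiFi flow the loop would more naturally be replaced by a scheduled processor that issues the same GET request, with the state check done on the response.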

2 REPLIES

Hi @Sunile Manjee,

Since you are using NiFi to launch jobs, why don't you use NiFi itself to monitor it 😛

I tried flexing NiFi to monitor YARN jobs by querying the ResourceManager. I have documented it, and my flow XML is attached in the comments. Check it out:

https://community.hortonworks.com/content/kbentry/42995/yarn-application-monitoring-with-nifi.html

In the demo I used it to monitor failed and killed jobs only. You can change the query to ask for all the jobs that, say, user smanjee submitted, and have it alert you as soon as one is completed/failed/killed.
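For reference, a query of that kind against the ResourceManager REST API might look like the sketch below. It is only an approximation of the idea, not the flow from the linked article: the helper names are hypothetical, `rm-host:8088` is a placeholder for your ResourceManager web address, and application IDs are pulled out of the response with grep instead of a JSON parser.

```shell
# Sketch: list a user's applications that have reached a terminal state.

# rm_apps_url RM_URL USER STATES  (hypothetical helper)
# Builds a Cluster Applications API URL filtered by user and states.
rm_apps_url() {
  printf '%s/ws/v1/cluster/apps?user=%s&states=%s\n' "$1" "$2" "$3"
}

# finished_app_ids  (hypothetical helper)
# Reads the API response on stdin and prints each application ID once.
# IDs also occur inside tracking URLs, hence the de-duplication.
finished_app_ids() {
  grep -o 'application_[0-9]*_[0-9]*' | sort -u
}

# Example (needs a live cluster):
# curl -s "$(rm_apps_url http://rm-host:8088 smanjee FINISHED,FAILED,KILLED)" \
#   | finished_app_ids
```

Because the query already filters on terminal states, every ID printed belongs to an application that has completed, failed, or been killed; remembering which IDs were already seen is left to the caller.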

Thanks

Jobin