
Falcon Process running but not processing Data in HDP2.3


We have set up a Falcon process that reads data from an HDFS location and, via a Pig script, writes the output to another HDFS location. The feed and process entities are running in the cluster, but I cannot see any output being generated.

My process XML is as follows:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="demo1Process" xmlns="uri:falcon:process:0.1">
  <tags>processName=demo1Process</tags>
  <clusters>
    <cluster name="Atlas-Demo1">
      <validity start="2016-01-28T20:51Z" end="2017-02-02T20:51Z"/>
    </cluster>
  </clusters>
  <parallel>2</parallel>
  <order>FIFO</order>
  <frequency>minutes(5)</frequency>
  <timezone>GMT+05:50</timezone>
  <inputs>
    <input name="inputfeed" feed="demo1Feed" start="yesterday(0,0)" end="today(-1,0)"/>
  </inputs>
  <outputs>
    <output name="outoutfeed" feed="demo1OutputFeed" instance="yesterday(0,0)"/>
  </outputs>
  <workflow name="select_airlines_data" version="pig-0.12.0" engine="pig" path="/falcon/demo1/code/demo1.pig"/>
  <retry policy="exp-backoff" delay="minutes(3)" attempts="2"/>
  <ACL owner="falcon" group="falcon" permission="0755"/>
</process>

My input feed XML is as follows:

<feed xmlns='uri:falcon:feed:0.1' name='demo1InputFeed' description='demo1 input feed'>
  <tags>feed_name=demo1InputFeed</tags>
  <groups>input</groups>
  <frequency>minutes(1)</frequency>
  <timezone>GMT+05:50</timezone>
  <late-arrival cut-off='minutes(3)'/>
  <clusters>
    <cluster name='demo1cluster' type='source'>
      <validity start='2016-01-28T07:49Z' end='2017-02-01T07:49Z'/>
      <retention limit='days(2)' action='delete'/>
      <locations>
        <location type='data'/>
        <location type='stats'/>
        <location type='meta'/>
      </locations>
    </cluster>
  </clusters>
  <locations>
    <location type='data' path='/falcon/demo1/data/${YEAR}-${MONTH}'/>
    <location type='stats' path='/falcon/demo1/status'/>
    <location type='meta' path='/falcon/demo1/meta'/>
  </locations>
  <ACL owner='falcon' group='falcon' permission='0755'/>
  <schema location='none' provider='none'/>
  <properties>
    <property name='jobPriority' value='HIGH'/>
  </properties>
</feed>

My input folder in HDFS is:

/falcon/demo1/data/2016-01

1 ACCEPTED SOLUTION


Nayan Paul: There are a couple of issues in your entity XMLs.

1> The granularity of the date pattern in the feed's location path should be at least as fine as the feed's frequency.

2> yesterday(hours,minutes): as the name suggests, the yesterday EL expression picks up feed instances relative to the start of the previous day; the hours and minutes arguments are added to 00:00 of yesterday. For example, yesterday(24,30) actually corresponds to 00:30 today, so for a nominal instance time of 2010-01-02T01:30Z it resolves to the 2010-01-02T00:30Z feed instance.
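A minimal sketch of that date arithmetic in Python; `el_yesterday` is a hypothetical helper for illustration only (Falcon evaluates these EL expressions server-side):

```python
from datetime import datetime, timedelta

def el_yesterday(nominal, hours, minutes):
    """Mimic Falcon's yesterday(hours, minutes) EL expression:
    the offsets are added to 00:00 of the day before the nominal time."""
    start_of_yesterday = nominal.replace(
        hour=0, minute=0, second=0, microsecond=0
    ) - timedelta(days=1)
    return start_of_yesterday + timedelta(hours=hours, minutes=minutes)

# yesterday(24,30) for a nominal time of 2010-01-02T01:30Z
print(el_yesterday(datetime(2010, 1, 2, 1, 30), 24, 30))  # 2010-01-02 00:30:00
```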

The input location path in the feed XML is /falcon/demo1/data/${YEAR}-${MONTH}, but the frequency is in minutes. Also, if you want to process a month's worth of data, use the lastMonth or currentMonth EL expressions instead.
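For example, a feed with a frequency in minutes would need a minute-granularity path pattern, something like this (illustrative, not the poster's actual layout):

```xml
<locations>
  <location type='data' path='/falcon/demo1/data/${YEAR}-${MONTH}-${DAY}-${HOUR}-${MINUTE}'/>
</locations>
```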

Please refer to the EL expression documentation for more details, and to the entity specification documentation for entity details. Thanks!



Master Guru

Perhaps there is an error in the Pig script? You will find the logs in the Oozie launcher action (the map task log) or in the Pig action it spawns. Hue is a convenient way to get at the logs, or you can go directly to the ResourceManager UI.

When a process is executed you will have one job that is the launcher; it contains the parameters passed to the Pig script and any error returned by the pig command.

You will have a second job that is the actual Pig execution.

You should find the problem in one or the other.

If these jobs don't exist, you can also go to the Oozie UI and see why the actions are not being spawned.

Explorer

If you run the Pig script manually, outside of Falcon, do you get an error?


Thanks for the quick reply.

I just tested the Pig script, replacing $input and $output with actual HDFS paths, and the Pig job runs fine.

Also, my feed's input path is /falcon/demo1/data/${YEAR}-${MONTH}, whereas my actual HDFS path is /falcon/demo1/data/2016-01. Could this be a mismatch?

Master Guru

The path looks good; ${YEAR} is replaced with the current year, and so on. However, what do you see when you look in the ResourceManager as described above?

Contributor

I would cross-check the following:

  • process validity start/end dates
  • input start/end dates
  • feed validity start/end dates
  • input path pattern
  • timezone

If you want data to be picked up for a particular process instance, the feed must be valid (read: the feed is expected to be populated) during that time, and the data must be in a directory that matches the expected pattern. Look at your Oozie coordinator actions for details on which HDFS paths are being waited on.
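For instance, since the process input starts at yesterday(0,0), the feed's validity window has to open no later than 00:00 of the day before the process's first instance. The dates below are illustrative, not a recommended configuration:

```xml
<!-- Feed validity must cover every instance the process requests.
     With a process validity starting 2016-01-28T20:51Z and an input
     of yesterday(0,0), instances from 2016-01-27T00:00Z are needed. -->
<validity start='2016-01-27T00:00Z' end='2017-02-02T20:51Z'/>
```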



Thanks for the help. I am able to run the Falcon process now.

Explorer

Can you hardcode the path to /falcon/demo1/data/2016-01?
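That is, as a temporary debugging step, you could replace the pattern in the feed's data location with a literal path (illustrative):

```xml
<location type='data' path='/falcon/demo1/data/2016-01'/>
```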

Explorer

Can you look for any error codes/messages in the Oozie console (via Ambari), or perhaps provide the full stack trace, which usually contains output like "Caused by: org.apache.falcon.FalconException:"? Also confirm that the scripts and directories (absolute paths) are chmoded to 777, or at least 775.