Falcon Process running but not processing Data in HDP2.3

We have set up a Falcon process that reads data from an HDFS location and writes the output, via a Pig script, to another HDFS location. The feeds and the process are running on the cluster, but I cannot see any output being generated.

My process XML is below:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="demo1Process" xmlns="uri:falcon:process:0.1">
    <tags>processName=demo1Process</tags>
    <clusters>
        <cluster name="Atlas-Demo1">
            <validity start="2016-01-28T20:51Z" end="2017-02-02T20:51Z"/>
        </cluster>
    </clusters>
    <parallel>2</parallel>
    <order>FIFO</order>
    <frequency>minutes(5)</frequency>
    <timezone>GMT+05:50</timezone>
    <inputs>
        <input name="inputfeed" feed="demo1Feed" start="yesterday(0,0)" end="today(-1,0)"/>
    </inputs>
    <outputs>
        <output name="outoutfeed" feed="demo1OutputFeed" instance="yesterday(0,0)"/>
    </outputs>
    <workflow name="select_airlines_data" version="pig-0.12.0" engine="pig" path="/falcon/demo1/code/demo1.pig"/>
    <retry policy="exp-backoff" delay="minutes(3)" attempts="2"/>
    <ACL owner="falcon" group="falcon" permission="0755"/>
</process>

My input feed XML is below:

<feed xmlns='uri:falcon:feed:0.1' name='demo1InputFeed' description='demo1 input feed'>
    <tags>feed_name=demo1InputFeed</tags>
    <groups>input</groups>
    <frequency>minutes(1)</frequency>
    <timezone>GMT+05:50</timezone>
    <late-arrival cut-off='minutes(3)'/>
    <clusters>
        <cluster name='demo1cluster' type='source'>
            <validity start='2016-01-28T07:49Z' end='2017-02-01T07:49Z'/>
            <retention limit='days(2)' action='delete'/>
            <locations>
                <location type='data'> </location>
                <location type='stats'> </location>
                <location type='meta'> </location>
            </locations>
        </cluster>
    </clusters>
    <locations>
        <location type='data' path='/falcon/demo1/data/${YEAR}-${MONTH}'> </location>
        <location type='stats' path='/falcon/demo1/status'> </location>
        <location type='meta' path='/falcon/demo1/meta'> </location>
    </locations>
    <ACL owner='falcon' group='falcon' permission='0755'/>
    <schema location='none' provider='none'/>
    <properties>
        <property name='jobPriority' value='HIGH'> </property>
    </properties>
</feed>

My input folder (in HDFS) is:

/falcon/demo1/data/2016-01

1 ACCEPTED SOLUTION

Nayan Paul: There are a couple of issues in your entity XMLs.

1> The granularity of the date pattern in the location path should be at least as fine as the frequency of the feed.

2> yesterday(hours,minutes): As the name suggests, the EL expression yesterday picks up feed instances relative to the start of the day yesterday. Hours and minutes are added to 00:00 of yesterday. Example: yesterday(24,30) actually corresponds to 00:30 today; for 2010-01-02T01:30Z this would mean the 2010-01-02T00:30Z feed instance.
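
The arithmetic behind yesterday(hours,minutes) can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Falcon's implementation: the function names are made up for the example, the today helper assumes that today(hours,minutes) follows the same rule relative to 00:00 today, and time zone handling is ignored (everything is treated as UTC).

```python
from datetime import datetime, timedelta


def start_of_day(nominal_time):
    """Midnight (00:00) of the day that contains nominal_time."""
    return nominal_time.replace(hour=0, minute=0, second=0, microsecond=0)


def yesterday(nominal_time, hours, minutes):
    """00:00 of the previous day, shifted by the given hours and minutes."""
    base = start_of_day(nominal_time) - timedelta(days=1)
    return base + timedelta(hours=hours, minutes=minutes)


def today(nominal_time, hours, minutes):
    """Assumed to mirror yesterday(): 00:00 of the current day plus the offset."""
    return start_of_day(nominal_time) + timedelta(hours=hours, minutes=minutes)


# Nominal time taken from the example in the answer: 2010-01-02T01:30Z.
nominal = datetime(2010, 1, 2, 1, 30)

print(yesterday(nominal, 24, 30))  # 2010-01-02 00:30:00
print(yesterday(nominal, 0, 0))    # 2010-01-01 00:00:00
print(today(nominal, -1, 0))       # 2010-01-01 23:00:00
```

Under these assumptions, the input window in the process above, start="yesterday(0,0)" end="today(-1,0)", would cover feed instances from 00:00 to 23:00 of the previous day.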

The input location path in the feed XML is /falcon/demo1/data/${YEAR}-${MONTH}, but the feed frequency is in minutes. Also, if you want to process a month of data, please use the lastMonth or currentMonth EL expression.
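
To see why the path granularity matters, here is a rough Python sketch that expands a location template for consecutive one-minute feed instances. It is a simplified stand-in for Falcon's path resolution, not its actual code; the resolve_path and distinct_paths helpers and the per-minute template are hypothetical, and the sketch assumes the usual ${YEAR}, ${MONTH}, ${DAY}, ${HOUR} and ${MINUTE} variables.

```python
from datetime import datetime, timedelta


def resolve_path(template, instance_time):
    """Substitute Falcon-style date variables in a feed location template."""
    values = {
        "${YEAR}": f"{instance_time:%Y}",
        "${MONTH}": f"{instance_time:%m}",
        "${DAY}": f"{instance_time:%d}",
        "${HOUR}": f"{instance_time:%H}",
        "${MINUTE}": f"{instance_time:%M}",
    }
    for variable, value in values.items():
        template = template.replace(variable, value)
    return template


def distinct_paths(template, start, step, count):
    """Set of paths produced by `count` consecutive instances from `start`."""
    return {resolve_path(template, start + i * step) for i in range(count)}


start = datetime(2016, 1, 28, 7, 49)  # validity start of the feed above
one_minute = timedelta(minutes=1)     # feed frequency: minutes(1)

monthly = "/falcon/demo1/data/${YEAR}-${MONTH}"
# Hypothetical finer-grained layout, for comparison only.
per_minute = "/falcon/demo1/data/${YEAR}-${MONTH}-${DAY}-${HOUR}-${MINUTE}"

print(len(distinct_paths(monthly, start, one_minute, 60)))     # 1
print(len(distinct_paths(per_minute, start, one_minute, 60)))  # 60
```

With the month-level template, all 60 one-minute instances resolve to the same directory, /falcon/demo1/data/2016-01, so the instances cannot be told apart. With the finer template, each instance gets its own path.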

Please refer to the EL expression doc for more details, and to the entity specification doc for details on entity definitions. Thanks!

21 REPLIES

@Balu: I already replied with the same analysis. I asked him to change the process start time to 2016-01 instead: https://community.hortonworks.com/answers/12696/view.html


Ah, my bad. Too many messages here and I missed your solution. Thanks @Sowmya Ramesh