[RE-OPEN] FALCON - Feed entity FeedPath

Rising Star

Hi all,

It seems that, if the feed frequency is "hours(2)", the data path should look like:

/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}

My question is: do all the paths need to be created beforehand on both the primary and backup clusters?

/tmp/falcon/next-vers-current/2016/05/26/13/
/tmp/falcon/next-vers-current/2016/05/26/14/
/tmp/falcon/next-vers-current/2016/05/26/15/
1 ACCEPTED SOLUTION

Super Collaborator

@mayki wogno Thanks for sharing the feed replication entity XML. I have looked through your entity and found that the exception occurs because the feed-level data location path is not defined with the frequency pattern:

<location type="data" path="/tmp/falcon/"/>

Can you define the path as follows: path="/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}"

I expect this will work once you define the path with the frequency pattern.

Also, in the shared entity I see that you have specified the same HDFS path for the source and the target. Can you please check this as well?
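For reference, a sketch of the feed-level `<locations>` block with the data path changed as suggested. The stats and meta entries are kept unchanged from the submitted entity, and the result matches the block in the entity the poster later reported as working:

```xml
<!-- Feed-level locations (child of <feed>); the data path now carries the date pattern -->
<locations>
  <location type="data" path="/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
  <location type="stats" path="/none"/>
  <location type="meta" path="/none"/>
</locations>
```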


9 replies

Super Collaborator

@mayki wogno To answer your question: at least the frequency-based data feed must be available on the primary cluster for scheduled feed replication to copy the data periodically to the backup cluster. If the data is not available on the primary cluster, the scheduled instance stays in a waiting state until the data becomes available.

Rising Star

It seems my question was not clear:

I want to submit this feed:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<feed name="next-vers-current" description="next-vers-current" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(6)</frequency>
  <timezone>UTC</timezone>
  <clusters>
    <cluster name="next-rec-cluster" type="source">
      <validity start="2016-05-01T12:00Z" end="2016-05-27T23:00Z"/>
      <retention limit="hours(2)" action="delete"/>
      <locations>
        <location type="data" path="/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
      </locations>
    </cluster>
    <cluster name="current-rec-cluster" type="target">
      <validity start="2016-05-01T12:00Z" end="2016-05-27T23:00Z"/>
      <retention limit="days(2)" action="delete"/>
      <locations>
        <location type="data" path="/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
      </locations>
    </cluster>
  </clusters>
  <locations>
    <location type="data" path="/tmp/falcon/"/>
    <location type="stats" path="/none"/>
    <location type="meta" path="/none"/>
  </locations>
  <ACL owner="falcon" group="hadoop" permission="0755"/>
  <schema location="/none" provider="none"/>
  <properties><property name="queueName" value="oozie-launcher"/></properties>
</feed>

falcon entity -type feed -submit -file next-vers-current.xml
ERROR: Bad Request;default/org.apache.falcon.FalconWebException::org.apache.falcon.FalconException: Feeds default path pattern: ${nameNode}/tmp/falcon, does not match with cluster: next-rec-cluster path pattern: hdfs://master001.next.rec.mapreduce.m1.p.fti.net:8020/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}

So my question is: is it normal that I need to create all the paths with this pattern appended?

${YEAR}/${MONTH}/${DAY}/${HOUR}

Rising Star

@peeyush: What's the difference between the 'location data path' in the cluster section and the one in the feed section?

Super Collaborator

@mayki wogno The 'location data path' in the feed section is the default source data path; it can be overridden by a 'location data path' defined in the source cluster section.
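A sketch of how the two levels relate, using the source-cluster section and the feed-level data location from the working entity in this thread. This is an excerpt, not a complete entity: validity, retention, and the other feed elements are omitted. Per the reply above, the cluster-level path overrides the feed-level default for that cluster; per the submit error earlier in the thread, Falcon still validates the feed-level default against the cluster path pattern, so both carry the same date pattern here:

```xml
<!-- Excerpt from inside <feed>; paths copied from the working entity in this thread -->
<clusters>
  <cluster name="next-rec-cluster" type="source">
    <!-- validity and retention omitted from this excerpt -->
    <!-- Cluster-level data path: overrides the feed-level default for this cluster -->
    <locations>
      <location type="data" path="/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
    </locations>
  </cluster>
</clusters>
<!-- Feed-level default data path; the submit error shows it is checked against the cluster path pattern -->
<locations>
  <location type="data" path="/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
</locations>
```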

Rising Star

@peeyush: So why, in my case, does the 'location data path' in the feed section raise an alert? As you said, the 'location data path' in the cluster section overrides it.

Never mind: I now put the same path in all sections, and both submit and schedule are OK.

Thanks all.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<feed name="next-vers-current" description="next-vers-current" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(2)</frequency>
  <timezone>UTC</timezone>
  <clusters>
    <cluster name="next-rec-cluster" type="source">
      <validity start="2016-05-27T14:00Z" end="2016-05-28T23:00Z"/>
      <retention limit="hours(6)" action="delete"/>
      <locations>
        <location type="data" path="/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
      </locations>
    </cluster>
    <cluster name="current-rec-cluster" type="target">
      <validity start="2016-05-01T14:00Z" end="2016-05-28T23:00Z"/>
      <retention limit="days(2)" action="delete"/>
      <locations>
        <location type="data" path="/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
      </locations>
    </cluster>
  </clusters>
  <locations>
    <location type="data" path="/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
    <location type="stats" path="/none"/>
    <location type="meta" path="/none"/>
  </locations>
  <ACL owner="falcon" group="hadoop" permission="0755"/>
  <schema location="/none" provider="none"/>
  <properties><property name="queueName" value="oozie-launcher"/></properties>
</feed>

Super Collaborator

@mayki wogno Earlier, the 'location data path' in your feed section raised an alert because the frequency pattern "${YEAR}/${MONTH}/${DAY}/${HOUR}" was missing from the data path.

Thanks for confirming that it works now for you.

Rising Star

@peeyush As I said in my last comment, it works now with my new feed-replication.xml. Thanks.

Rising Star

Hi again,

There is something weird in the FALCON_FEED_RETENTION workflow: the feedDataPath is wrong:

feedDataPath
    DATA=hdfs://clusterA:8020/tmp/falcon/next-vers-current/?{YEAR}/?{MONTH}/?{DAY}/?{HOUR}

For FALCON_FEED_REPLICATION, the feedDataPath is correct:

distcpSourcePaths
    hftp://clusterA:50070/tmp/falcon/next-vers-current/2016/05/27/12
distcpTargetPaths
    hdfs://clusterB/tmp/falcon/next-vers-current/2016/05/27/12/

What's wrong with my feed-replication.xml?