Created 05-26-2016 12:08 PM
Hi all,
It seems that when the feed frequency is "hours(2)", the data path should look like this:
/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}
My question is: do all of these paths need to be created beforehand on both the primary and the backup cluster?
/tmp/falcon/next-vers-current/2016/05/26/13/
/tmp/falcon/next-vers-current/2016/05/26/14/
/tmp/falcon/next-vers-current/2016/05/26/15/
Created 05-26-2016 12:19 PM
@mayki wogno To answer your question: at least the frequency-based data must be available on the primary cluster, so that the scheduled feed replication can copy it to the backup cluster periodically. If the data is not available on the primary cluster, the scheduled instance stays in the waiting state until the data becomes available.
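To make the relationship between the frequency and the directories concrete, here is a minimal Python sketch that expands the path template into the instance directories a feed with frequency hours(2) would refer to. It is only an illustration, not Falcon code: `instance_paths` is a hypothetical helper, and the start time of 2016-05-27 14:00 UTC is an example value.

```python
from datetime import datetime, timedelta

# Path template copied from the feed discussed in this thread.
TEMPLATE = "/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}"


def instance_paths(template, start, every_hours, count):
    """Expand the date variables for `count` consecutive feed instances.

    Hypothetical helper for illustration; Falcon does this internally.
    """
    paths = []
    for i in range(count):
        t = start + timedelta(hours=every_hours * i)
        paths.append(
            template.replace("${YEAR}", t.strftime("%Y"))
            .replace("${MONTH}", t.strftime("%m"))
            .replace("${DAY}", t.strftime("%d"))
            .replace("${HOUR}", t.strftime("%H"))
        )
    return paths


if __name__ == "__main__":
    # Example start time (assumption), frequency hours(2), three instances.
    for path in instance_paths(TEMPLATE, datetime(2016, 5, 27, 14), 2, 3):
        print(path)
```

With these example values the sketch prints the directories ending in 27/14, 27/16 and 27/18. Following the answer above, these are the directories that would need to contain data on the primary cluster for the corresponding instances to leave the waiting state.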
Created 05-26-2016 01:10 PM
It seems that my question was not clear.
I want to submit this feed:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<feed name="next-vers-current" description="next-vers-current" xmlns="uri:falcon:feed:0.1">
    <frequency>hours(6)</frequency>
    <timezone>UTC</timezone>
    <clusters>
        <cluster name="next-rec-cluster" type="source">
            <validity start="2016-05-01T12:00Z" end="2016-05-27T23:00Z"/>
            <retention limit="hours(2)" action="delete"/>
            <locations>
                <location type="data" path="/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
            </locations>
        </cluster>
        <cluster name="current-rec-cluster" type="target">
            <validity start="2016-05-01T12:00Z" end="2016-05-27T23:00Z"/>
            <retention limit="days(2)" action="delete"/>
            <locations>
                <location type="data" path="/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
            </locations>
        </cluster>
    </clusters>
    <locations>
        <location type="data" path="/tmp/falcon/"/>
        <location type="stats" path="/none"/>
        <location type="meta" path="/none"/>
    </locations>
    <ACL owner="falcon" group="hadoop" permission="0755"/>
    <schema location="/none" provider="none"/>
    <properties>
        <property name="queueName" value="oozie-launcher"/>
    </properties>
</feed>
falcon entity -type feed -submit -file next-vers-current.xml
ERROR: Bad Request;default/org.apache.falcon.FalconWebException::org.apache.falcon.FalconException: Feeds default path pattern: ${nameNode}/tmp/falcon, does not match with cluster: next-rec-cluster path pattern: hdfs://master001.next.rec.mapreduce.m1.p.fti.net:8020/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}
So my question is: is it normal that I need to create all the paths with this suffix?
${YEAR}/${MONTH}/${DAY}/${HOUR}
Created 05-26-2016 03:36 PM
@mayki wogno Thanks for sharing the feed replication entity XML. I looked at your entity and found that the exception occurs because the feed-level location of type data is defined without the frequency variables:
<location type="data" path="/tmp/falcon/"/>
Can you define the path as follows: path="/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}"
Once the path is defined with the frequency variables, I expect this to work.
Also, in the shared entity I see that you have specified the same HDFS path for source and target. Can you please check this as well?
Created 05-26-2016 09:15 PM
@peeyush: What is the difference between the 'location data path' in the cluster section and the one in the feed section?
Created 05-27-2016 08:50 AM
@mayki wogno The 'location data path' in the feed section is the initial (default) source data path; it can be overridden by a 'location data path' defined in the source cluster section.
Created 05-27-2016 08:57 AM
@peeyush: So why, in my case, did the 'location data path' in the feed section raise an error? As you said, the 'location data path' in the cluster section overrides it.
Never mind, I have now put the same path in all sections, and submit and schedule are OK.
Thanks all.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<feed name="next-vers-current" description="next-vers-current" xmlns="uri:falcon:feed:0.1">
    <frequency>hours(2)</frequency>
    <timezone>UTC</timezone>
    <clusters>
        <cluster name="next-rec-cluster" type="source">
            <validity start="2016-05-27T14:00Z" end="2016-05-28T23:00Z"/>
            <retention limit="hours(6)" action="delete"/>
            <locations>
                <location type="data" path="/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
            </locations>
        </cluster>
        <cluster name="current-rec-cluster" type="target">
            <validity start="2016-05-01T14:00Z" end="2016-05-28T23:00Z"/>
            <retention limit="days(2)" action="delete"/>
            <locations>
                <location type="data" path="/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
            </locations>
        </cluster>
    </clusters>
    <locations>
        <location type="data" path="/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
        <location type="stats" path="/none"/>
        <location type="meta" path="/none"/>
    </locations>
    <ACL owner="falcon" group="hadoop" permission="0755"/>
    <schema location="/none" provider="none"/>
    <properties>
        <property name="queueName" value="oozie-launcher"/>
    </properties>
</feed>
Created 05-27-2016 09:27 AM
@mayki wogno Earlier, the 'location data path' in your feed section raised an error because the frequency pattern "${YEAR}/${MONTH}/${DAY}/${HOUR}" was missing from the data path.
Thanks for confirming that it works now for you.
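The mismatch reported in the error message can be pictured with a rough Python sketch. It assumes that the check compares only the date variables used in the feed-level default path against those in each cluster-level path; that is an interpretation of the error text, not Falcon's actual implementation, and `date_pattern` and `patterns_match` are hypothetical names.

```python
import re

# Date variables that may appear in a feed path (assumed set, for illustration).
DATE_VAR = re.compile(r"\$\{(YEAR|MONTH|DAY|HOUR|MINUTE)\}")


def date_pattern(path):
    """Return the date variables used in a path, in order of appearance."""
    return DATE_VAR.findall(path)


def patterns_match(feed_default_path, cluster_path):
    """Assumed check: both paths must use the same sequence of date variables."""
    return date_pattern(feed_default_path) == date_pattern(cluster_path)


if __name__ == "__main__":
    cluster_path = "/tmp/falcon/next-vers-current/${YEAR}/${MONTH}/${DAY}/${HOUR}"
    # Feed-level path from the first submission: no date variables at all.
    print(patterns_match("/tmp/falcon/", cluster_path))
    # Feed-level path from the working feed: same variables as the cluster path.
    print(patterns_match(cluster_path, cluster_path))
```

Under that assumption, the first call prints False (the situation that produced the "does not match" error) and the second prints True (the feed that was submitted successfully).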
Created 05-27-2016 09:34 AM
@peeyush As I said in my last comment, it works now with my new feed-replication.xml. Thanks.
Created 05-27-2016 12:37 PM
Hi again,
There is something weird in the FALCON_FEED_RETENTION workflow: the feedDataPath looks wrong:
feedDataPath DATA=hdfs://clusterA:8020/tmp/falcon/next-vers-current/?{YEAR}/?{MONTH}/?{DAY}/?{HOUR}
For FALCON_FEED_REPLICATION, the paths are correct:
distcpSourcePaths hftp://clusterA:50070/tmp/falcon/next-vers-current/2016/05/27/12
distcpTargetPaths hdfs://clusterB/tmp/falcon/next-vers-current/2016/05/27/12/
What is wrong in my feed-replication.xml?