Created 09-06-2016 10:29 AM
I have not seen any example of using S3 in Falcon except for mirroring. Is it possible to use an S3 bucket as the location path for a feed?
Created 09-06-2016 11:49 PM
@Liam Murphy: Please find the details below
1> Ensure that you have an account with Amazon S3 and a designated bucket for your data
2> You must have an Access Key ID and a Secret Key
3> Configure HDFS for S3 storage by making the following changes to core-site.xml
<property>
  <name>fs.default.name</name>
  <value>s3n://your-bucket-name</value>
</property>
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_S3_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_S3_SECRET_KEY</value>
</property>
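Before moving on to step 4, it is worth sanity-checking that Hadoop can actually reach the bucket with those credentials. A minimal check (assuming the s3n connector jars are already on the Hadoop classpath, and using the same placeholder bucket name as above; the file and folder names are throwaway examples):

hadoop fs -ls s3n://your-bucket-name/
# optional write test with a throwaway local file
hadoop fs -put /tmp/sample.txt s3n://your-bucket-name/falcon-test/

If either command fails with an authentication error, fix core-site.xml before touching the Falcon entities.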
4> In the Falcon feed.xml, specify the Amazon S3 location and schedule the feed:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<feed name="S3Replication" description="S3-Replication" xmlns="uri:falcon:feed:0.1">
    <frequency>hours(1)</frequency>
    <clusters>
        <cluster name="cluster1" type="source">
            <validity start="2016-09-01T00:00Z" end="2034-12-20T08:00Z"/>
            <retention limit="days(24)" action="delete"/>
        </cluster>
        <cluster name="cluster2" type="target">
            <validity start="2016-09-01T00:00Z" end="2034-12-20T08:00Z"/>
            <retention limit="days(90)" action="delete"/>
            <locations>
                <location type="data" path="s3://<bucket-name>/<path-folder>/${YEAR}-${MONTH}-${DAY}-${HOUR}/"/>
            </locations>
        </cluster>
    </clusters>
</feed>
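Once the feed XML is ready, it can be submitted and scheduled through the Falcon CLI in the usual way (the entity name and file name below are just the ones used in this example):

falcon entity -type feed -submit -file feed.xml
falcon entity -type feed -schedule -name S3Replication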
Created 09-15-2016 11:49 AM
Hi Sowmya,
Is there any other debug information I can provide to help pin down the cause of the problem?
Kind Regards,
Liam
Created 09-15-2016 06:41 PM
@Liam Murphy: In the Oozie log I can see that the replication paths don't exist. Can you make sure the files exist?
Eviction fails because of a credentials issue. Can you make sure core-site.xml and hdfs-site.xml have the required configs, then restart the services and resubmit the feed? Thanks!
2016-09-09 14:44:43,680 INFO CoordActionInputCheckXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000058-160909120521096-oozie-oozi-C] ACTION[0000058-160909120521096-oozie-oozi-C@10] [0000058-160909120521096-oozie-oozi-C@10]::ActionInputCheck:: File:hftp://192.168.39.108:50070/falcon/2016-09-09-01, Exists? :false
2016-09-09 14:44:43,817 INFO CoordActionInputCheckXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000058-160909120521096-oozie-oozi-C] ACTION[0000058-160909120521096-oozie-oozi-C@11] [0000058-160909120521096-oozie-oozi-C@11]::CoordActionInputCheck:: Missing deps:hftp://192.168.39.108:50070/falcon/2016-09-09-01
2016-09-09 14:44:43,818 INFO CoordActionInputCheckXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000058-160909120521096-oozie-oozi-C] ACTION[0000058-160909120521096-oozie-oozi-C@11] [0000058-160909120521096-oozie-oozi-C@11]::ActionInputCheck:: In checkListOfPaths: hftp://192.168.39.108:50070/falcon/2016-09-09-01 is Missing.
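For the missing-dependency part, the quickest check is to look for that instance directory on the source cluster yourself. A rough sketch using the path from the log excerpt above (adjust to your own feed location and date pattern):

# does the instance directory for that hour exist on the source?
hadoop fs -ls /falcon/2016-09-09-01
# create it (or land the expected data into it) if it should be there
hadoop fs -mkdir -p /falcon/2016-09-09-01

Oozie will only start the replication for an hour once its input dependency for that hour is present.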
Created 09-16-2016 03:48 PM
I just noticed that when a path does not exist for a given hour, Falcon/Oozie simply waits on that instance rather than moving on to check the next hour. My misunderstanding, I guess. I have got it working now.
Created 11-28-2017 09:19 AM
Hi Team / @Sowmya Ramesh, I am trying to use Falcon to replicate HDFS to S3. I have tried the steps above, and the HDFStoS3 replication job ends up with status KILLED after the workflow runs. Watching it in Oozie, I can see the workflow change status from RUNNING to KILLED. Is there a way to troubleshoot this? I can run hadoop fs -ls commands against my S3 bucket, so I definitely have access. I suspect it is the S3 URL. I tried downloading the XML, changing the URL to drop the s3.region.amazonaws.com part, and uploading it again, with no luck. Any other suggestions? I appreciate all your help and support in advance. Regards
Anil
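P.S. I can pull more detail from the Oozie CLI if that helps, along these lines (the Oozie URL and job id below are placeholders, not my real ones):

oozie job -oozie http://localhost:11000/oozie -info 0000000-000000000000000-oozie-oozi-W
oozie job -oozie http://localhost:11000/oozie -log 0000000-000000000000000-oozie-oozi-W

The -log output for the failed action is presumably where the actual S3 error (credentials, endpoint, URL scheme) would show up.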