Is it possible to use S3 for Falcon feeds?

I have not seen any example of using S3 in Falcon except for mirroring. Is it possible to use an S3 bucket as the location path for a feed?

1 ACCEPTED SOLUTION

@Liam Murphy: Please find the details below:

1> Ensure that you have an account with Amazon S3 and a designated bucket for your data.

2> You must have an AWS Access Key ID and a Secret Access Key.

3> Configure HDFS for S3 storage by making the following changes to core-site.xml:

<property>
  <name>fs.default.name</name>
  <value>s3n://your-bucket-name</value>
</property>

<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_S3_ACCESS_KEY</value>
</property>

<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_S3_SECRET_KEY</value>
</property>
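Before moving on to step 4, it is worth confirming that the credentials in core-site.xml actually work from the command line. A minimal check, assuming your-bucket-name is the placeholder bucket from the config above:

# List the bucket through the s3n connector; this should return without
# an authentication error once the access key and secret key are valid.
hadoop fs -ls s3n://your-bucket-name/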

4> In the Falcon feed.xml, specify the Amazon S3 location and schedule the feed:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<feed name="S3Replication" description="S3-Replication" xmlns="uri:falcon:feed:0.1">
    <frequency>hours(1)</frequency>
    <clusters>
        <cluster name="cluster1" type="source">
            <validity start="2016-09-01T00:00Z" end="2034-12-20T08:00Z"/>
            <retention limit="days(24)" action="delete"/>
        </cluster>
        <cluster name="cluster2" type="target">
            <validity start="2016-09-01T00:00Z" end="2034-12-20T08:00Z"/>
            <retention limit="days(90)" action="delete"/>
            <locations>
                <location type="data" path="s3://<bucket-name>/<path-folder>/${YEAR}-${MONTH}-${DAY}-${HOUR}/"/>
            </locations>
        </cluster>
    </clusters>
</feed>
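With the definition in place, the feed can be submitted and scheduled from the Falcon CLI. A short sketch, assuming the XML above is saved as s3-feed.xml (the file name is just an example):

# Register the feed entity with Falcon...
falcon entity -type feed -submit -file s3-feed.xml
# ...then schedule it so the hourly replication instances start running
falcon entity -type feed -schedule -name S3Replication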

13 REPLIES

Hi Sowmya,

Is there any other debug information I can provide to help find the cause of the problem?

Kind Regards,

Liam

@Liam Murphy: In the Oozie log I can see that the replication paths don't exist. Can you make sure the files exist?

Eviction fails because of a credentials issue. Can you make sure core-site.xml and hdfs-site.xml have the required configs, then restart the services and resubmit the feed? Thanks!

2016-09-09 14:44:43,680  INFO CoordActionInputCheckXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000058-160909120521096-oozie-oozi-C] ACTION[0000058-160909120521096-oozie-oozi-C@10] [0000058-160909120521096-oozie-oozi-C@10]::ActionInputCheck:: File:hftp://192.168.39.108:50070/falcon/2016-09-09-01, Exists? :false
2016-09-09 14:44:43,817  INFO CoordActionInputCheckXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000058-160909120521096-oozie-oozi-C] ACTION[0000058-160909120521096-oozie-oozi-C@11] [0000058-160909120521096-oozie-oozi-C@11]::CoordActionInputCheck:: Missing deps:hftp://192.168.39.108:50070/falcon/2016-09-09-01 
2016-09-09 14:44:43,818  INFO CoordActionInputCheckXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000058-160909120521096-oozie-oozi-C] ACTION[0000058-160909120521096-oozie-oozi-C@11] [0000058-160909120521096-oozie-oozi-C@11]::ActionInputCheck:: In checkListOfPaths: hftp://192.168.39.108:50070/falcon/2016-09-09-01 is Missing.
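A quick way to confirm what Oozie is reporting is to list the path it is waiting on, taken straight from the log above:

# Check whether the hourly directory the coordinator action depends on exists
hadoop fs -ls hftp://192.168.39.108:50070/falcon/2016-09-09-01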

I just noticed that when a path does not exist for a given hour, Falcon/Oozie just gets stuck rather than checking the next hour. My misunderstanding, I guess. I have got it working now.
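For anyone hitting the same thing: the Oozie coordinator input check keeps polling until the missing dependency shows up, so materializing the expected hourly path lets the stuck action proceed. A sketch, assuming the missing path from the log above (whether an empty directory is enough depends on how the feed defines its data availability):

# Create the hourly directory the coordinator action is waiting on
hadoop fs -mkdir -p /falcon/2016-09-09-01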

Hi Team / @Sowmya Ramesh, I am trying to use Falcon to replicate from HDFS to S3. I have tried the steps above, and the HDFStoS3 replication job ends with status KILLED after the workflow runs. In Oozie I can watch the workflow change status from RUNNING to KILLED. Is there a way to troubleshoot this? I can run hadoop fs -ls commands against my S3 bucket, so I definitely have access. I suspect it's the S3 URL; I tried downloading the XML, changing the URL to remove the s3.region.amazonaws.com part, and uploading it again, with no luck. Any other suggestions? I appreciate all your help/support in advance. Regards

Anil
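One way to dig into why the workflow was KILLED is the Oozie CLI; the job ID below is a placeholder for the workflow ID shown in the Oozie console, and the Oozie URL assumes a default local install. A hedged sketch:

# Show the workflow's overall status and the error for each action
oozie job -oozie http://localhost:11000/oozie -info <workflow-job-id>
# Dump the job log, which typically contains the root-cause stack trace
oozie job -oozie http://localhost:11000/oozie -log <workflow-job-id>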