Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Is it possible to use S3 for Falcon feeds?

New Member

I have not seen any example of using S3 in Falcon except for mirroring. Is it possible to use an S3 bucket as the location path for a feed?

1 ACCEPTED SOLUTION


@Liam Murphy: Please find the details below.

1> Ensure that you have an account with Amazon S3 and a designated bucket for your data.

2> You must have an Access Key ID and a Secret Access Key.

3> Configure HDFS for S3 storage by making the following changes to core-site.xml:

<property>
    <name>fs.default.name</name>
    <value>s3n://your-bucket-name</value>
</property>

<property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>YOUR_S3_ACCESS_KEY</value>
</property>

<property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>YOUR_S3_SECRET_KEY</value>
</property>
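Since a later reply in this thread traces an eviction failure to a credentials issue, a quick sanity check of core-site.xml may save a restart cycle. The following is only a minimal sketch in Python: it parses a core-site.xml string and reports whether the two s3n credential properties named above are missing, empty, or padded with whitespace. The helper names `load_properties` and `check_s3n_credentials` are hypothetical, and whether padded values actually break authentication may depend on the Hadoop version, so treat a whitespace warning as a hint rather than a diagnosis.

```python
import xml.etree.ElementTree as ET

# Property names taken from the core-site.xml snippet in this answer.
REQUIRED = ("fs.s3n.awsAccessKeyId", "fs.s3n.awsSecretAccessKey")


def load_properties(core_site_xml: str) -> dict:
    """Return {name: value} for every <property> in a core-site.xml string."""
    root = ET.fromstring(core_site_xml)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = value if value is not None else ""
    return props


def check_s3n_credentials(props: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for key in REQUIRED:
        value = props.get(key)
        if not value:
            problems.append(f"{key} is missing or empty")
        elif value != value.strip():
            problems.append(f"{key} has leading or trailing whitespace")
    return problems


if __name__ == "__main__":
    # Hypothetical sample; in practice, read the real core-site.xml from disk.
    sample = """<configuration>
      <property><name>fs.s3n.awsAccessKeyId</name><value>YOUR_S3_ACCESS_KEY</value></property>
      <property><name>fs.s3n.awsSecretAccessKey</name><value> YOUR_S3_SECRET_KEY </value></property>
    </configuration>"""
    for problem in check_s3n_credentials(load_properties(sample)):
        print(problem)
```

This only inspects the file contents. It does not contact S3, so a clean result does not prove that the keys themselves are valid.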

4> In the Falcon feed.xml, specify the Amazon S3 location and schedule the feed:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<feed name="S3Replication" description="S3-Replication" xmlns="uri:falcon:feed:0.1">
    <frequency>hours(1)</frequency>
    <clusters>
        <cluster name="cluster1" type="source">
            <validity start="2016-09-01T00:00Z" end="2034-12-20T08:00Z"/>
            <retention limit="days(24)" action="delete"/>
        </cluster>
        <cluster name="cluster2" type="target">
            <validity start="2016-09-01T00:00Z" end="2034-12-20T08:00Z"/>
            <retention limit="days(90)" action="delete"/>
            <locations>
                <location type="data" path="s3n://your-bucket-name/path-folder/${YEAR}-${MONTH}-${DAY}-${HOUR}/"/>
            </locations>
        </cluster>
    </clusters>
</feed>
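To see which concrete paths an hourly feed like this one refers to, it can help to expand the `${YEAR}-${MONTH}-${DAY}-${HOUR}` pattern by hand. The Python sketch below only approximates that substitution for illustration; it is not Falcon's own code. The function names, the bucket name, and the folder name are all placeholders, and the s3n scheme is assumed to match the core-site.xml settings in step 3.

```python
from datetime import datetime, timedelta
from string import Template


def expand_feed_path(template: str, instance: datetime) -> str:
    """Fill ${YEAR}, ${MONTH}, ${DAY}, ${HOUR} for one feed instance time."""
    return Template(template).substitute(
        YEAR=f"{instance.year:04d}",
        MONTH=f"{instance.month:02d}",
        DAY=f"{instance.day:02d}",
        HOUR=f"{instance.hour:02d}",
    )


def hourly_instances(start: datetime, count: int):
    """Yield `count` consecutive hourly instance times starting at `start`."""
    for i in range(count):
        yield start + timedelta(hours=i)


if __name__ == "__main__":
    # Placeholder bucket and folder; substitute your own values.
    path = "s3n://your-bucket-name/path-folder/${YEAR}-${MONTH}-${DAY}-${HOUR}/"
    # The validity start in the feed above is 2016-09-01T00:00Z.
    for ts in hourly_instances(datetime(2016, 9, 1, 0), 3):
        print(expand_feed_path(path, ts))
```

Listing the expanded paths this way makes it easier to confirm, with `hadoop fs -ls`, that data actually exists for each hour the feed expects.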


13 REPLIES

New Member

Hi Sowmya,

Is there any other debug information I can provide to help find the cause of the problem?

Kind Regards,

Liam


@Liam Murphy: In the Oozie log I can see that the replication paths don't exist. Can you make sure the files exist?

Eviction fails because of a credentials issue. Can you make sure core-site and hdfs-site have the required configs, restart the services, and resubmit the feed? Thanks!

2016-09-09 14:44:43,680  INFO CoordActionInputCheckXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000058-160909120521096-oozie-oozi-C] ACTION[0000058-160909120521096-oozie-oozi-C@10] [0000058-160909120521096-oozie-oozi-C@10]::ActionInputCheck:: File:hftp://192.168.39.108:50070/falcon/2016-09-09-01, Exists? :false
2016-09-09 14:44:43,817  INFO CoordActionInputCheckXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000058-160909120521096-oozie-oozi-C] ACTION[0000058-160909120521096-oozie-oozi-C@11] [0000058-160909120521096-oozie-oozi-C@11]::CoordActionInputCheck:: Missing deps:hftp://192.168.39.108:50070/falcon/2016-09-09-01 
2016-09-09 14:44:43,818  INFO CoordActionInputCheckXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000058-160909120521096-oozie-oozi-C] ACTION[0000058-160909120521096-oozie-oozi-C@11] [0000058-160909120521096-oozie-oozi-C@11]::ActionInputCheck:: In checkListOfPaths: hftp://192.168.39.108:50070/falcon/2016-09-09-01 is Missing.
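When the coordinator log is long, pulling out just the missing paths can make this kind of check quicker. Here is a small Python sketch that scans log text for the `Missing deps:` fragment shown in the lines above and returns the unique paths. The function name is hypothetical, and the pattern is based only on the log lines quoted here, so other Oozie versions may format the message differently.

```python
import re

# Matches the "Missing deps:<path>" fragment seen in the coordinator log lines.
MISSING_DEPS = re.compile(r"Missing deps:(\S+)")


def missing_dependencies(log_text: str) -> list:
    """Return the unique paths reported as missing, in order of first appearance."""
    seen = []
    for line in log_text.splitlines():
        match = MISSING_DEPS.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    # Shortened copy of one log line from this thread.
    sample = (
        "2016-09-09 14:44:43,817  INFO CoordActionInputCheckXCommand:520 - "
        "::CoordActionInputCheck:: Missing deps:"
        "hftp://192.168.39.108:50070/falcon/2016-09-09-01 "
    )
    for path in missing_dependencies(sample):
        print(path)
```

Each path it prints is one that the coordinator is still waiting for on the source cluster.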

New Member

I just noticed that when a path does not exist for a given hour, Falcon/Oozie just gets stuck rather than checking the next hour. My misunderstanding, I guess. I have got it working now.

New Member

Hi Team / @Sowmya Ramesh, I am trying to use Falcon to replicate from HDFS to S3. I followed the steps above, but the HDFStoS3 replication job ends up with status KILLED after the workflow runs. In Oozie, I can see the workflow change status from RUNNING to KILLED. Is there a way to troubleshoot this? I can run hadoop fs -ls commands against my S3 bucket, so I definitely have access. I suspect it's the S3 URL. I tried downloading the XML, changing the URL to drop s3.region.amazonaws.com, and uploading it again, with no luck. Any other suggestions? I appreciate all your help and support in advance. Regards

Anil