Member since
09-29-2015
57
Posts
49
Kudos Received
19
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1487 | 05-25-2017 06:03 PM |
| | 1342 | 10-19-2016 10:17 PM |
| | 1654 | 09-28-2016 08:41 PM |
| | 988 | 09-21-2016 05:46 PM |
| | 4677 | 09-06-2016 11:49 PM |
05-25-2017
06:03 PM
3 Kudos
You need to add the properties below to core-site.xml and restart the affected components. You can refer to https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/Superusers.html for more details. Thanks!
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
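If you don't want Oozie to be able to impersonate users from every host, the wildcards can be replaced with explicit values, as described in the Superusers doc above. A quick sketch with hypothetical values (the host name and group below are placeholders):
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>oozie-host.example.com</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>hadoop-users</value>
</property>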
03-13-2017
11:35 PM
1 Kudo
The Recipes framework, which supports HDFS and Hive mirroring, was added in the Apache Falcon 0.6.0 release as client-side logic. With the 0.10 release it moved to the server side and was renamed server-side extensions as part of https://issues.apache.org/jira/browse/FALCON-1107. Any new mirror job should be submitted and managed using Falcon extensions. Please refer to https://falcon.apache.org/restapi/ExtensionEnumeration.html for more details.
Supported DistCp options for HDFS mirroring in HDP 2.5: distcpMaxMaps, distcpMapBandwidth, overwrite, ignoreErrors, skipChecksum, removeDeletedFiles, preserveBlockSize, preserveReplicationNumber, preservePermission, preserveUser, preserveGroup, preserveChecksumType, preserveAcl, preserveXattr, preserveTimes.
An HDFS mirroring job can be scheduled using the extension as below:
falcon extension -submitAndSchedule -extensionName hdfs-mirroring -file sales-monthly.properties
Content of sales-monthly.properties file:
jobName=sales-monthly
jobValidityStart=2016-06-30T00:00Z
jobValidityEnd=2099-12-31T11:59Z
jobFrequency=minutes(45)
jobTimezone=UTC
sourceCluster=primaryCluster
targetCluster=backupCluster
jobClusterName=primaryCluster
sourceDir=/user/ambari-qa/sales-monthly/input
targetDir=/user/ambari-qa/sales-monthly/output
removeDeletedFiles=true
skipChecksum=false
preservePermission=true
preserveUser=true
Refer to hdfs-mirroring-properties.json for the properties supported in HDFS mirroring.
Supported DistCp options for Hive mirroring in HDP 2.5: distcpMaxMaps, distcpMapBandwidth.
A Hive mirroring job can be scheduled using the extension as below:
falcon extension -submitAndSchedule -extensionName hive-mirroring -file hive-sales-monthly.properties
Content of hive-sales-monthly.properties file:
jobName=hive-sales-monthly
sourceCluster=primaryCluster
targetCluster=backupCluster
jobClusterName=primaryCluster
jobValidityStart=2016-07-19T00:02Z
jobValidityEnd=2018-05-25T11:02Z
jobFrequency=minutes(30)
jobRetryPolicy=periodic
jobRetryDelay=minutes(30)
jobRetryAttempts=3
distcpMaxMaps=1
distcpMapBandwidth=100
maxEvents=-1
replicationMaxMaps=5
sourceDatabases=default
sourceTables=*
sourceHiveServer2Uri=hive2://primary:10000
targetHiveServer2Uri=hive2://backup:10000
Refer to hive-mirroring-properties.json for the properties supported in Hive mirroring.
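Once a mirroring job is submitted, the same extension CLI can be used to manage it. The flags below are a sketch from memory of the Falcon 0.10 extension CLI and may differ slightly on your build; check falcon help for the exact syntax:
# list jobs submitted for an extension
falcon extension -list -extensionName hdfs-mirroring
# show instances of a specific job (job name comes from the properties file)
falcon extension -instances -jobName sales-monthly
# suspend and resume a job
falcon extension -suspend -jobName sales-monthly
falcon extension -resume -jobName sales-monthly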
10-19-2016
10:34 PM
Looks like this is not supported in the UI. You can do it through the REST API or the command line. Refer to https://falcon.apache.org/restapi/ExtensionSubmitAndSchedule.html, or update existing mirror jobs using https://falcon.apache.org/restapi/ExtensionUpdate.html. Add "removeDeletedFiles=true" as a POST parameter. Thanks!
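As a sketch for the command-line route: add the flag to the extension properties file and submit (or resubmit) the job with the CLI form shown earlier in this thread; the file name below is a placeholder.
# add to the mirroring properties file
removeDeletedFiles=true
# then submit/schedule the job
falcon extension -submitAndSchedule -extensionName hdfs-mirroring -file my-mirror.properties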
10-19-2016
10:17 PM
2 Kudos
@Jasper Pass the removeDeletedFiles property in the request. This maps to the -delete option in DistCp. Thanks!
09-28-2016
08:41 PM
Falcon supports feed replication and mirroring.
1> For Falcon feed replication, the execution order is FIFO, as it is based on feed/data availability.
2> For mirroring, the execution order is LAST_ONLY, as the replication job has to run only once to catch up.
According to the Oozie documentation, execution specifies the execution order if multiple instances of the coordinator job have satisfied their execution criteria. Valid values are:
1> FIFO (oldest first), the default
2> LIFO (newest first)
3> LAST_ONLY (discards all older materializations)
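For reference, the execution order appears in the <controls> block of the Oozie coordinator that Falcon generates. A minimal sketch of a coordinator using LAST_ONLY (the app name, timing values and workflow path here are illustrative placeholders, not what Falcon actually emits):
<coordinator-app name="falcon-mirroring-coord" frequency="${coord:hours(1)}"
                 start="2016-09-01T00:00Z" end="2017-09-01T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <controls>
    <!-- only the most recent materialization runs; older ones are discarded -->
    <execution>LAST_ONLY</execution>
  </controls>
  <action>
    <workflow>
      <app-path>${workflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>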
09-21-2016
05:46 PM
@Saurabh For entities, the current functionality only supports suspending by entity name, so it has to be done one by one. For instances, a start and end time can be used. Suspend is used to suspend an instance or instances of the given process. This option pauses the parent workflow in the state it was in at the time this command was executed.
Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -suspend -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
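For example, to suspend all instances of a process between two times (the process name and times below are hypothetical):
$FALCON_HOME/bin/falcon instance -type process -name sample-process -suspend -start "2016-09-01T00:00Z" -end "2016-09-02T00:00Z"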
09-15-2016
06:41 PM
1 Kudo
@Liam Murphy: In the Oozie log I can see that the replication paths don't exist. Can you make sure the files exist? Eviction fails because of a credentials issue. Can you make sure core-site and hdfs-site have the required configs, then restart the services and resubmit the feed? Thanks!
2016-09-09 14:44:43,680 INFO CoordActionInputCheckXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000058-160909120521096-oozie-oozi-C] ACTION[0000058-160909120521096-oozie-oozi-C@10] [0000058-160909120521096-oozie-oozi-C@10]::ActionInputCheck:: File:hftp://192.168.39.108:50070/falcon/2016-09-09-01, Exists? :false
2016-09-09 14:44:43,817 INFO CoordActionInputCheckXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000058-160909120521096-oozie-oozi-C] ACTION[0000058-160909120521096-oozie-oozi-C@11] [0000058-160909120521096-oozie-oozi-C@11]::CoordActionInputCheck:: Missing deps:hftp://192.168.39.108:50070/falcon/2016-09-09-01
2016-09-09 14:44:43,818 INFO CoordActionInputCheckXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000058-160909120521096-oozie-oozi-C] ACTION[0000058-160909120521096-oozie-oozi-C@11] [0000058-160909120521096-oozie-oozi-C@11]::ActionInputCheck:: In checkListOfPaths: hftp://192.168.39.108:50070/falcon/2016-09-09-01 is Missing.
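To confirm whether the missing input really exists on the source side, you could list the path from the log directly (or the equivalent hdfs:// path on the source cluster):
hadoop fs -ls hftp://192.168.39.108:50070/falcon/2016-09-09-01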
09-08-2016
07:31 PM
@Liam Murphy: Can you attach the feed XML and the Falcon and Oozie logs? Looks like eviction is failing. Can you see if the replication succeeded? The Oozie bundle created for the feed will have one coordinator for retention and another for replication. Thanks!
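One way to check is to inspect the bundle and its retention/replication coordinators with the Oozie CLI (the Oozie URL and bundle ID below are placeholders):
oozie job -oozie http://oozie-host:11000/oozie -info <bundle-id>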
09-06-2016
11:49 PM
5 Kudos
@Liam Murphy: Please find the details below.
1> Ensure that you have an account with Amazon S3 and a designated bucket for your data.
2> You must have an Access Key ID and a Secret Key.
3> Configure HDFS for S3 storage by making the following changes to core-site.xml:
<property>
<name>fs.default.name</name>
<value>s3n://your-bucket-name</value>
</property>
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>YOUR_S3_ACCESS_KEY</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>YOUR_S3_SECRET_KEY</value>
</property>
4> In the Falcon feed.xml, specify the Amazon S3 location and schedule the feed:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<feed name="S3Replication" description="S3-Replication" xmlns="uri:falcon:feed:0.1">
<frequency>
hours(1)
</frequency>
<clusters>
<cluster name="cluster1" type="source">
<validity start="2016-09-01T00:00Z" end="2034-12-20T08:00Z"/>
<retention limit="days(24)" action="delete"/>
</cluster>
<cluster name="cluster2" type="target">
<validity start="2016-09-01T00:00Z" end="2034-12-20T08:00Z"/>
<retention limit="days(90)" action="delete"/>
<locations>
<location type="data" path="s3://<bucket-name>/<path-folder>/${YEAR}-${MONTH}-${DAY}-${HOUR}/"/>
</locations>
</cluster>
</clusters>
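Once the feed definition is complete, it can be submitted and scheduled with the Falcon entity CLI. To rule out credential or bucket problems first, the bucket can also be listed directly from a cluster node (the bucket name below is the placeholder from the config above, and feed.xml is a placeholder file name):
# quick connectivity/credentials check
hadoop fs -ls s3n://your-bucket-name/
# submit and schedule the feed
falcon entity -type feed -submit -file feed.xml
falcon entity -type feed -schedule -name S3Replication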