Member since: 09-29-2015
Posts: 57
Kudos Received: 49
Solutions: 19
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 975 | 05-25-2017 06:03 PM
 | 643 | 10-19-2016 10:17 PM
 | 886 | 09-28-2016 08:41 PM
 | 458 | 09-21-2016 05:46 PM
 | 1981 | 09-06-2016 11:49 PM
05-25-2017
06:03 PM
3 Kudos
You need to add the properties below to core-site.xml and restart the affected components. Refer to https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/Superusers.html for more details. Thanks!
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
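If you don't want to allow impersonation from every host or for every group, the wildcards can be narrowed, per the Superusers doc linked above; a minimal sketch, where the host and group names are illustrative:
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <!-- comma-separated list of hosts the oozie user may impersonate from -->
  <value>oozie-host.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <!-- only users in these groups may be impersonated -->
  <value>hadoop-users</value>
</property>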
03-13-2017
11:35 PM
1 Kudo
The Recipes framework, which added the capability to support HDFS and Hive mirroring, was introduced in the Apache Falcon 0.6.0 release as client-side logic. With the 0.10 release it moved to the server side and was renamed to server-side extensions as part of https://issues.apache.org/jira/browse/FALCON-1107. Any new mirror job should be submitted and managed using Falcon extensions. Please refer to https://falcon.apache.org/restapi/ExtensionEnumeration.html for more details.
Supported DistCp options for HDFS mirroring in HDP 2.5:
distcpMaxMaps
distcpMapBandwidth
overwrite
ignoreErrors
skipChecksum
removeDeletedFiles
preserveBlockSize
preserveReplicationNumber
preservePermission
preserveUser
preserveGroup
preserveChecksumType
preserveAcl
preserveXattr
preserveTimes
An HDFS mirroring job can be scheduled using the extension as below:
falcon extension -submitAndSchedule -extensionName hdfs-mirroring -file sales-monthly.properties
Content of sales-monthly.properties file:
jobName=sales-monthly
jobValidityStart=2016-06-30T00:00Z
jobValidityEnd=2099-12-31T11:59Z
jobFrequency=minutes(45)
jobTimezone=UTC
sourceCluster=primaryCluster
targetCluster=backupCluster
jobClusterName=primaryCluster
sourceDir=/user/ambari-qa/sales-monthly/input
targetDir=/user/ambari-qa/sales-monthly/output
removeDeletedFiles=true
skipChecksum=false
preservePermission=true
preserveUser=true
Refer to hdfs-mirroring-properties.json for the properties supported in HDFS mirroring.
Supported DistCp options for Hive mirroring in HDP 2.5:
distcpMaxMaps
distcpMapBandwidth
A Hive mirroring job can be scheduled using the extension as below:
falcon extension -submitAndSchedule -extensionName hive-mirroring -file hive-sales-monthly.properties
Content of hive-sales-monthly.properties file:
jobName=hive-sales-monthly
sourceCluster=primaryCluster
targetCluster=backupCluster
jobClusterName=primaryCluster
jobValidityStart=2016-07-19T00:02Z
jobValidityEnd=2018-05-25T11:02Z
jobFrequency=minutes(30)
jobRetryPolicy=periodic
jobRetryDelay=minutes(30)
jobRetryAttempts=3
distcpMaxMaps=1
distcpMapBandwidth=100
maxEvents=-1
replicationMaxMaps=5
sourceDatabases=default
sourceTables=*
sourceHiveServer2Uri=hive2://primary:10000
targetHiveServer2Uri=hive2://backup:10000
Refer to hive-mirroring-properties.json for the properties supported in Hive mirroring.
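To check on the job after it is scheduled, the regular instance commands can be used; a sketch, assuming the extension job materializes as a process entity named after jobName (hive-sales-monthly here) and using an illustrative time window:
# list instance statuses for the mirroring job
falcon instance -type process -name hive-sales-monthly -status -start 2016-07-19T00:02Z -end 2016-07-20T00:02Z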
03-13-2017
05:58 PM
You will need to capture the relevant logs, i.e. from when you submit and schedule the job. You can either submit a new mirroring job or delete and resubmit the old one. Thanks!
03-10-2017
10:46 PM
I don't see any errors in the Falcon application log. Can you tail the log into another file before submitting and scheduling the mirroring job and send it across? How are you scheduling the mirroring job - are you using the command line or the UI? Are you seeing any errors on submission? Also, what is the name of the job? It's hard to debug the issue without the relevant logs. Thanks!
10-19-2016
10:34 PM
Looks like this is not supported in the UI. You can do it through the REST API or the command line. Refer to https://falcon.apache.org/restapi/ExtensionSubmitAndSchedule.html, or update existing mirror jobs using https://falcon.apache.org/restapi/ExtensionUpdate.html. Add "removeDeletedFiles=true" as a POST parameter. Thanks!
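If you prefer the command line over raw REST calls, the extension CLI should accept an update analogous to submitAndSchedule; a sketch, assuming an -update option mirroring the documented ExtensionUpdate endpoint, and that sales-monthly.properties is your existing job definition with removeDeletedFiles=true added:
# resubmit the updated job definition (removeDeletedFiles=true maps to DistCp -delete)
falcon extension -update -extensionName hdfs-mirroring -file sales-monthly.properties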
10-19-2016
10:17 PM
2 Kudos
@Jasper Pass the removeDeletedFiles property in the request. This maps to the -delete option in DistCp. Thanks!
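For reference, the same behavior with a standalone DistCp run combines -update with -delete; the source and target paths below are illustrative:
# -delete removes files from the target that no longer exist at the source;
# it is only valid together with -update or -overwrite
hadoop distcp -update -delete hdfs://source-nn:8020/data/sales hdfs://target-nn:8020/data/sales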
09-28-2016
08:41 PM
Falcon supports feed replication and mirroring.
1> For Falcon feed replication, the execution order is FIFO, as it is based on feed/data availability.
2> For mirroring, the execution order is LAST_ONLY, as the replication job has to run only once to catch up.
According to the Oozie documentation, execution specifies the execution order if multiple instances of the coordinator job have satisfied their execution criteria. Valid values are:
1> FIFO (oldest first) default
2> LIFO (newest first)
3> LAST_ONLY (discards all older materializations)
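In the generated coordinator XML this is controlled by the execution element; a minimal sketch of the relevant fragment, with an illustrative app name and schedule:
<coordinator-app name="mirror-coord" frequency="${coord:hours(1)}"
                 start="2016-09-01T00:00Z" end="2017-09-01T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <controls>
    <!-- if several instances are ready, run only the latest one -->
    <execution>LAST_ONLY</execution>
  </controls>
  ...
</coordinator-app>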
09-21-2016
05:46 PM
@Saurabh For entities, the currently supported functionality is to suspend by entity name, so it has to be done one by one. For instances, a start and end time can be used. Suspend is used to suspend an instance or instances of the given process. This option pauses the parent workflow in the state it was in at the time this command was executed.
Usage: $FALCON_HOME/bin/falcon instance -type <<feed/process>> -name <<name>> -suspend -start "yyyy-MM-dd'T'HH:mm'Z'" -end "yyyy-MM-dd'T'HH:mm'Z'"
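A concrete invocation, with an illustrative process name and time window; resume works the same way with -resume:
# suspend all instances of sample-process in the given window
falcon instance -type process -name sample-process -suspend -start "2016-09-01T00:00Z" -end "2016-09-02T00:00Z"
# resume them later
falcon instance -type process -name sample-process -resume -start "2016-09-01T00:00Z" -end "2016-09-02T00:00Z"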
09-15-2016
06:41 PM
1 Kudo
@Liam Murphy: In the Oozie log I can see that the replication paths don't exist. Can you make sure the files exist? Eviction fails because of a credentials issue. Can you make sure core-site and hdfs-site have the required configs, restart the services, and resubmit the feed? Thanks!
2016-09-09 14:44:43,680 INFO CoordActionInputCheckXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000058-160909120521096-oozie-oozi-C] ACTION[0000058-160909120521096-oozie-oozi-C@10] [0000058-160909120521096-oozie-oozi-C@10]::ActionInputCheck:: File:hftp://192.168.39.108:50070/falcon/2016-09-09-01, Exists? :false
2016-09-09 14:44:43,817 INFO CoordActionInputCheckXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000058-160909120521096-oozie-oozi-C] ACTION[0000058-160909120521096-oozie-oozi-C@11] [0000058-160909120521096-oozie-oozi-C@11]::CoordActionInputCheck:: Missing deps:hftp://192.168.39.108:50070/falcon/2016-09-09-01
2016-09-09 14:44:43,818 INFO CoordActionInputCheckXCommand:520 - SERVER[sandbox.hortonworks.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000058-160909120521096-oozie-oozi-C] ACTION[0000058-160909120521096-oozie-oozi-C@11] [0000058-160909120521096-oozie-oozi-C@11]::ActionInputCheck:: In checkListOfPaths: hftp://192.168.39.108:50070/falcon/2016-09-09-01 is Missing.
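A quick way to verify whether the missing dependency exists is to list the exact URI from the log:
# check the path the coordinator action is waiting on
hadoop fs -ls hftp://192.168.39.108:50070/falcon/2016-09-09-01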
09-08-2016
07:31 PM
@Liam Murphy: Can you attach the feed XML and the Falcon and Oozie logs? Looks like eviction is failing. Can you check whether the replication succeeded? The Oozie bundle created will have one coordinator for retention and another for replication. Thanks!
09-06-2016
11:49 PM
5 Kudos
@Liam Murphy: Please find the details below.
1> Ensure that you have an account with Amazon S3 and a designated bucket for your data.
2> You must have an Access Key ID and a Secret Key.
3> Configure HDFS for S3 storage by making the following changes to core-site.xml:
<property>
<name>fs.default.name</name>
<value>s3n://your-bucket-name</value>
</property>
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>YOUR_S3_ACCESS_KEY</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>YOUR_S3_SECRET_KEY</value>
</property>
4> In the Falcon feed.xml, specify the Amazon S3 location and schedule the feed:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<feed name="S3Replication" description="S3-Replication" xmlns="uri:falcon:feed:0.1">
<frequency>
hours(1)
</frequency>
<clusters>
<cluster name="cluster1" type="source">
<validity start="2016-09-01T00:00Z" end="2034-12-20T08:00Z"/>
<retention limit="days(24)" action="delete"/>
</cluster>
<cluster name="cluster2" type="target">
<validity start="2016-09-01T00:00Z" end="2034-12-20T08:00Z"/>
<retention limit="days(90)" action="delete"/>
<locations>
<location type="data" path="s3://<bucket-name>/<path-folder>/${YEAR}-${MONTH}-${DAY}-${HOUR}/"/>
</locations>
</cluster>
</clusters>
</feed>
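Once the feed XML is complete, it can be submitted and scheduled with the Falcon CLI; a short sketch, assuming the definition is saved as s3-replication-feed.xml:
# register the feed definition with Falcon
falcon entity -type feed -submit -file s3-replication-feed.xml
# schedule it; the name matches the feed's name attribute
falcon entity -type feed -schedule -name S3Replication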
09-02-2016
05:17 AM
What limitations are we talking about here? Sorry, I don't understand your question. If you are asking about the DistCp options supported in HDFS mirroring, currently the following options are supported:
maxMaps
mapBandwidth
The additional options below can be supported using the workaround that follows:
overwrite
ignoreErrors
skipChecksum
removeDeletedFiles
preserveBlockSize
preserveReplicationNumber
preservePermission
Please modify the workflow hdfs-replication-workflow.xml as below. After distcpMapBandwidth, add the following content:
<arg>-overwrite</arg>
<arg>${overwrite}</arg>
<arg>-ignoreErrors</arg>
<arg>${ignoreErrors}</arg>
<arg>-skipChecksum</arg>
<arg>${skipChecksum}</arg>
<arg>-removeDeletedFiles</arg>
<arg>${removeDeletedFiles}</arg>
<arg>-preserveBlockSize</arg>
<arg>${preserveBlockSize}</arg>
<arg>-preserveReplicationNumber</arg>
<arg>${preserveReplicationNumber}</arg>
<arg>-preservePermission</arg>
<arg>${preservePermission}</arg>
Pass the options below in hdfs-replication.properties:
overwrite=false
ignoreErrors=false
skipChecksum=false
removeDeletedFiles=true
preserveBlockSize=true
preserveReplicationNumber=true
preservePermission=true
These will work OOTB, as FeedReplicator already supports them; hence no code change is required. Thanks!
09-01-2016
06:55 PM
2 Kudos
@Kyle Dunn: Falcon doesn't support those DistCp options, and yes, that would require a code change.
07-07-2016
07:04 PM
1 Kudo
@Dhaval Modi: The prerequisites to use Hive mirroring are Hive 1.2.0+ and Oozie 4.2.0+. Falcon Hive mirroring is not supported without those prerequisites. Thanks!
05-12-2016
07:02 PM
You cannot use ${DAY-1} in a feed. If you want to process the previous day's data, you can achieve it in a process by using the yesterday EL expression:
<input name="input" feed="SampleInput" start="yesterday(0,0)" end="today(-1,0)" />
</inputs>
05-12-2016
06:48 PM
As @Benjamin Leonhardi specified, Falcon should honor the retry attempts in the retry policy. If it's not working as expected, please create a support issue. Thanks!
05-09-2016
06:42 PM
3 Kudos
@Piotr Pruski: As you mentioned, Falcon piggybacks on DistCp under the hood to achieve replication. If another client is still writing to a source file, the copy will likely fail. If the DistCp job fails, the Falcon replication job fails too, and the status API/command can be used to get the finished status of the replication job; the same applies on success. Also, with FALCON-1313, support was added for email-based notification of job status for feeds and mirror recipes. Replication using snapshots is not yet supported in Falcon; this feature is being added with FALCON-1861. An additional benefit is performance: it leverages HDFS snapshots, which are very cheap to create (the cost is O(1), excluding inode lookup time). Once created, it is very efficient to find modifications relative to a snapshot and copy over those modifications for disaster recovery (DR), which makes it cost effective.
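For context, the HDFS primitives behind snapshot-based replication look like this; a minimal sketch with illustrative paths and snapshot names:
# enable snapshots on the source directory (admin operation)
hdfs dfsadmin -allowSnapshot /data/source
# take snapshots before and after changes land
hdfs dfs -createSnapshot /data/source s1
hdfs dfs -createSnapshot /data/source s2
# copy only the delta between the two snapshots to the target;
# the target must be snapshottable and already contain snapshot s1
hadoop distcp -update -diff s1 s2 /data/source /data/target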
03-05-2016
01:21 AM
2 Kudos
@Pavel Benes The exception "javax.xml.bind.UnmarshalException: [org.xml.sax.SAXParseException; Premature end of file." can occur for various reasons; it's an XML parser exception. Please ensure the entity XML is generated correctly. A quick Google search will point you to the various reasons a "Premature end of file" exception can occur. Can you attach the entity XML generated when this exception occurs? Thanks!
02-05-2016
05:54 PM
3 Kudos
You can refer to the Atlas Falcon bridge doc.
02-02-2016
09:46 PM
2 Kudos
@Balu: I already replied with the same analysis and asked him to change the process start time to 2016-01 instead: https://community.hortonworks.com/answers/12696/view.html
02-02-2016
01:05 AM
1 Kudo
@Nayan Paul: If you look at the Oozie job launched when the Falcon process entity is scheduled, do you see any errors? Does running the Pig script outside Falcon work as intended? Can you please attach the Falcon, Oozie, and MR logs to debug this issue?
01-31-2016
08:16 PM
2 Kudos
@Nayan Paul: Your process xml has the validity <validity start="2015-12-01T23:33Z" end="2018-01-03T23:33Z"/> and the frequency is every 5 minutes.
Instance Number | Process Instance Start Time | Feeds to process [currentMonth(0,0,0) - currentMonth(31,0,0)]
---|---|---
1 | 2015-12-01T23:33Z | 2015-12-01T00:00Z - 2015-12-31T00:00Z
2 | 2015-12-01T23:38Z | 2015-12-01T00:00Z - 2015-12-31T00:00Z
3 | 2015-12-01T23:43Z | 2015-12-01T00:00Z - 2015-12-31T00:00Z
... | |
As you can see, the process instances generated are for 2015-12; feeds generated in 2016-01 will be processed when the process instance start time is 2016-01-*. Please change the process validity start in the process xml to "2016-01-01T00:00Z". Thanks!
01-29-2016
06:14 PM
1 Kudo
@Nayan Paul: Can you provide the input feed and process entities used when the error was thrown? Falcon throws this error if the process input feed start is before the input feed's validity start. currentMonth(day,hour,minute): currentMonth takes its reference from the start of the month with respect to the process instance start time. One thing to keep in mind is that day is added to the first day of the month, so the value of day is the number of days you want to add to the first day of the month. For example, for instance start time 2010-01-12T01:30Z, the EL currentMonth(3,2,40) will correspond to the feed created at 2010-01-04T02:40Z, and currentMonth(0,0,0) will mean 2010-01-01T00:00Z. It looks like currentMonth is evaluating to a date that is before the input feed's validity start.
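For reference, a process input covering the whole current month, mirroring the currentMonth(0,0,0)-currentMonth(31,0,0) range discussed above; the feed name is illustrative:
<inputs>
  <!-- first day of the month through 31 days added to the 1st -->
  <input name="input" feed="monthlyFeed" start="currentMonth(0,0,0)" end="currentMonth(31,0,0)"/>
</inputs>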
01-28-2016
10:20 PM
3 Kudos
@Benjamin Leonhardi: You may be hitting this issue: FALCON-1647. What HDP version are you using? Thanks!
01-28-2016
09:30 PM
@Benjamin Leonhardi: Can you please provide the error you are getting, and attach all your entity definitions? Also, please paste the output of the command below. Thanks!
hadoop fs -ls -R /apps/falcon
01-28-2016
07:02 PM
2 Kudos
@Nayan Paul: There are a couple of issues in your entity XMLs.
1> The granularity of the date pattern in the location path should be at least that of the feed's frequency. The input location path in your feed xml is /falcon/demo1/data/${YEAR}-${MONTH}, but the frequency is in minutes.
2> yesterday(hours,minutes): As the name suggests, the yesterday EL picks up feed instances with respect to the start of the day yesterday. Hours and minutes are added to 00:00 of yesterday. Example: yesterday(24,30) actually corresponds to 00:30 am of today; for 2010-01-02T01:30Z this would mean the 2010-01-02T00:30Z feed.
Also, if you want to process a month's data, please use the lastMonth or currentMonth EL expression. Please refer to the EL expression doc for more details, and to this doc for entity specification details. Thanks!
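A sketch of what matching granularity looks like, with an illustrative feed name and path: a daily feed needs at least a daily date pattern in its location:
<feed name="demoFeed" description="granularity example" xmlns="uri:falcon:feed:0.1">
  <frequency>days(1)</frequency>
  ...
  <locations>
    <!-- daily frequency, so a ${YEAR}-${MONTH}-${DAY} pattern is sufficient -->
    <location type="data" path="/falcon/demo1/data/${YEAR}-${MONTH}-${DAY}"/>
  </locations>
  ...
</feed>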
01-28-2016
06:36 PM
@khushi kalra:
The validity of a feed on a cluster specifies the duration for which the feed is valid on that cluster.
Process validity defines how long the workflow should run. It has three components: start time, end time, and timezone. Start time and end time are timestamps in yyyy-MM-dd'T'HH:mm'Z' format and should always be in UTC. The timezone is used to compute the next instances starting from the start time. The workflow will start at the start time and end before the end time specified on a given cluster.
Please refer to this doc for more details. Thanks!
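For illustration, validity is specified per cluster in the entity XML; a minimal sketch with assumed cluster name and dates:
<clusters>
  <cluster name="primaryCluster">
    <!-- instances are materialized from start up to (not including) end -->
    <validity start="2016-01-01T00:00Z" end="2018-01-01T00:00Z"/>
  </cluster>
</clusters>
<timezone>UTC</timezone>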
01-28-2016
06:32 PM
1 Kudo
@Benjamin Leonhardi: You need to set the correct permissions on the staging and working directories. Please use the Falcon tutorial for reference.
01-28-2016
01:30 AM
1 Kudo
@Sushil Saxena: Regarding the issue in Falcon, the jira FALCON-1787 has been created.
If you are running a Falcon process, updating "oozie.action.sharelib.for.pig" in oozie-site.xml will not help. Falcon generates the Pig action and the workflow at runtime using the pig-action.xml defined in the Falcon codebase, and that pig-action.xml does not have hive in the sharelib config "oozie.action.sharelib.for.pig".
The workflow action configuration generated by Falcon overrides the one defined in oozie-site.xml.
A few workarounds:
1> In the process entity definition, add a new custom property and resubmit the process:
<properties>
<property name="oozie.action.sharelib.for.pig" value="hive,pig,hcatalog"/>
</properties>
2> Update pig-action.xml to have hive in the sharelib config, repackage falcon-oozie-adaptor-<version>.jar, replace the jar at "/usr/hdp/current/falcon-server/webapp/falcon/WEB-INF/lib", and restart Falcon.
3> If you have the Falcon code downloaded, update pig-action.xml at oozie/src/main/resources/action/process/pig-action.xml, then build Falcon and reinstall it.
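For workaround 2>, the fragment added inside the pig action's configuration in pig-action.xml would look roughly like this (placement per the Oozie pig action schema; the value matches workaround 1>):
<pig>
  ...
  <configuration>
    <property>
      <!-- make the hive jars available to the pig action -->
      <name>oozie.action.sharelib.for.pig</name>
      <value>hive,pig,hcatalog</value>
    </property>
  </configuration>
  ...
</pig>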