Long running Falcon process (hdfs-replication-workflow)

Contributor

Falcon is going to drive me to an early grave.

I've been able to create the Falcon replication job (hdfs-replication-workflow) and define both the source and target clusters it uses. The process seems to start in Falcon, and two Oozie jobs and some YARN jobs start as well (see below), but I can't get the distcp portion to actually move data.

I'm not sure whether this is related to running in (two) sandbox environments, but there isn't an obvious way to debug what's happening: most things either say "COMPLETED" or stay "RUNNING" forever, and nothing noticeable is failing or producing helpful logs. Not sure if that is specific enough, but Falcon itself is a can of worms that I don't totally understand yet... Oozie and YARN and dispersed logs... sorry, I'll stop complaining.

Job ID                                   App Name                            Status    User      Group     Started                 Ended
------------------------------------------------------------------------------------------------------------------------------------
0000067-160817182129092-oozie-oozi-W     falcon-dr-fs-workflow               RUNNING   gpadmin   -         2016-08-18 21:55 GMT    -
------------------------------------------------------------------------------------------------------------------------------------
0000066-160817182129092-oozie-oozi-W     FALCON_PROCESS_DEFAULT_drSyncTest   RUNNING   gpadmin   -         2016-08-18 21:55 GMT    -
------------------------------------------------------------------------------------------------------------------------------------

Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):2
Application-Id                  Application-Name                                                                                              Application-Type  User     Queue    State     Final-State  Progress  Tracking-URL
application_1471553333960_0013  oozie:launcher:T=java:W=falcon-dr-fs-workflow:A=dr-replication:ID=0000071-160817182129092-oozie-oozi-W        MAPREDUCE         gpadmin  default  RUNNING   UNDEFINED    5%        http://sandbox2.hortonworks.com:18800
application_1471553333960_0014  distcp: oozie:action:T=java:W=falcon-dr-fs-workflow:A=dr-replication:ID=0000071-160817182129092-oozie-oozi-W  MAPREDUCE         gpadmin  default  ACCEPTED  UNDEFINED    0%        N/A

1 ACCEPTED SOLUTION

Master Guru

If you are trying this on Sandbox instances, try increasing the YARN Capacity Scheduler setting maximum-am-resource-percent (yarn.scheduler.capacity.maximum-am-resource-percent) to 0.5 or 0.7; the default may not allow all the required MR jobs to run at the same time. Also, make sure plain distcp works (without Falcon). If you still have issues, check the Oozie launcher job logs, either from the Oozie UI, from the RM UI, or with the "yarn logs" command.
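
To see why the default can leave the distcp job stuck in ACCEPTED, here is a rough back-of-the-envelope sketch in Python. It is a simplified model, not the scheduler's actual algorithm: it assumes a single queue, assumes the scheduler always lets at least one application master (AM) start, and uses made-up sandbox numbers (8192 MB of queue memory, 1024 MB per AM container). The function name and constants are hypothetical. A Falcon replication needs two AMs at once: one for the Oozie launcher and one for the distcp job it starts.

```python
import math


def max_concurrent_ams(queue_memory_mb, am_container_mb, max_am_resource_percent):
    """Rough estimate of how many application masters can run at once.

    Simplified model: the AM budget is a fraction of the queue's memory.
    """
    am_budget_mb = queue_memory_mb * max_am_resource_percent
    # Assumption: at least one AM may start even if the budget is
    # smaller than a single AM container.
    return max(1, math.floor(am_budget_mb / am_container_mb))


# Hypothetical sandbox numbers, for illustration only.
QUEUE_MEMORY_MB = 8192
AM_CONTAINER_MB = 1024

# One AM for the Oozie launcher plus one AM for the distcp MR job.
REQUIRED_AMS = 2

for percent in (0.1, 0.5, 0.7):
    allowed = max_concurrent_ams(QUEUE_MEMORY_MB, AM_CONTAINER_MB, percent)
    verdict = "ok" if allowed >= REQUIRED_AMS else "second AM waits in ACCEPTED"
    print(f"maximum-am-resource-percent={percent}: {allowed} AM(s) allowed, {verdict}")
```

Under these assumed numbers, a setting of 0.1 leaves room for only one AM, so the launcher runs while the distcp job it submitted never gets its own AM. Your real memory sizes will differ, so plug in the values from your own cluster.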

2 REPLIES

Contributor

Thank you for the suggestion. distcp on its own runs fine on these sandboxes. The problem seems to be related to the number of containers a Falcon-based distcp replication requires: one top-level app master container, one distcp app master container, and then some number of worker containers to perform the actual replication. I believe this is resolved.

Thank you again!