Created 08-18-2016 09:58 PM
Falcon is going to drive me to an early grave.
I've been able to create the Falcon replication job (hdfs-replication-workflow) and define both the source and target clusters it uses. The process seems to start up in Falcon, and two Oozie jobs and some YARN jobs also start up (see below), but I can't seem to get the distcp portion to actually move data.
I'm not sure whether this is related to running in (two) sandbox environments, but there isn't an obvious way to debug what's happening: most things either say "COMPLETED" or stay "RUNNING" forever, and nothing is noticeably failing or producing helpful logs. I'm not sure that is specific enough, but Falcon itself is a can of worms that I don't totally understand yet, what with Oozie and YARN and dispersed logs... sorry, I'll stop complaining.
Job ID                                App Name                           Status   User     Group  Started               Ended
------------------------------------------------------------------------------------------------------------------------------------
0000067-160817182129092-oozie-oozi-W  falcon-dr-fs-workflow              RUNNING  gpadmin  -      2016-08-18 21:55 GMT  -
0000066-160817182129092-oozie-oozi-W  FALCON_PROCESS_DEFAULT_drSyncTest  RUNNING  gpadmin  -      2016-08-18 21:55 GMT  -
------------------------------------------------------------------------------------------------------------------------------------
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):2
Application-Id                  Application-Name                                                                                                Application-Type  User     Queue    State     Final-State  Progress  Tracking-URL
application_1471553333960_0013  oozie:launcher:T=java:W=falcon-dr-fs-workflow:A=dr-replication:ID=0000071-160817182129092-oozie-oozi-W          MAPREDUCE         gpadmin  default  RUNNING   UNDEFINED    5%        http://sandbox2.hortonworks.com:18800
application_1471553333960_0014  distcp: oozie:action:T=java:W=falcon-dr-fs-workflow:A=dr-replication:ID=0000071-160817182129092-oozie-oozi-W    MAPREDUCE         gpadmin  default  ACCEPTED  UNDEFINED    0%        N/A
Created 08-19-2016 12:12 AM
If you are trying this on Sandbox instances, try increasing the YARN Capacity Scheduler setting maximum-am-resource-percent to 0.5 or 0.7; the default may not allow all the required MR jobs to run at the same time. Also, make sure plain distcp works (without Falcon). If you still have issues, check the Oozie launcher job logs, either from the Oozie UI, from the RM UI, or with the "yarn logs" command.
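To see why that setting matters here, the following is a rough sketch of the arithmetic, not the scheduler's actual code. It assumes a hypothetical sandbox queue with 8192 MB of memory and 1024 MB per application master (AM) container, and it assumes the stock Apache default of 0.1 for maximum-am-resource-percent (a sandbox may ship a different value). The function name and the numbers are made up for illustration. The YARN listing above shows two MapReduce applications that need an AM at the same time: the Oozie launcher (RUNNING) and the distcp job it spawned (ACCEPTED).

```python
def max_concurrent_ams(queue_memory_mb, am_container_mb, max_am_resource_percent):
    """Roughly estimate how many application masters a queue can run at once.

    Simplified model: ignores per-user limits, vcores, and allocation rounding.
    """
    # Share of the queue's memory that may be spent on AM containers.
    am_budget_mb = queue_memory_mb * max_am_resource_percent
    fits = int(am_budget_mb // am_container_mb)
    # Model assumption: the first application is still activated even when
    # its AM alone exceeds the budget, so the result is never below 1.
    return max(1, fits)


# Hypothetical sandbox numbers, chosen only for illustration.
QUEUE_MEMORY_MB = 8192
AM_CONTAINER_MB = 1024
# The Oozie launcher AM and the distcp AM must run concurrently.
REQUIRED_AMS = 2

for percent in (0.1, 0.5, 0.7):
    ams = max_concurrent_ams(QUEUE_MEMORY_MB, AM_CONTAINER_MB, percent)
    verdict = "distcp can start" if ams >= REQUIRED_AMS else "distcp stays ACCEPTED"
    print(f"maximum-am-resource-percent={percent}: up to {ams} AM(s), {verdict}")
```

Under these assumed numbers, the low setting leaves room for only the launcher's AM, which matches the symptom of a distcp application sitting at 0% in the ACCEPTED state while the launcher waits on it.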
Created 08-22-2016 04:58 PM
Thank you for the suggestion. distcp on its own runs fine on these sandboxes. The problem seems to be related to the number of containers a Falcon-based distcp replication requires: one top-level app master container, one distcp app master container, and then some number of worker containers to perform the actual replication. I believe this is resolved.
Thank you again!