MapReduce client retrying to connect after job completion

Explorer

I'm running a simple Pig script on Hadoop 2.6.0-cdh5.7.0.
After the job completes successfully, I get the messages shown in the log below.

It looks like the workers are trying to communicate with each other (with a maximum of 3 retries), but I'm not sure why, or where this behavior is configured.

Does anyone know how to solve this issue?

 

Counters:
Total records written : 46933
Total bytes written : 12822705
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1469941650260_0002  ->      job_1469941650260_0011,
job_1469941650260_0003  ->      job_1469941650260_0011,
job_1469941650260_0001  ->      job_1469941650260_0005,job_1469941650260_0006,
job_1469941650260_0005  ->      job_1469941650260_0006,
job_1469941650260_0006  ->      job_1469941650260_0007,
job_1469941650260_0007  ->      job_1469941650260_0008,job_1469941650260_0009,
job_1469941650260_0004  ->      job_1469941650260_0008,
job_1469941650260_0008  ->      job_1469941650260_0010,
job_1469941650260_0010  ->      job_1469941650260_0011,
job_1469941650260_0009  ->      job_1469941650260_0011,
job_1469941650260_0011


2016-07-31 05:28:54,418 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p1.c.project.internal/10.240.0.22:38762. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:28:55,419 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p1.c.project.internal/10.240.0.22:38762. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:28:56,420 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p1.c.project.internal/10.240.0.22:38762. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:28:56,527 [MainThread] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:28:57,626 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p2.c.project.internal/10.240.0.17:35325. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:28:58,628 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p2.c.project.internal/10.240.0.17:35325. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:28:59,629 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p2.c.project.internal/10.240.0.17:35325. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:28:59,732 [MainThread] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:29:00,833 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:45573. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:01,834 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:45573. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:02,835 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:45573. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:02,939 [MainThread] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:29:04,051 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker2.c.project.internal/10.240.0.24:36934. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:05,052 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker2.c.project.internal/10.240.0.24:36934. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:06,053 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker2.c.project.internal/10.240.0.24:36934. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:06,157 [MainThread] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:29:07,244 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker2.c.project.internal/10.240.0.24:43862. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:08,245 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker2.c.project.internal/10.240.0.24:43862. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:09,246 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker2.c.project.internal/10.240.0.24:43862. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:09,350 [MainThread] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:29:10,643 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:38481. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:11,644 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:38481. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:12,645 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:38481. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:12,749 [MainThread] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:29:13,832 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p2.c.project.internal/10.240.0.17:34431. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:14,833 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p2.c.project.internal/10.240.0.17:34431. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:15,834 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker-p2.c.project.internal/10.240.0.17:34431. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:15,937 [MainThread] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:29:17,045 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker1.c.project.internal/10.240.0.27:38757. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:18,046 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker1.c.project.internal/10.240.0.27:38757. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:19,047 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker1.c.project.internal/10.240.0.27:38757. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:19,149 [MainThread] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:29:20,230 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:37952. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:21,231 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:37952. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:22,232 [MainThread] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: cdh-worker3.c.project.internal/10.240.0.25:37952. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2016-07-31 05:29:22,335 [MainThread] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-07-31 05:29:22,417 [MainThread] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!

Accepted Solutions

Re: MapReduce client retrying to connect after job completion

Master Guru
This is currently standard behaviour. The three retries to reach an MR2 AM are controlled via "yarn.app.mapreduce.client-am.ipc.max-retries".

If the client cannot reach the MR2 AM, which is the definitive source of truth during application execution, it falls back to asking the RM, which can report the application's state and, if the application has completed, redirect the client to the job history server.
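To make the fallback order concrete, here is a minimal Python sketch of the sequence described above. It is a simplified model, not Hadoop's actual client code: the function `fetch_job_status`, the `AMUnreachable` exception, and the three callables standing in for the AM, the RM and the job history server are all hypothetical names introduced for illustration. The `max_retries` argument plays the role of "yarn.app.mapreduce.client-am.ipc.max-retries".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class AMUnreachable(Exception):
    """Raised by the hypothetical AM stub when a connection attempt fails."""


@dataclass
class JobStatus:
    state: str   # e.g. "RUNNING" or "SUCCEEDED"
    source: str  # which component answered: "AM", "RM" or "JobHistoryServer"


# Final statuses after which the AM is gone and the history server takes over.
FINISHED = {"SUCCEEDED", "FAILED", "KILLED"}


def fetch_job_status(
    ask_am: Callable[[], str],
    ask_rm: Callable[[], str],
    ask_history_server: Callable[[], str],
    max_retries: int = 3,
    log: Optional[List[str]] = None,
) -> JobStatus:
    """Ask the AM first; after max_retries failures, fall back to the RM."""
    log = log if log is not None else []

    # Step 1: the AM is the source of truth while the application runs.
    for attempt in range(max_retries):
        try:
            return JobStatus(ask_am(), "AM")
        except AMUnreachable:
            log.append(f"Retrying connect to AM. Already tried {attempt} time(s)")

    # Step 2: the AM did not answer, so ask the RM for the application state.
    rm_state = ask_rm()
    if rm_state in FINISHED:
        # Step 3: the application has completed, so follow the redirect.
        log.append(f"FinalApplicationStatus={rm_state}. Redirecting to job history server")
        return JobStatus(ask_history_server(), "JobHistoryServer")
    return JobStatus(rm_state, "RM")


if __name__ == "__main__":
    def finished_am() -> str:
        # The AM has already exited because the job is done.
        raise AMUnreachable()

    messages: List[str] = []
    status = fetch_job_status(
        finished_am, lambda: "SUCCEEDED", lambda: "SUCCEEDED", log=messages
    )
    for line in messages:
        print(line)
    print(status)
```

In this model, a job whose AM has already exited produces three retry messages followed by one redirect message, which mirrors the pattern repeated for each job in the log in the question. Lowering `max_retries` shortens the wait in the sketch; whether lowering the real property is appropriate for a given cluster is a separate judgment call, since the AM remains the authoritative source while a job is still running.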

Re: MapReduce client retrying to connect after job completion

Explorer

Thanks, good to know.