Expert Contributor
Posts: 108
Registered: 05-19-2016

retry connect to server and exit code


I have been running into container exit code 137, and this is what I got from one of my Sqoop job logs:

 

18/05/30 11:49:45 INFO mapreduce.Job: Running job: job_1527499476017_12588
18/05/30 11:49:52 INFO mapreduce.Job: Job job_1527499476017_12588 running in uber mode : false
18/05/30 11:49:52 INFO mapreduce.Job: map 0% reduce 0%
18/05/30 11:50:04 INFO ipc.Client: Retrying connect to server: ip-172-31-4-147.ap-south-1.compute.internal/172.31.4.147:43852. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
18/05/30 11:50:05 INFO ipc.Client: Retrying connect to server: ip-172-31-4-147.ap-south-1.compute.internal/172.31.4.147:43852. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
18/05/30 11:50:06 INFO ipc.Client: Retrying connect to server: ip-172-31-4-147.ap-south-1.compute.internal/172.31.4.147:43852. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
18/05/30 11:51:04 INFO mapreduce.Job: Task Id : attempt_1527499476017_12588_m_000009_1000, Status : FAILED
Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal
18/05/30 11:51:04 INFO mapreduce.Job: Task Id : attempt_1527499476017_12588_m_000008_1000, Status : FAILED
Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal
18/05/30 11:52:06 INFO mapreduce.Job: map 20% reduce 0%
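To pull the relevant events out of a longer log like this one, a small parser can separate the connection retries from the failed task attempts. A minimal Python sketch, assuming the log file has one entry per line (as Hadoop normally writes it); `summarize_log` and the regular expressions are hypothetical and were written only against the line formats shown in this post:

```python
import re
from collections import Counter

# "Retrying connect to server: host/ip:port. Already tried N time(s)"
RETRY_RE = re.compile(r"Retrying connect to server: (\S+?)\. Already tried (\d+) time\(s\)")
# "Task Id : attempt_..., Status : FAILED"
FAILED_RE = re.compile(r"Task Id : (attempt_\S+), Status : FAILED")
# "Container exited with a non-zero exit code 137"
EXIT_RE = re.compile(r"exited with a non-zero exit code (\d+)")


def summarize_log(text):
    """Hypothetical helper: count retries per server and map failed attempts to exit codes."""
    retries = Counter()
    failed = {}
    current_attempt = None
    for line in text.splitlines():
        m = RETRY_RE.search(line)
        if m:
            retries[m.group(1)] += 1
            continue
        m = FAILED_RE.search(line)
        if m:
            current_attempt = m.group(1)
            failed[current_attempt] = None  # exit code filled in when seen
            continue
        m = EXIT_RE.search(line)
        if m and current_attempt is not None:
            failed[current_attempt] = int(m.group(1))
    return dict(retries), failed
```

Running it over the full log would report three retries against the 43852 port and exit code 137 for both failed attempts.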

What does this say about the error? My services are up and running, and there have been no unexpected exits either.

 

How do I fix it?
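On the number itself: by the usual Linux/shell convention, an exit status above 128 means the process was terminated by signal (status - 128), so 137 corresponds to signal 9, SIGKILL, which is consistent with the "Killed by external signal" line. A minimal Python sketch of that decoding; `describe_exit_code` is a hypothetical helper name, and it assumes a Unix-like system where `signal.SIGKILL` exists:

```python
import signal


def describe_exit_code(code):
    """Hypothetical helper: interpret a shell-style exit status.

    By convention, statuses above 128 mean the process was terminated
    by signal number (code - 128).
    """
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            return f"killed by unknown signal {code - 128}"
        return f"killed by signal {sig.value} ({sig.name})"
    return f"exited with status {code}"


print(describe_exit_code(137))  # killed by signal 9 (SIGKILL)
print(describe_exit_code(143))  # killed by signal 15 (SIGTERM)
```

The exit code says how the container died, not why; the reason for the kill has to come from the NodeManager or ApplicationMaster logs.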

Expert Contributor
Posts: 62
Registered: 02-23-2018

Re: retry connect to server and exit code

Hi @sim6,

 

When you run your Sqoop job, do you have enough RAM?

 

It seems to retry the connection a few times (the log shows "Already tried 2 time(s)") but not have enough RAM to allocate to the job.
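The retry lines in the Sqoop log name Hadoop's `RetryUpToMaximumCountWithFixedSleep` policy with `maxRetries=3` and `sleepTime=1000 MILLISECONDS`: retry a failed connection a bounded number of times, sleeping a fixed interval between attempts. The Python sketch below only imitates that behaviour for illustration; it is not Hadoop's Java implementation, and `connect` is a hypothetical callable standing in for the IPC connection:

```python
import time


def retry_with_fixed_sleep(connect, max_retries=3, sleep_seconds=1.0, log=print):
    """Illustrative bounded retry with a fixed sleep between attempts.

    Calls connect(). If it raises ConnectionError, logs the retry, sleeps
    sleep_seconds and tries again, up to max_retries retries in total.
    After that, the last ConnectionError is re-raised.
    """
    tried = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            if tried >= max_retries:
                raise
            log(f"Retrying connect. Already tried {tried} time(s)")
            tried += 1
            time.sleep(sleep_seconds)
```

With `max_retries=3` and `sleep_seconds=1.0`, a connection that keeps failing produces three "Already tried" messages one second apart, matching the 11:50:04, 11:50:05 and 11:50:06 timestamps in the log.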

 

 

Regards,

Manu.

Expert Contributor
Posts: 108
Registered: 05-19-2016

Re: retry connect to server and exit code


My jobs run through Oozie. Where does it say it's running out of RAM? There was not much load on the server when this job started.

 

I have also run into this error at times when the server is under less load. If it is running out of memory, how should I fix it? @manuroman

Expert Contributor
Posts: 62
Registered: 02-23-2018

Re: retry connect to server and exit code

Hi,

 

First, can you post how much memory is assigned to oozie.launcher?

Second, you could increase the memory in the property oozie.launcher.mapreduce.map.memory.mb and then rerun the Sqoop job.
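Launcher settings like this are typically passed as `<property>` entries in the `<configuration>` section of the Sqoop action in workflow.xml. The Python sketch below only generates such a fragment so the shape is clear; `launcher_memory_config` is a hypothetical helper, and the 4096 MB value is an arbitrary example rather than a recommendation:

```python
import xml.etree.ElementTree as ET


def launcher_memory_config(memory_mb):
    """Build a <configuration> fragment setting the Oozie launcher's map memory.

    memory_mb is an example value chosen by the caller (assumption).
    """
    config = ET.Element("configuration")
    prop = ET.SubElement(config, "property")
    ET.SubElement(prop, "name").text = "oozie.launcher.mapreduce.map.memory.mb"
    ET.SubElement(prop, "value").text = str(memory_mb)
    return ET.tostring(config, encoding="unicode")


print(launcher_memory_config(4096))
```

The printed XML can be pasted into the action's existing `<configuration>` block (keep only the `<property>` element if that block already exists).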

 

Also, you could monitor memory usage while the job is running.
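One low-level way to do that on a Linux NodeManager host is to sample the resident memory (VmRSS) of the task's JVM from /proc. A minimal Python sketch, assuming Linux and that you have already found the PID of the container's Java process (for example with ps on the host running the task); `read_rss_kib` and `sample_rss` are hypothetical helper names:

```python
import os
import time


def read_rss_kib(pid):
    """Return the resident set size (VmRSS) of a Linux process in KiB.

    Reads /proc/<pid>/status; returns None if the process has no VmRSS
    line (for example, kernel threads).
    """
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:    123456 kB".
                return int(line.split()[1])
    return None


def sample_rss(pid, samples=5, interval_seconds=1.0):
    """Take several RSS readings (KiB), sleeping between them."""
    readings = []
    for i in range(samples):
        readings.append(read_rss_kib(pid))
        if i < samples - 1:
            time.sleep(interval_seconds)
    return readings


if __name__ == "__main__":
    # Demo on this script's own process; replace with the container JVM's PID.
    print(sample_rss(os.getpid(), samples=3))
```

If the process exits between samples, `open` raises `FileNotFoundError`, which is itself a useful signal that the container was killed.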

 

Regards,

Manu.

Expert Contributor
Posts: 108
Registered: 05-19-2016

Re: retry connect to server and exit code


@manuroman It's 1 GB. Does it matter in this case? 

 

Regarding monitoring memory usage while the job is running: how do I do that?

 

In Cloudera Manager (CM), I see the following information, and according to it there is sufficient physical memory on all my hosts. Do I need to monitor it from somewhere else?

 

IP             Roles       Last HB     Load Average (1/5/15 min)  Disk Usage            Physical Memory
172.31.1.128   2 Role(s)   13.35s ago  7.26  7.36  7.81           125.9 GiB / 1000 GiB  5.2 GiB / 62.5 GiB
172.31.1.207   3 Role(s)   1.15s ago   18.16  18.98  20.50        438.1 GiB / 2 TiB     12.7 GiB / 62.5 GiB
172.31.10.74   3 Role(s)   3.59s ago   7.29  7.42  7.66           431.1 GiB / 2 TiB     12.1 GiB / 62.5 GiB
172.31.13.118  6 Role(s)   4.43s ago   17.26  17.94  18.42        1.1 TiB / 1.4 TiB     10.7 GiB / 31 GiB
172.31.4.147   2 Role(s)   12.8s ago   7.15  7.66  8.15           125.9 GiB / 1000 GiB  4.9 GiB / 62.5 GiB
172.31.4.192   14 Role(s)  5.95s ago   8.59  8.70  8.67           2 TiB / 3.4 TiB       25.9 GiB / 62.5 GiB
172.31.5.201   2 Role(s)   6.12s ago   18.62  19.43  21.28        173.9 GiB / 2 TiB     6.8 GiB / 62.5 GiB
172.31.6.221   2 Role(s)   14.8s ago   32.90  35.52  36.23        125.8 GiB / 1000 GiB  5.5 GiB / 62.5 GiB
172.31.6.44    8 Role(s)   12.94s ago  16.07  16.10  16.10        917.6 GiB / 1000 GiB  6.4 GiB / 31 GiB
172.31.6.58    4 Role(s)   6.76s ago   18.77  18.49  18.86        17.2 GiB / 1000 GiB   3.9 GiB / 31 GiB
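The CM host view shows physical memory per host, but YARN schedules containers against the memory each NodeManager advertises (yarn.nodemanager.resource.memory-mb), so free host RAM does not necessarily mean YARN has room for another container. The ResourceManager REST API reports YARN's own view at `/ws/v1/cluster/metrics`. A minimal Python sketch, assuming the RM web address is `http://rm-host:8088` (a placeholder; substitute your ResourceManager's HTTP address) and that the endpoint is reachable without Kerberos/SPNEGO; both helper names are hypothetical:

```python
import json
from urllib.request import urlopen


def yarn_memory_summary(metrics_json):
    """Summarize YARN memory from a /ws/v1/cluster/metrics JSON response."""
    m = json.loads(metrics_json)["clusterMetrics"]
    total = m["totalMB"]
    allocated = m["allocatedMB"]
    return {
        "totalMB": total,
        "allocatedMB": allocated,
        "availableMB": m["availableMB"],
        "percentAllocated": round(100.0 * allocated / total, 1) if total else 0.0,
    }


def fetch_cluster_metrics(rm_http_address="http://rm-host:8088"):
    """Fetch raw cluster metrics from the ResourceManager REST API.

    rm-host:8088 is a placeholder address (assumption).
    """
    with urlopen(f"{rm_http_address}/ws/v1/cluster/metrics", timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Usage: `print(yarn_memory_summary(fetch_cluster_metrics("http://your-rm-host:8088")))`. Polling this while the Oozie job runs shows whether YARN's allocatable memory, rather than host RAM, is running short.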
 
Expert Contributor
Posts: 62
Registered: 02-23-2018

Re: retry connect to server and exit code

Well @sim6,

 

You have assigned only 1 GB to the Oozie launcher. If the job needs more memory, it will be killed after three attempts.

 

So try assigning more than 1 GB, for example 4 or 6 GB, since you have more memory available.

 

 

Regards,

Manu.

Expert Contributor
Posts: 108
Registered: 05-19-2016

Re: retry connect to server and exit code


I didn't know the job's memory would come out of Oozie's memory. Do you mean the Java heap space allocated to Oozie, or are we speaking of something else?
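As I understand it, the launcher does not run inside the Oozie server's own heap; it runs as a map task in a YARN container, so two numbers are involved: the container size (mapreduce.map.memory.mb, set with the oozie.launcher. prefix for the launcher) and the JVM heap inside that container (-Xmx in mapreduce.map.java.opts). A common rule of thumb, which is an assumption here and not something stated in this thread, is to keep the heap at about 80% of the container so non-heap JVM memory still fits. The arithmetic as a Python sketch (`heap_for_container` is a hypothetical helper):

```python
def heap_for_container(container_mb, heap_fraction=0.8):
    """Suggest an -Xmx value for a container of container_mb megabytes.

    heap_fraction=0.8 is a common rule of thumb (assumption), leaving the
    remainder for JVM overhead such as metaspace, thread stacks and
    direct buffers.
    """
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    heap_mb = int(container_mb * heap_fraction)
    return f"-Xmx{heap_mb}m"


for size in (1024, 4096, 6144):
    print(size, heap_for_container(size))
# 1024 -Xmx819m
# 4096 -Xmx3276m
# 6144 -Xmx4915m
```

If the heap is raised without raising the container size (or the reverse), the two settings drift apart, so it is worth changing them together.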

 @manuroman

Expert Contributor
Posts: 108
Registered: 05-19-2016

Re: retry connect to server and exit code

Also, one more thing: all my NodeManagers exit unexpectedly if my ResourceManager is running on server1.

What I did was install it on another server (server2) in standby mode.

 

Now, if I delete the ResourceManager from the node it was originally installed on (server1), all my Oozie jobs get killed with this error:

 

Job Tracker(server1:8032) is not whitelisted. Only server1:8032 or 8020 is.

What's happening here? How do I fix this?
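The error points at Oozie's JobTracker whitelist. In oozie-site.xml, the properties `oozie.service.HadoopAccessorService.jobTracker.whitelist` and `oozie.service.HadoopAccessorService.nameNode.whitelist` list the host:port values Oozie accepts for a workflow's jobTracker and nameNode; as far as I know, an empty value means no restriction. After moving the ResourceManager, the jobTracker value in job.properties and the whitelist have to agree. A minimal Python sketch that reads the whitelist from an oozie-site.xml string and checks a jobTracker value against it; the helper names and sample values are hypothetical, and the empty-means-allow-all and case-insensitive behaviour are assumptions:

```python
import xml.etree.ElementTree as ET

WHITELIST_PROP = "oozie.service.HadoopAccessorService.jobTracker.whitelist"


def read_property(site_xml, name):
    """Return the value of a property from a Hadoop-style *-site.xml string."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def job_tracker_allowed(site_xml, job_tracker):
    """Check a jobTracker host:port against the whitelist.

    Assumes an empty or missing whitelist allows any value and that the
    comparison is case-insensitive.
    """
    raw = read_property(site_xml, WHITELIST_PROP) or ""
    allowed = {entry.strip().lower() for entry in raw.split(",") if entry.strip()}
    return not allowed or job_tracker.strip().lower() in allowed
```

If your new jobTracker address is rejected, either add it to the whitelist (Oozie likely needs a restart to pick up oozie-site.xml changes) or point jobTracker back at a whitelisted address.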
