05-30-2018 12:12 AM - last edited on 05-30-2018 06:02 AM by cjervis
I have been running into 137 exit container code and this is what I got from one of my sqoop job logs:
18/05/30 11:49:45 INFO mapreduce.Job: Running job: job_1527499476017_12588
18/05/30 11:49:52 INFO mapreduce.Job: Job job_1527499476017_12588 running in uber mode : false
18/05/30 11:49:52 INFO mapreduce.Job: map 0% reduce 0%
18/05/30 11:50:04 INFO ipc.Client: Retrying connect to server: ip-172-31-4-147.ap-south-1.compute.internal/172.31.4.147:43852. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
18/05/30 11:50:05 INFO ipc.Client: Retrying connect to server: ip-172-31-4-147.ap-south-1.compute.internal/172.31.4.147:43852. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
18/05/30 11:50:06 INFO ipc.Client: Retrying connect to server: ip-172-31-4-147.ap-south-1.compute.internal/172.31.4.147:43852. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
18/05/30 11:51:04 INFO mapreduce.Job: Task Id : attempt_1527499476017_12588_m_000009_1000, Status : FAILED
Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal
18/05/30 11:51:04 INFO mapreduce.Job: Task Id : attempt_1527499476017_12588_m_000008_1000, Status : FAILED
Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal
18/05/30 11:52:06 INFO mapreduce.Job: map 20% reduce 0%
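For reference, exit code 137 is the standard Unix convention for a process killed by SIGKILL: 128 + 9 (the signal number). A minimal sketch demonstrating this, using a throwaway child shell in place of a YARN container process:

```shell
# Exit code 137 = 128 + SIGKILL(9): the process was force-killed
# from the outside, here by YARN's NodeManager killing the container.
sh -c 'kill -9 $$'          # child shell kills itself with SIGKILL
echo "exit code: $?"        # prints "exit code: 137"
```

This is why the log says "Killed by external signal": the container did not crash on its own, something sent it SIGKILL.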
What does this error mean? My services are up and running, and there have been no unexpected exits either.
How do I fix it?
05-30-2018 03:31 AM - edited 05-30-2018 03:32 AM
My jobs run through Oozie. Where does it say it is running out of RAM? There was not much load on the server when this job started.
I have also hit this error at times when there was little load on the server. If it is running out of memory, how should I fix that? @manuroman
05-30-2018 04:12 AM
First, can you post the memory assigned to oozie.launcher?
Second, you could increase the memory via the property oozie.launcher.mapreduce.map.memory.mb and then rerun the Sqoop job.
Also, you could monitor memory usage while the job is running.
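As a sketch, that property can be set per-action in the workflow's configuration block. The action name, the 4096 MB value, and the heap size below are illustrative assumptions, not values from this thread:

```xml
<!-- Hypothetical Sqoop action; only the memory properties matter here -->
<action name="sqoop-import">
  <sqoop xmlns="uri:oozie:sqoop-action:0.4">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <!-- Raise the Oozie launcher container above the 1 GB default -->
      <property>
        <name>oozie.launcher.mapreduce.map.memory.mb</name>
        <value>4096</value>
      </property>
      <!-- Keep the launcher JVM heap below the container limit (~80%) -->
      <property>
        <name>oozie.launcher.mapreduce.map.java.opts</name>
        <value>-Xmx3276m</value>
      </property>
    </configuration>
    <command>import --connect ${jdbcUrl} --table ${table}</command>
  </sqoop>
  <ok to="end"/>
  <error to="fail"/>
</action>
```

The java.opts heap is kept below memory.mb on purpose: the container limit covers the whole process (heap plus JVM overhead), and a heap equal to the limit would get the container killed again.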
05-30-2018 08:25 AM - edited 05-30-2018 08:41 AM
@manuroman It's 1 GB. Does that matter in this case?
Regarding monitoring memory usage while the job runs: how do I do that?
In CM, I see the following info, and according to this there is sufficient physical memory on all my hosts. Do I need to monitor it from somewhere else?
| IP | Roles | Last HB | Load Average | Disk Usage | Physical Memory |
| --- | --- | --- | --- | --- | --- |
| 172.31.1.128 | 2 Role(s) | 13.35s ago | 7.26 7.36 7.81 | 125.9 GiB / 1000 GiB | 5.2 GiB / 62.5 GiB |
| 172.31.1.207 | 3 Role(s) | 1.15s ago | 18.16 18.98 20.50 | 438.1 GiB / 2 TiB | 12.7 GiB / 62.5 GiB |
| 172.31.10.74 | 3 Role(s) | 3.59s ago | 7.29 7.42 7.66 | 431.1 GiB / 2 TiB | 12.1 GiB / 62.5 GiB |
| 172.31.13.118 | 6 Role(s) | 4.43s ago | 17.26 17.94 18.42 | 1.1 TiB / 1.4 TiB | 10.7 GiB / 31 GiB |
| 172.31.4.147 | 2 Role(s) | 12.8s ago | 7.15 7.66 8.15 | 125.9 GiB / 1000 GiB | 4.9 GiB / 62.5 GiB |
| 172.31.4.192 | 14 Role(s) | 5.95s ago | 8.59 8.70 8.67 | 2 TiB / 3.4 TiB | 25.9 GiB / 62.5 GiB |
| 172.31.5.201 | 2 Role(s) | 6.12s ago | 18.62 19.43 21.28 | 173.9 GiB / 2 TiB | 6.8 GiB / 62.5 GiB |
| 172.31.6.221 | 2 Role(s) | 14.8s ago | 32.90 35.52 36.23 | 125.8 GiB / 1000 GiB | 5.5 GiB / 62.5 GiB |
| 172.31.6.44 | 8 Role(s) | 12.94s ago | 16.07 16.10 16.10 | 917.6 GiB / 1000 GiB | 6.4 GiB / 31 GiB |
| 172.31.6.58 | 4 Role(s) | 6.76s ago | 18.77 18.49 18.86 | 17.2 GiB / 1000 GiB | 3.9 GiB / 31 GiB |
05-30-2018 08:50 AM
You have assigned only 1 GB to the oozie.launcher role; if the job needs more memory, the container will be killed, and after 3 failed attempts the task fails.
So try assigning more than 1 GB, for example 4 or 6 GB, since you have memory available.
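One way to pass a higher limit at submission time is via the job.properties file used with `oozie job -config job.properties -run` (in some Oozie versions oozie.launcher.* properties must instead go in the action's configuration block, so check your version). All values below are illustrative assumptions:

```properties
# job.properties sketch -- hostnames and sizes are placeholders
nameNode=hdfs://namenode-host:8020
jobTracker=rm-host:8032
# Raise the launcher container from the 1 GB default
oozie.launcher.mapreduce.map.memory.mb=4096
oozie.launcher.mapreduce.map.java.opts=-Xmx3276m
```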
05-30-2018 09:24 AM - edited 05-30-2018 09:28 AM
I never knew that memory would be consumed from Oozie's allocation. I assume you mean the Java heap space allocated to Oozie, or is it something else we're speaking of?
05-30-2018 09:31 AM
Also, one more thing: all my NodeManagers exit unexpectedly if my ResourceManager is running on server1.
What I did was install a ResourceManager on another server (server2) in standby mode.
Now, if I delete the ResourceManager from the node it was originally installed on (server1), all my Oozie jobs get killed with this error:
Job Tracker(server1:8032) is not whitelisted. Only server1:8032 or 8020 is. What's happening here? How do I fix this?
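The whitelist that error refers to lives in oozie-site.xml. A sketch of allowing both ResourceManager addresses, reusing the thread's server1/server2 placeholders; verify the exact property name (and whether it is managed via a CM safety valve) against your Oozie version's documentation before applying it:

```xml
<!-- oozie-site.xml: allow jobs to target either ResourceManager -->
<property>
  <name>oozie.service.HadoopAccessorService.jobTracker.whitelist</name>
  <value>server1:8032,server2:8032</value>
</property>
```

After changing it, the Oozie server needs a restart for the new whitelist to take effect, and workflows submitted with the old jobTracker value will still fail until their job.properties is updated.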