Created on 05-30-2018 12:12 AM - edited 09-16-2022 06:16 AM
I have been running into container exit code 137, and this is what I got from one of my Sqoop job logs:
18/05/30 11:49:45 INFO mapreduce.Job: Running job: job_1527499476017_12588
18/05/30 11:49:52 INFO mapreduce.Job: Job job_1527499476017_12588 running in uber mode : false
18/05/30 11:49:52 INFO mapreduce.Job: map 0% reduce 0%
18/05/30 11:50:04 INFO ipc.Client: Retrying connect to server: ip-172-31-4-147.ap-south-1.compute.internal/172.31.4.147:43852. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
18/05/30 11:50:05 INFO ipc.Client: Retrying connect to server: ip-172-31-4-147.ap-south-1.compute.internal/172.31.4.147:43852. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
18/05/30 11:50:06 INFO ipc.Client: Retrying connect to server: ip-172-31-4-147.ap-south-1.compute.internal/172.31.4.147:43852. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
18/05/30 11:51:04 INFO mapreduce.Job: Task Id : attempt_1527499476017_12588_m_000009_1000, Status : FAILED
Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal
18/05/30 11:51:04 INFO mapreduce.Job: Task Id : attempt_1527499476017_12588_m_000008_1000, Status : FAILED
Container killed on request. Exit code is 137
Container exited with a non-zero exit code 137
Killed by external signal
18/05/30 11:52:06 INFO mapreduce.Job: map 20% reduce 0%
What does this error indicate? My services are up and running, and there have been no unexpected exits either.
How do I fix it?
Created 05-30-2018 03:24 AM
Hi @sim6,
When you run your Sqoop job, do you have enough RAM available?
It seems the job retries a couple of times but does not have enough RAM to allocate its containers.
Regards,
Manu.
Created on 05-30-2018 03:31 AM - edited 05-30-2018 03:32 AM
My jobs are running through Oozie. Where does it say it's running out of RAM? There was not much load on the server when this job started.
I have been running into this error at times when there is less load on the server as well. How should I fix it if it were running out of memory? @manuroman
Created 05-30-2018 04:12 AM
Hi,
First, could you post how much memory is assigned to the oozie.launcher?
Second, you could increase that memory via the property oozie.launcher.mapreduce.map.memory.mb and then rerun the Sqoop job; a sketch of where that property goes is below.
Also, you could monitor memory usage while the job is running.
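If it helps, here is a minimal sketch of a Sqoop action in workflow.xml with the launcher memory raised. The workflow/action names, schema versions, the 2048 MB / -Xmx1638m values, and the ${jdbcUri}/${table}/${targetDir} parameters are illustrative assumptions, not taken from this job:

<workflow-app name="sqoop-import-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="sqoop-import"/>
    <action name="sqoop-import">
        <sqoop xmlns="uri:oozie:sqoop-action:0.4">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <!-- Size of the Oozie launcher container (illustrative value) -->
                <property>
                    <name>oozie.launcher.mapreduce.map.memory.mb</name>
                    <value>2048</value>
                </property>
                <!-- Launcher JVM heap, kept below the container size -->
                <property>
                    <name>oozie.launcher.mapreduce.map.java.opts</name>
                    <value>-Xmx1638m</value>
                </property>
            </configuration>
            <command>import --connect ${jdbcUri} --table ${table} --target-dir ${targetDir}</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Sqoop action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

Note that the oozie.launcher.* properties only size the launcher container; the map tasks that the Sqoop job itself spawns are still governed by the regular mapreduce.map.memory.mb.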
Regards,
Manu.
Created on 05-30-2018 08:25 AM - edited 05-30-2018 08:41 AM
@manuroman It's 1 GB. Does it matter in this case?
Regarding monitoring memory usage while the job is running: how do I do that?
In CM, I see the following info and according to this, there is sufficient physical memory on all my hosts. Do I need to monitor it from somewhere else?
IP            | Roles      | Last Heartbeat | Load Average      | Disk Usage           | Physical Memory
172.31.1.128  | 2 Role(s)  | 13.35s ago     | 7.26 7.36 7.81    | 125.9 GiB / 1000 GiB | 5.2 GiB / 62.5 GiB
172.31.1.207  | 3 Role(s)  | 1.15s ago      | 18.16 18.98 20.50 | 438.1 GiB / 2 TiB    | 12.7 GiB / 62.5 GiB
172.31.10.74  | 3 Role(s)  | 3.59s ago      | 7.29 7.42 7.66    | 431.1 GiB / 2 TiB    | 12.1 GiB / 62.5 GiB
172.31.13.118 | 6 Role(s)  | 4.43s ago      | 17.26 17.94 18.42 | 1.1 TiB / 1.4 TiB    | 10.7 GiB / 31 GiB
172.31.4.147  | 2 Role(s)  | 12.8s ago      | 7.15 7.66 8.15    | 125.9 GiB / 1000 GiB | 4.9 GiB / 62.5 GiB
172.31.4.192  | 14 Role(s) | 5.95s ago      | 8.59 8.70 8.67    | 2 TiB / 3.4 TiB      | 25.9 GiB / 62.5 GiB
172.31.5.201  | 2 Role(s)  | 6.12s ago      | 18.62 19.43 21.28 | 173.9 GiB / 2 TiB    | 6.8 GiB / 62.5 GiB
172.31.6.221  | 2 Role(s)  | 14.8s ago      | 32.90 35.52 36.23 | 125.8 GiB / 1000 GiB | 5.5 GiB / 62.5 GiB
172.31.6.44   | 8 Role(s)  | 12.94s ago     | 16.07 16.10 16.10 | 917.6 GiB / 1000 GiB | 6.4 GiB / 31 GiB
172.31.6.58   | 4 Role(s)  | 6.76s ago      | 18.77 18.49 18.86 | 17.2 GiB / 1000 GiB  | 3.9 GiB / 31 GiB
Created 05-30-2018 08:50 AM
Well @sim6,
You have assigned only 1 GB to the Oozie launcher; if the job needs more memory than that, the container gets killed, and after the retries the task fails.
So try assigning more than 1 GB, for example 4 or 6 GB, since your hosts have plenty of available memory.
Regards,
Manu.
Created on 05-30-2018 09:24 AM - edited 05-30-2018 09:28 AM
I never knew that the memory would be consumed from Oozie's allocation. Do you mean the Java heap space that's allocated to Oozie, or is it something else we're speaking of?
Created 05-30-2018 09:31 AM
Also, one more thing: all my NodeManagers exit unexpectedly if my ResourceManager is running on server1.
What I did was install another ResourceManager on server2 in standby mode.
Now, if I delete the ResourceManager from the node it was originally installed on (server1), all my Oozie jobs get killed with this error:
Job Tracker (server1:8032) is not whitelisted. Only server1:8032 or 8020 is. What's happening here? How do I fix this?
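For reference, the whitelist this error refers to is the Oozie server property oozie.service.HadoopAccessorService.jobTracker.whitelist, which lists the ResourceManager (job tracker) addresses that workflows are allowed to submit to. A minimal sketch of what an extended whitelist could look like in oozie-site.xml, assuming server1:8032 and server2:8032 are the two ResourceManager addresses (in Cloudera Manager this would usually go into the Oozie Server safety valve for oozie-site.xml):

<property>
    <name>oozie.service.HadoopAccessorService.jobTracker.whitelist</name>
    <!-- Placeholder host:port values; list every ResourceManager address
         workflows may submit to. Leaving the value empty typically disables the check. -->
    <value>server1:8032,server2:8032</value>
</property>

The ${jobTracker} value in the workflow's job.properties then has to match one of the whitelisted addresses.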