Container is running beyond memory limits - RECEIVED SIGNAL 15: SIGTERM
Created on 04-16-2019 01:03 PM - edited 09-16-2022 07:18 AM
Hi all,
I implemented model prediction in an Oozie workflow and I am getting the error "Container is running beyond memory limits" at step 3, i.e. model1.predict_proba. Table1 has 27 million records. The same code runs fine in a Jupyter notebook, but it fails with this error under Oozie. Can someone please help?
# Step 1: pull the whole table into the driver as a pandas DataFrame
d1 = sqlContext.sql("SELECT * FROM table1").toPandas()
# Step 2: drop the 'abc' column before scoring
xyz = d1.drop(['abc'], axis=1)
# Step 3: score with the trained model (this is where the job fails)
modelprob = model1.predict_proba(xyz)[:, 1]
Error: YARN logs
Application application_1547693435775_8741566 failed 2 times due to AM Container for appattempt_1547693435775_8741566_000002 exited with exitCode: -104
For more detailed output, check application tracking page:https://xyz
Diagnostics: Container [pid=224941,containerID=container_e167_1547693435775_8741566_02_000002] is running beyond physical memory limits. Current usage: 121.2 GB of 121 GB physical memory used; 226.9 GB of 254.1 GB virtual memory used. Killing container.
2019-04-15 22:43:36,231 [dispatcher-event-loop-10] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_5_piece0 on xyz.corp.intranet:34252 in memory (size: 5.6 KB, free: 6.2 GB)
2019-04-15 22:43:36,231 [dispatcher-event-loop-35] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_5_piece0 on xyz1.corp.intranet:38363 in memory (size: 5.6 KB, free: 6.2 GB)
2019-04-15 22:43:36,242 [Spark Context Cleaner] INFO org.apache.spark.ContextCleaner - Cleaned accumulator 4
2019-04-15 22:43:36,245 [dispatcher-event-loop-51] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_2_piece0 on xyz3 in memory (size: 53.5 KB, free: 52.8 GB)
2019-04-15 22:43:36,245 [dispatcher-event-loop-51] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_2_piece0 on xyz4.corp.intranet:46309 in memory (size: 53.5 KB, free: 6.2 GB)
2019-04-15 22:43:36,248 [dispatcher-event-loop-9] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_2_piece0 on xyz5.corp.intranet:44850 in memory (size: 53.5 KB, free: 6.2 GB)
2019-04-15 22:45:48,103 [SIGTERM handler] INFO org.apache.spark.deploy.yarn.ApplicationMaster - Final app status: FAILED, exitCode: 16
2019-04-15 22:45:48,106 [SIGTERM handler] ERROR org.apache.spark.deploy.yarn.ApplicationMaster - RECEIVED SIGNAL 15: SIGTERM
2019-04-15 22:45:48,124 [Thread-5] INFO org.apache.spark.SparkContext - Invoking stop() from shutdown hook
Below are the SparkConf parameters:
sconf = SparkConf().setAppName("xyz model") \
    .set("spark.driver.memory", "8g").set("spark.executor.memory", "12g").set("spark.yarn.am.memory", "8g") \
    .set("spark.dynamicAllocation.enabled", "true").set("spark.dynamicAllocation.minExecutors", "20").set("spark.dynamicAllocation.maxExecutors", "60") \
    .set("spark.shuffle.service.enabled", "true").set("spark.kryoserializer.buffer.max.mb", "2047").set("spark.shuffle.blockTransferService", "nio") \
    .set("spark.driver.maxResultSize", "4g").set("spark.rpc.message.maxSize", "330") \
    .setMaster("yarn-cluster")
sc = SparkContext(conf=sconf)
Below are the spark-opts parameters:
sparkopts=--executor-memory 115g --num-executors 60 --driver-memory 110g --executor-cores 16 --driver-cores 2 --conf "spark.dynamicAllocation.enabled=true" --conf "spark.kryoserializer.buffer.max=2047m" --conf "spark.driver.maxResultSize=4096m" --conf spark.yarn.executor.memoryOverhead=8000 --conf "spark.network.timeout=10000000" --conf "spark.executor.extraJavaOptions=-XX:+UseCompressedOops -XX:PermSize=2048M -XX:MaxPermSize=2048M -XX:+UseG1GC" --conf "spark.broadcast.compress=true" --conf "spark.broadcast.blockSize=128m" --conf "spark.serializer.objectStreamReset=2" --conf spark.executorEnv.PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python --files ${xyz}/hive-site.xml --files ${xyz}/yarn-site.xml
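As a side note, the 121 GB container limit in the log lines up with the spark-opts above if one assumes the default spark.yarn.driver.memoryOverhead of max(384 MB, 10% of driver memory), since no explicit driver overhead is set. This is only back-of-the-envelope arithmetic, not something stated in the thread:
# Rough estimate of the YARN container size for the AM/driver (assumes Spark's default 10% driver overhead).
driver_memory_gb = 110                                # from --driver-memory 110g
overhead_gb = max(0.384, 0.10 * driver_memory_gb)     # default: max(384 MB, 10% of driver memory)
print(driver_memory_gb + overhead_gb)                 # => 121.0, matching "121 GB physical memory" in the log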
1 Reply
Created 05-11-2019 06:09 PM
Hi,
It looks like you are running Spark in cluster mode and your ApplicationMaster is running out of memory.
In cluster mode the driver runs inside the AM. I can see that you have a driver of 110 GB and executor memory of 12 GB. Have you tried increasing both of them to see if that helps? By how much I do not know, but perhaps increase gradually and keep trying.
However, 110 GB of driver memory already seems like a lot, so I am wondering what kind of dataset this Spark job is processing. How large is the volume?
Cheers
Eric
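For reference, a minimal sketch of the kind of change being suggested here, applied to the spark-opts shown in the question. The numbers below are placeholders for illustration only, not values from this thread, and would have to fit whatever the YARN queue and node sizes can actually allocate:
# Illustrative only: raise the driver (AM) memory and give it an explicit YARN overhead,
# and raise executor memory as well; all other options stay as in the original spark-opts.
sparkopts=--executor-memory 120g --num-executors 60 --driver-memory 130g \
  --conf spark.yarn.driver.memoryOverhead=16384 \
  --conf spark.yarn.executor.memoryOverhead=8000 \
  ... (remaining options unchanged)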
