12-04-2018 02:49 PM
I have an Oozie job which starts an Oozie shell action; the shell action starts a Spark application (spark2-submit). I am mostly doing Spark SQL. The job runs for a while and then suddenly hangs, and the Spark application starts all over again.
I ran the same Spark application in CDSW and it ran fine without issues.
The same thing is happening with another Oozie job. The only thing these two jobs have in common is that they run longer, around 2 hours.
Any help would be appreciated.
12-04-2018 02:58 PM
Do you see any error or info related to a timeout, or anything else indicating why the application failed and had to be restarted? You can gather the YARN logs for the application and check the Application Master section to see the reason for the failure. It may be that the Spark configuration used in CDSW is different from the one used by spark-submit.
12-04-2018 03:07 PM
Thanks for the reply. I tried to see the logs from the Spark UI but it was blank. It would be great if you could guide me on checking the YARN logs.
I tried the command below and it gave me a blank file too:
yarn logs -applicationId application_1543621459332_8094 > spark_app.log
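One possible reason for the blank output (an assumption on my part, not something confirmed in this thread) is that the yarn logs command was run as a different user than the one that submitted the job, or before YARN log aggregation finished copying the logs to HDFS. The -appOwner flag covers the first case:

```shell
# Same application ID as above; -appOwner must name the user who
# actually submitted the job ($JOB_OWNER is a placeholder here).
yarn logs -applicationId application_1543621459332_8094 \
    -appOwner "$JOB_OWNER" > spark_app.log
```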
12-05-2018 12:18 PM
We see the error below:
"The container showed error ...Container [pid=68208,containerID=container_e59_1543621459332_7731_01_000002] is running beyond physical memory limits. Current usage: 4.0 GB of 4 GB physical memory used; 26.5 GB of 8.4 GB virtual memory used. Killing container..."
Whether I run in CDSW or through Oozie, my Spark application has the same memory settings and configuration (executor memory, cores, driver memory, memory overhead, etc.). From CDSW it never failed, but when I run it from an Oozie shell action (calling spark2-submit) it randomly fails.
I am trying to understand what is different in Oozie and how to set this memory limit.
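For what it's worth, the two numbers in that error come from two separate YARN checks: a physical-memory check against the container size (4 GB here) and a virtual-memory check against container size times yarn.nodemanager.vmem-pmem-ratio (the default ratio of 2.1 explains the 8.4 GB figure: 4 GB x 2.1). A sketch of the relevant yarn-site.xml properties, showing the upstream Hadoop defaults rather than values confirmed from this cluster:

```xml
<!-- yarn-site.xml: the checks behind "running beyond ... memory limits".
     Values shown are upstream Hadoop defaults, not this cluster's. -->
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>true</value>  <!-- some distributions disable this by default -->
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>   <!-- 4 GB physical -> 8.4 GB virtual cap -->
</property>
```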
12-06-2018 09:39 AM
This error indicates that the container size in YARN is set to 4 GB and your Spark application needs more memory to run.
As a test, you can increase the container size in the YARN configuration to, say, 6 GB or 8 GB and see if the application succeeds. (If you are using Cloudera Manager, you will find this under CM > YARN > Configuration > Container Memory.)
12-07-2018 08:24 AM - edited 12-07-2018 08:25 AM
I applied this property and increased the limit to 6 GB. It still fails with the exact same error message.
12-07-2018 09:29 AM
That means the spark-submit is requesting a container size of 4 GB; --executor-memory must be getting set to 4g.
Can you check the spark2-submit command being used and set --executor-memory and --driver-memory to 6g?
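A hypothetical spark2-submit invocation with those flags spelled out (the script name and other values are examples only, not taken from this thread):

```shell
spark2-submit \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 6g \
    --executor-memory 6g \
    my_spark_job.py
```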
12-07-2018 12:12 PM
Sorry, I forgot to mention: I have been using an executor memory of 14 GB and a driver memory of 10 GB. None of my tasks spill to disk. This is so strange, and it is shaking my fundamental understanding of Spark.
I have a memory overhead of 3 GB.
Again, the same settings are used in CDSW, but it never failed from there. It only fails when I run the job through Oozie. It then restarts on its own, and that attempt completes without any failures.
When would Spark use physical memory versus virtual memory?
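To make the arithmetic explicit: for each executor, YARN is asked for a container sized at executor memory plus overhead, so with the figures in this thread the containers being killed at 4 GB cannot be the executors themselves. A quick sanity check of the sizes (a sketch; YARN also rounds requests up to a multiple of yarn.scheduler.minimum-allocation-mb):

```shell
# Figures from this thread: 14 GB executor memory + 3 GB overhead.
EXECUTOR_MEM_GB=14
OVERHEAD_GB=3
CONTAINER_GB=$((EXECUTOR_MEM_GB + OVERHEAD_GB))
echo "Executor container request: ${CONTAINER_GB} GB"   # 17 GB, not 4 GB
```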
12-07-2018 02:38 PM
The Oozie launcher mapper was running out of its 4 GB memory limit. I changed that to 8 GB, and now the job runs fine without restarts.
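For anyone hitting the same thing: the launcher container size can be raised per action in workflow.xml via the oozie.launcher.* overrides. A minimal sketch of a shell action with that property (the action name, script name, and schema version are illustrative; 8192 MB matches the fix described above):

```xml
<action name="run-spark-job">
  <shell xmlns="uri:oozie:shell-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <property>
        <!-- Memory for the Oozie launcher mapper that runs spark2-submit -->
        <name>oozie.launcher.mapreduce.map.memory.mb</name>
        <value>8192</value>
      </property>
    </configuration>
    <exec>submit_spark.sh</exec>
    <file>submit_spark.sh</file>
  </shell>
  <ok to="end"/>
  <error to="fail"/>
</action>
```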