Multinomial NB on Spark on YARN

I'm trying to build an ML model using PySpark on Spark in YARN cluster mode.

I have 120 GB of RAM and ample storage.

Yet the nb.fit(df) line of the training code throws an OutOfMemoryException. I have tried all the memory-tuning parameters in YARN, along with several executor configurations (increasing and decreasing the number of executors).

But in the end it still fails with an OOM exception.

I tried reducing the number of features and the data size, yet it still ends with the memory issue.

The only successful model creation was with 1,000 records.

I have tried so many things without success. Please help.

Re: Multinomial NB on Spark on YARN

@sridar1992 I'm not an expert, but I did find this community article that may be of interest if you haven't read it yet:

Spark on YARN - Executor Resource Allocation Optimization
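For a rough idea of what executor sizing arithmetic can look like, here is a minimal sketch. It is not taken from the article above; it follows a commonly cited rule of thumb (reserve 1 core and 1 GB per node for the OS and daemons, about 5 cores per executor, one executor slot left for the YARN ApplicationMaster) plus Spark's default memory overhead of max(384 MiB, 10% of executor memory). The function name `size_executors` and the 16-core, single-node layout are assumptions for illustration only, since the thread mentions only the 120 GB of RAM.

```python
def size_executors(node_mem_gb, node_cores, num_nodes,
                   cores_per_executor=5,
                   overhead_fraction=0.10,
                   min_overhead_gb=0.375):
    """Rule-of-thumb executor sizing for Spark on YARN (illustrative only)."""
    # Reserve 1 core and 1 GB per node for the OS and Hadoop daemons.
    usable_cores = node_cores - 1
    usable_mem_gb = node_mem_gb - 1

    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")

    # Each YARN container must hold the executor heap plus the memory overhead.
    container_gb = usable_mem_gb / executors_per_node
    heap_gb = container_gb / (1 + overhead_fraction)
    if heap_gb * overhead_fraction < min_overhead_gb:
        # The 384 MiB overhead floor applies for small executors.
        heap_gb = container_gb - min_overhead_gb

    # Leave one executor slot for the YARN ApplicationMaster.
    num_executors = max(executors_per_node * num_nodes - 1, 1)

    return {
        "num_executors": num_executors,
        "executor_cores": cores_per_executor,
        "executor_memory_gb": int(heap_gb),  # round down to whole GB
    }


# Hypothetical layout: one node with 120 GB RAM and 16 cores.
plan = size_executors(node_mem_gb=120, node_cores=16, num_nodes=1)
print(
    "--num-executors {num_executors} "
    "--executor-cores {executor_cores} "
    "--executor-memory {executor_memory_gb}g".format(**plan)
)
```

The printed values map onto the `spark-submit` options of the same names. In cluster mode the driver also runs inside a YARN container, so if the failure is on the driver side, `--driver-memory` may matter as much as the executor settings.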



Cy Jervis, Community Manager
