Multinomial NB on Spark on YARN

New Contributor

I'm trying to build an ML model (multinomial Naive Bayes) with PySpark, running Spark on YARN in cluster mode.

I have 120 GB of RAM available, with ample storage.

However, the nb.fit(df) line of the training code throws an OutOfMemoryException. I have tried all the memory tuning parameters in YARN and several executor configurations (both increasing and decreasing the number of executors), but every run ends with the OOM exception.

I also tried reducing the number of features and the size of the data, yet it still ends with the memory issue.

The only successful model creation was with 1,000 records.

I have tried many things without any result. Please help.
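
For readers hitting the same error, one thing worth checking is how large the fitted model itself is. A Spark ML NaiveBayesModel keeps its per-class feature log-probabilities in a matrix, theta, of shape numClasses x numFeatures, so a very wide feature vector combined with many classes can be costly regardless of how many rows are trained on. The sketch below is only back-of-the-envelope arithmetic under the assumption that the matrix is held densely as 8-byte doubles; the function name and the example figures are hypothetical and are not taken from the question.

```python
def dense_nb_footprint_bytes(num_classes: int, num_features: int,
                             bytes_per_value: int = 8) -> int:
    """Approximate size of a dense (num_classes x num_features) matrix.

    Assumption: one 8-byte double per cell. JVM object overhead and any
    temporary copies made during training are not counted.
    """
    return num_classes * num_features * bytes_per_value


# Hypothetical figures for illustration only. 262,144 (2**18) is the
# default numFeatures of HashingTF; the question does not say which
# feature extractor or how many classes were used.
for num_features in (1_000, 262_144, 10_000_000):
    size = dense_nb_footprint_bytes(num_classes=500, num_features=num_features)
    print(f"{num_features:>10,} features x 500 classes ~ {size / 1024**3:.2f} GiB")
```

If the estimate approaches the driver or executor heap size, reducing the feature dimension or the number of distinct labels is likely to matter more than adding executors.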


Community Manager

@sridar1992 I'm not an expert, but I did find this community article that may be of interest if you haven't read it yet:

Spark on YARN - Executor Resource Allocation Optimization
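
The linked article is not reproduced in this thread. As a rough illustration of the kind of arithmetic its title refers to, here is a minimal sketch of a commonly used rule of thumb for sizing executors on a YARN node. It is not the article's method. The function name, the 16-core figure, the 5 cores per executor, and the 1 GB / 1 core reserved for the OS are all assumptions; only the 120 GB comes from the question, and the 10% / 384 MiB overhead follows Spark's default for spark.executor.memoryOverhead.

```python
import math


def size_executors(node_mem_gb, node_cores, cores_per_executor=5,
                   reserved_mem_gb=1, reserved_cores=1,
                   overhead_fraction=0.10, min_overhead_gb=0.375):
    """Rule-of-thumb executor sizing for one YARN node (hypothetical helper).

    Each YARN container must hold the executor heap plus the off-heap
    overhead, which by default is max(384 MiB, 10% of the heap).
    The driver / application master container is not accounted for here.
    """
    usable_cores = max(node_cores - reserved_cores, 1)
    executors_per_node = max(usable_cores // cores_per_executor, 1)
    cores = min(cores_per_executor, usable_cores)

    # Memory each container may use once the OS reservation is set aside.
    container_gb = (node_mem_gb - reserved_mem_gb) / executors_per_node

    # Split the container into heap + overhead.
    heap_gb = container_gb / (1 + overhead_fraction)
    if heap_gb * overhead_fraction < min_overhead_gb:
        heap_gb = container_gb - min_overhead_gb

    return {
        "executors_per_node": executors_per_node,
        "spark.executor.cores": cores,
        "spark.executor.memory_gb": math.floor(heap_gb),
    }


# 120 GB is from the question; 16 cores is an assumed value.
plan = size_executors(node_mem_gb=120, node_cores=16)
print(plan)
```

The resulting values would be passed as spark.executor.cores and spark.executor.memory. In cluster mode the driver also runs inside a YARN container, so spark.driver.memory needs its own allowance, particularly if training collects large results to the driver.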


Cy Jervis, Manager, Community Program