
Multiple Livy Sessions when running PySpark via Jupyter


We are having problems running PySpark scripts in parallel through Jupyter. We have a team of 3 people who need to work in Jupyter at the same time, each on a different script. The first person to connect creates a Spark context successfully via a Livy session. The people who connect after that can create additional contexts, but the server becomes slow and the following message appears:

[Screenshot attached: 16607-untitled.png]
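For context, our notebooks reach Spark through Livy's REST interface (via sparkmagic). The sketch below shows roughly how a second interactive session could be created with explicit, modest resource requests so it does not compete for the whole cluster; the Livy host, port, session name, and memory values are illustrative placeholders, not our actual settings.

```python
import json
import requests

# Illustrative values only: the Livy endpoint and resource sizes below are
# assumptions, not our actual cluster configuration.
LIVY_URL = "http://livy-server:8998"

session_request = {
    "kind": "pyspark",          # PySpark session, as used from Jupyter/sparkmagic
    "name": "user2-notebook",   # hypothetical name for the second person's session
    "driverMemory": "2g",
    "executorMemory": "4g",
    "executorCores": 2,
    "numExecutors": 4,          # keeps this session well under the 128 GB total
}

# Ask Livy to start a new interactive session; if it is accepted,
# a new application should appear in YARN for this session.
resp = requests.post(
    f"{LIVY_URL}/sessions",
    data=json.dumps(session_request),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
session = resp.json()
print("Created Livy session:", session["id"], "state:", session["state"])
```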

When we check in YARN, only 1 Livy session is in progress; the subsequent ones do not appear. Our cluster has 128 GB of RAM, which should be enough for at least 2 parallel sessions, but that is not possible at the moment. We are all currently using the same user to access Ambari.

[Screenshot attached: 16608-untitled2.png]
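To see what happens to the sessions that never show up in YARN, we can ask Livy itself which sessions it is tracking and what state each one is in (for example, stuck in "starting" while waiting for resources). This is only a sketch; the Livy host and port are placeholders.

```python
import requests

# Placeholder endpoint; replace with the actual Livy server host and port.
LIVY_URL = "http://livy-server:8998"

# List all sessions Livy currently tracks, along with their states
# (e.g. "starting", "idle", "busy", "dead").
sessions = requests.get(f"{LIVY_URL}/sessions").json().get("sessions", [])

for s in sessions:
    print(f"session {s['id']}: kind={s.get('kind')}, state={s['state']}")
```

Comparing this output with `yarn application -list` on the cluster shows whether the extra sessions ever reach YARN at all, or whether they stay queued at the Livy level.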

1) How can 3 people access Jupyter at the same time and program in parallel?

2) Is 128 GB RAM enough for this job?

3) If each person has their own user account, can this work be done in parallel?

Thanks in advance