Multiple Spark Sessions From One CDSW Session



We noticed that CDSW does not seem to support starting multiple Spark sessions from a single CDSW session. We can reproduce this in several ways on CDSW 1.7.2 with base image version 10:

  • Start a CDSW session with the "Workbench - Scala" editor. This automatically starts a Spark session in the background via the Apache Toree kernel.
    Then open the terminal built into CDSW and try to use spark-submit or spark-shell (with the default master setting, yarn).
  • Start a CDSW session with the "Jupyter" editor. Open a notebook and start a Spark session in it (any kernel type: Python, Scala/Toree, IRkernel, ...). Then open a second notebook and try to start a Spark session there as well.

For the YARN applications that belong to the additional (parallel) sessions, the following error is logged in YARN:

ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
caused by: Failed to connect to /[cdsw-master-ip]:27560
caused by:$AnnotatedConnectException: connection refused: /[cdsw-master-ip]:27560
caused by: Connection refused


Because of this error, the YARN application cannot start, so the Spark session is never created.
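
To help narrow down the "Connection refused" error above, a small check like the following could be run from the CDSW terminal while the first Spark session is up. This is only a sketch: is_port_open is a hypothetical helper name, and the host and port have to be replaced with the values from the YARN log. It only tests whether a TCP connection can be opened from wherever it runs, which is not necessarily the same network path the ApplicationMaster uses.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical diagnostic helper, not part of CDSW or Spark.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError or a
        # timeout) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (placeholder host; use the address and port from the YARN log):
# is_port_open("<cdsw-master-ip>", 27560)
```

If the check reports the port as closed while the second session is starting, that would suggest nothing is listening on the port the ApplicationMaster tries to reach, rather than a firewall issue between YARN and CDSW.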


Is this a limitation by design, or can it be avoided by proper custom configurations?


Side note: We also tested starting multiple Spark sessions as the same user from separate CDSW sessions, and that works without a problem. So the limitation is not that our cluster prevents one user from running multiple Spark sessions at the same time.


Thanks in advance!
