PySpark - Error initializing SparkContext

New Contributor

Hi,

I have run into the following error while launching pyspark. I would appreciate any help with this.

-sh-4.1$ pyspark
Python 2.6.6 (r266:84292, Aug 18 2016, 08:36:59)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
18/11/12 11:20:25 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1537380011359_1185346 to YARN : Application rejected by queue placement policy
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:257)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:148)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:157)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:542)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:214)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:744)
18/11/12 11:20:26 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
18/11/12 11:20:26 ERROR util.Utils: Uncaught exception in thread Thread-2
java.lang.NullPointerException
        at org.apache.spark.network.shuffle.ExternalShuffleClient.close(ExternalShuffleClient.java:152)
        at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1264)
        at org.apache.spark.SparkEnv.stop(SparkEnv.scala:96)
        at org.apache.spark.SparkContext$$anonfun$stop$12.apply$mcV$sp(SparkContext.scala:1768)
        at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1230)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:1767)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:614)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:214)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:744)
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p2671.2910/lib/spark/python/pyspark/shell.py", line 43, in <module>
    sc = SparkContext(pyFiles=add_files)
  File "/opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p2671.2910/lib/spark/python/pyspark/context.py", line 115, in __init__
    conf, jsc, profiler_cls)
  File "/opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p2671.2910/lib/spark/python/pyspark/context.py", line 172, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p2671.2910/lib/spark/python/pyspark/context.py", line 235, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p2671.2910/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 1064, in __call__
  File "/opt/cloudera/parcels/CDH-5.9.1-1.cdh5.9.1.p2671.2910/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1537380011359_1185346 to YARN : Application rejected by queue placement policy
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:257)
        at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:148)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:157)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:542)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:214)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:744)

Cloudera Employee

These lines in the console output indicate that the job was rejected by YARN:

18/11/12 11:20:25 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1537380011359_1185346 to YARN : Application rejected by queue placement policy

YARN determines which resource pool a job is placed in via placement rules in the scheduler configuration (see https://blog.cloudera.com/blog/2016/06/untangling-apache-hadoop-yarn-part-4-fair-scheduler-queue-bas...).

It might be worth checking whether you have a "<rule name="reject"/>" rule in your scheduler configuration and, if so, why the pyspark job fell through to that rule instead of matching one of the earlier rules that would place it in the appropriate queue.
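
For reference, here is a rough sketch of what the placement rules in a Fair Scheduler allocation file (fair-scheduler.xml, or the equivalent Dynamic Resource Pools settings in Cloudera Manager) might look like when they end in a reject rule. The queue name and the exact rule order below are illustrative, not taken from your cluster:

<?xml version="1.0"?>
<allocations>
  <queue name="analytics">
    <weight>1.0</weight>
  </queue>
  <queuePlacementPolicy>
    <!-- Use the queue named at submit time, but only if it already exists -->
    <rule name="specified" create="false"/>
    <!-- Otherwise, try a pool named after the submitting user, without creating one -->
    <rule name="user" create="false"/>
    <!-- Anything that matched no earlier rule is rejected -->
    <rule name="reject"/>
  </queuePlacementPolicy>
</allocations>

With rules like these, a job submitted by a user who has no matching pool (and no queue named at submit time) falls through to the reject rule and fails in the way shown in your stack trace. Possible remedies would be to adjust the placement rules or add a pool that matches this user, or to submit to an existing pool explicitly, for example with pyspark --queue <pool_name> (where <pool_name> is a placeholder for a pool that exists on your cluster; this sets spark.yarn.queue).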