Created on 05-11-2017 02:43 PM - edited 09-16-2022 04:35 AM
I just brought up a new CDH 5 cluster and compiled and installed Livy. I can run jobs using spark-submit, and they run normally via YARN and Spark.
Jobs submitted through Livy create the SparkContext (according to Jupyter); I can assign variables and run transformations, but the job dies as soon as I try to execute an action. I get an error in Jupyter that the SparkContext has been shut down. The job itself is registered in the Spark History Server with only the driver executor added, nothing else.
There is no mention of the job in the list of YARN applications, and I don't see anything telling in livy.log or the Spark History Server log. Without any entry in the YARN applications list, I am not sure where to look to see why it is dying.
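In case it helps anyone reproduce my checks: I also pulled the session log straight out of Livy's REST API. A minimal sketch, assuming Livy on its default port 8998 and session id 0 (both placeholders for your setup):

import requests

# GET /sessions/<id>/log returns recent driver log lines for a Livy session.
resp = requests.get('http://localhost:8998/sessions/0/log')
for line in resp.json().get('log', []):
    print(line)

Nothing in there looks any more telling than livy.log itself.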
This all runs fine.
from pyspark.sql import Row
from pyspark.sql import SQLContext
from pyspark.sql.window import Window
import pyspark.sql.functions as func

sqlc = SQLContext(sc)

row1 = Row(name='willie', number=1)
row2 = Row(name='bob', number=1)
row3 = Row(name='bob', number=3)
row4 = Row(name='willie', number=6)
row5 = Row(name='willie', number=9)
row6 = Row(name='bob', number=12)
row7 = Row(name='willie', number=15)
row8 = Row(name='jon', number=16)
row9 = Row(name='jon', number=17)

df = sqlc.createDataFrame([row1, row2, row3, row4, row5, row6, row7, row8, row9])
This then dies with the following error.
df.count()
Any pointers on how to troubleshoot would be appreciated!
Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 269, in count
    return int(self._jdf.count())
  File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)
  File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o68.count.
: java.lang.IllegalStateException: SparkContext has been shutdown
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1854)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1875)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1888)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1959)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
    at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:166)
    at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
    at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1514)
    at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1514)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:53)
    at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2101)
    at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1513)
    at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1520)
    at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1530)
    at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1529)
    at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2114)
    at org.apache.spark.sql.DataFrame.count(DataFrame.scala:1529)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)
Created 05-15-2017 01:27 PM
I figured out part of this. I needed to set
livy.spark.master = yarn
With that set, the job does appear in YARN. It still dies prematurely when I run it through Livy, and the YARN logs look happy, so I am not sure what is going on there. But at least that is something.
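For anyone else looking for where this goes: it's a server-side setting in Livy's conf file, not something you pass per job. A minimal sketch, assuming the stock conf/livy.conf layout (restart the Livy server after editing; the deploy-mode line is optional):

# conf/livy.conf
livy.spark.master = yarn
# Optionally run drivers in the cluster rather than on the Livy host:
livy.spark.deploy-mode = cluster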
Created 05-16-2017 02:08 PM
Wondering if anyone has any thoughts on this? I am stumped. Someone suggested the driver might be running out of memory, so I boosted driver memory to 4G without any change. I am also still not able to find any logs that indicate the issue. I assume it must be the driver that is generating the error, because YARN and Spark consider the process incomplete until it times out.
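For the record, here is roughly how I bumped the driver memory, via Livy's session-creation request (a sketch; livy-host is a placeholder, and sparkmagic can pass the same driverMemory field through its session config):

import json
import requests

# Placeholder host; Livy listens on port 8998 by default.
LIVY = 'http://livy-host:8998'

# driverMemory and executorMemory are standard fields in Livy's POST /sessions body.
payload = {'kind': 'pyspark', 'driverMemory': '4g'}
resp = requests.post(LIVY + '/sessions',
                     data=json.dumps(payload),
                     headers={'Content-Type': 'application/json'})
print(resp.json())  # includes the new session id and its state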
Created 05-19-2017 10:54 AM
For the next poor schlub who encounters this weird behavior: I figured out a workaround, which also helped me pinpoint the problem.
It turns out that the problem was with the SQLContext. My SparkContext could create and manipulate RDDs all day without a problem; the SQLContext, however, would not let me work with DataFrames without hitting this error.
I found that if I stopped my SparkContext, created a new one, and then created a new SQLContext from it, everything worked fine (see the sketch below). This leads me to believe there was something wrong with the SparkContext that sparkmagic was passing me.
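A minimal sketch of the workaround, using the Spark 1.x API (the app name is just a placeholder):

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Stop the SparkContext that sparkmagic handed us, then build fresh ones.
sc.stop()  # 'sc' is the context injected into the Livy/sparkmagic session

conf = SparkConf().setAppName('livy-sqlcontext-workaround')  # placeholder name
sc = SparkContext(conf=conf)
sqlc = SQLContext(sc)

# DataFrame actions now run instead of dying with "SparkContext has been shutdown".
df = sqlc.createDataFrame([('willie', 1), ('bob', 3)], ['name', 'number'])
print(df.count())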
I've since updated to Spark 2, and I haven't seen any trouble with the SparkSession so far, so I doubt I will be digging into this any further.