job failing in livy - no logs

Explorer

 

I just brought up a new CDH 5 cluster and compiled and installed Livy. I can run jobs using spark-submit and they run normally via YARN and Spark.

 

Jobs submitted through Livy create the SparkContext (according to Jupyter); I can assign variables and run transformations, but the job dies as soon as I try to execute an action. I get an error in Jupyter that the SparkContext has been shut down. The job itself is registered in the Spark History Server as having an executor driver added, and nothing else.

 

There is no mention of the job in the list of YARN applications. I don't see anything telling in livy.log or the Spark History Server log. Without any entry in the YARN applications list, I am not sure where to look to see why it is dying.
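
In case the specifics help, these are the sorts of places I have been checking (host names and ids are placeholders; 8998 is Livy's default port):

# list every application YARN knows about, including finished and failed ones
yarn application -list -appStates ALL

# once an application id exists, pull the aggregated container logs
yarn logs -applicationId <application_id>

# ask Livy itself for the session log over its REST API
curl http://<livy-host>:8998/sessions/<session_id>/log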

 

This all runs fine.

 

 

from pyspark.sql import Row
from pyspark.sql import SQLContext
from pyspark.sql.window import Window
import pyspark.sql.functions as func
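# sc is the SparkContext that the Livy / sparkmagic session creates automatically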
sqlc = SQLContext(sc)

row1 = Row(name='willie', number=1)
row2 = Row(name='bob', number=1)
row3 = Row(name='bob', number=3)
row4 = Row(name='willie', number=6)
row5 = Row(name='willie', number=9)
row6 = Row(name='bob', number=12)
row7 = Row(name='willie', number=15)
row8 = Row(name='jon', number=16)
row9 = Row(name='jon', number=17)
df = sqlc.createDataFrame([row1, row2, row3, row4, row5, row6, row7, row8, row9 ])

This then dies with the following error.

df.count()

Any pointers on how to troubleshoot would be appreciated!

Traceback (most recent call last):
  File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 269, in count
    return int(self._jdf.count())
  File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)
  File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o68.count.
: java.lang.IllegalStateException: SparkContext has been shutdown
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1854)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1875)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1888)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1959)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
	at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:166)
	at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
	at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1514)
	at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1514)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:53)
	at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2101)
	at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1513)
	at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1520)
	at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1530)
	at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1529)
	at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2114)
	at org.apache.spark.sql.DataFrame.count(DataFrame.scala:1529)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:259)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)

 

1 ACCEPTED SOLUTION

Explorer

 

For the next poor schlub who encounters this weird behavior, I figured out a workaround, which also helped me pinpoint the problem.

 

It turns out that the problem was with the SQLContext. I realized that my SparkContext could create and manipulate RDDs all day without a problem. The SQLContext, however, would not let me work with DataFrames without an error.

 

I found that if I stopped my SparkContext, created a new one, and then created a new SQLContext from it, everything worked fine. This leads me to believe that there was something going on with the SparkContext I was being passed from SparkMagic.
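
In case it helps someone, the workaround looked roughly like this in the notebook (a sketch of the steps described above using the Spark 1.x API; the app name is just an example):

from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext, Row

# stop the context the session handed me and build a fresh one
sc.stop()
sc = SparkContext(conf=SparkConf().setAppName('livy-workaround'))

# create the SQLContext from the new SparkContext
sqlc = SQLContext(sc)

# DataFrame actions now run instead of failing
df = sqlc.createDataFrame([Row(name='willie', number=1), Row(name='bob', number=3)])
df.count()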

 

I've since updated to Spark 2 and haven't seen any trouble with the SparkSession so far, so I doubt I will be digging into this any further.
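
For reference, the Spark 2 path that has been working for me looks roughly like this (a sketch; in a Livy / sparkmagic session the 'spark' entry point is normally created for you already, so the builder call mostly just retrieves it):

from pyspark.sql import SparkSession, Row

# getOrCreate() returns the existing session if one is already running
spark = SparkSession.builder.appName('example').getOrCreate()

df = spark.createDataFrame([Row(name='willie', number=1), Row(name='bob', number=3)])
df.count()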

 

 


3 REPLIES

Explorer

I figured out part of this. I needed to set

livy.spark.master = yarn

 

With that set, the job does appear in YARN. It still dies prematurely when I run it through Livy, and the YARN logs look happy, so I am not sure what is going on there. But at least that is something.
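
For anyone else setting this up, the property goes in Livy's conf/livy.conf. The deploy-mode line is just what I would pair with it, not something I have verified here:

# conf/livy.conf
livy.spark.master = yarn
livy.spark.deploy-mode = client   # 'client' runs the driver on the Livy host, 'cluster' runs it in YARN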

 

Explorer

Wondering if anyone has any thoughts on this? I am stumped. Someone suggested it might be the driver running out of memory, so I boosted driver memory to 4G without any change. I am also still not able to find any logs that indicate the issue. I assume it must be the driver that is generating the error, because YARN and Spark consider the process incomplete until it times out.
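
For the record, this is roughly how I bumped the driver memory from the notebook, using sparkmagic's %%configure cell magic (the -f flag forces the session to restart with the new settings; the other values shown are just examples):

%%configure -f
{"driverMemory": "4G", "executorMemory": "2G", "numExecutors": 2}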

 

 
