Member since: 01-14-2017
Posts: 17
Kudos Received: 0
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 8183 | 05-19-2017 10:54 AM
 | 14137 | 05-17-2017 03:05 PM
 | 1140 | 05-15-2017 09:35 AM
05-19-2017
10:54 AM
For the next poor schlub who encounters this weird behavior, I figured out a workaround that also helped me pinpoint the problem. It turns out the problem was with the SQLContext. I realized that my SparkContext could create and manipulate RDDs all day without a problem; the SQLContext, however, would not let me work with DataFrames without an error. I found that if I stopped my SparkContext, created a new one, and then created a new SQLContext from that, everything worked fine. This leads me to believe that something was going on with the SparkContext I was being passed from SparkMagic. I've since updated to Spark 2 and haven't seen any trouble with the SparkSession, so I doubt I'll dig into this any further.
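For the record, the workaround looks roughly like this (a sketch against the Spark 1.x API; it assumes `sc` is the SparkContext that SparkMagic handed to the notebook, and the app name is illustrative):

```python
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Stop the context SparkMagic passed in, then build a fresh pair.
sc.stop()
sc = SparkContext(conf=SparkConf().setAppName("rebuilt-context"))
sqlc = SQLContext(sc)  # DataFrame operations worked again against this one
```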
05-17-2017
03:05 PM
I was able to add it as a service after activating the parcel AND downloading the CSD jar. I didn't realize that I needed both of these; I thought it was either/or.
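For anyone else hitting this, the two pieces are separate (the path below is the Cloudera Manager default; the jar name is illustrative and depends on the Spark 2 release you download):

```shell
# 1) The CSD jar teaches Cloudera Manager about the SPARK2 service type.
sudo cp SPARK2_ON_YARN-*.jar /opt/cloudera/csd/
sudo service cloudera-scm-server restart

# 2) The parcel supplies the actual Spark 2 binaries: download, distribute,
#    and activate it from the Parcels page in Cloudera Manager.
```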
05-16-2017
02:08 PM
Wondering if anyone has any thoughts on this? I am stumped. Someone suggested it might be the driver running out of memory, so I boosted driver memory to 4G without any change. I'm also still not able to find any logs that indicate the issue. I assume it must be the driver that is generating the error, because YARN and Spark consider the process incomplete until it times out.
05-15-2017
02:32 PM
Thanks Bill. I just built this cluster using CDH 5.11.0. I installed Spark 1.6.0 through the wizard along with YARN, ZooKeeper, and HDFS. I verified that Spark 1.6.0 worked and later added Hive as well. I added the parcel configuration for Spark 2, then downloaded, distributed, and activated it; it appears as distributed and activated under Parcels. I then expected to see it in the list of services I could add under the cluster's "Add Service" option, but I don't. I turned off "Validate Parcel Relations" to see if that would make it appear, but it didn't.
05-15-2017
02:06 PM
Went there, activated it, but I still don't see it as a choice when I Add Service.
05-15-2017
01:27 PM
I figured out part of this: I needed to set livy.spark.master = yarn. With that set, the job does appear in YARN. It still dies prematurely when I run it through Livy, and the YARN logs look happy, so I'm not sure what is going on there. But at least that's something.
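For reference, the setting goes in Livy's configuration file (the file name varies by Livy version, livy.conf or livy-defaults.conf, and the path below assumes a default install):

```properties
# $LIVY_HOME/conf/livy.conf
livy.spark.master = yarn
```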
05-15-2017
09:35 AM
I figured out what was causing this. One of the repo sites that I had configured was not being let through my proxy. Once I opened up the proxy to that repo site, the error went away. --Willie
05-12-2017
01:23 PM
Hello,
I have a cluster that is not able to use its proxy (That's a separate post). In order to install SPARK2, I have attempted to follow the instructions here: https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html
I put the SPARK2 parcel into the /opt/cloudera/csd directory. After restarting the scm-server, I was able to distribute and activate the CSD. However, SPARK2 does not appear as an option under "Add a Service".
The only related error I see is one in the scm-server log indicating that it failed to load the CSD from /opt/cloudera/csd because it has no jar extension. Although I think this is a spurious error, because the package has already been distributed and activated, I did rename it with a jar extension. Upon restarting, I got an error indicating that there was a ZipException when it tried to uncompress it. The file does appear to be a valid gzipped tar archive.
Any ideas? Thanks!
Labels:
- Apache Spark
- Cloudera Manager
05-12-2017
09:17 AM
Hello, I'm running CDH 5.11.0. I have my proxy set in the network settings; however, I get this error whenever I try to check for new parcels:

2017-05-12 12:14:08,077 WARN ParcelUpdateService:com.cloudera.cmf.persist.ReadWriteDatabaseTaskCallable: Error while executing CmfEntityManager task
java.util.MissingFormatArgumentException: Format specifier 's'
at java.util.Formatter.format(Formatter.java:2487)
at java.util.Formatter.format(Formatter.java:2423)
at java.lang.String.format(String.java:2790)
at com.cloudera.parcel.components.ParcelDownloaderImpl.syncRemoteRepos(ParcelDownloaderImpl.java:359)
at com.cloudera.parcel.components.ParcelDownloaderImpl$1.run(ParcelDownloaderImpl.java:439)
at com.cloudera.parcel.components.ParcelDownloaderImpl$1.run(ParcelDownloaderImpl.java:434)
at com.cloudera.cmf.persist.ReadWriteDatabaseTaskCallable.call(ReadWriteDatabaseTaskCallable.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
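The exception here is Java's String.format being given fewer arguments than the format string has '%s' specifiers, which suggests Cloudera Manager's repo-sync code choked while building an error message about a repo it couldn't reach (consistent with the proxy-blocked repo that turned out to be the cause, per the 05-15 reply above). A minimal Python analogue of the same failure class:

```python
# Java's String.format raises MissingFormatArgumentException when a '%s'
# specifier has no matching argument; Python's % operator fails the same way.
template = "Error syncing remote parcel repo %s: %s"

try:
    template % ("https://example.com/parcels",)  # one argument short
except TypeError as exc:
    print("format failed:", exc)
```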
Labels:
- Cloudera Manager
05-11-2017
02:43 PM
I just brought up a new CDH 5 cluster and compiled and installed Livy. I can run jobs using spark-submit and they run via YARN and Spark normally. Jobs submitted through Livy create the SparkContext (according to Jupyter); I can assign things and run transformations, but the jobs die as soon as I try to execute an action. I get an error in Jupyter that the SparkContext has been shut down. The job itself is registered in the Spark History Server as having an executor driver added, nothing else. There is no mention of the job in the list of YARN applications. I don't see anything telling in livy.log or the spark-history-server log. Without any entry in YARN applications, I am not sure where to look to see why it is dying.

This all runs fine:

from pyspark.sql import Row
from pyspark.sql import SQLContext
from pyspark.sql.window import Window
import pyspark.sql.functions as func

sqlc = SQLContext(sc)

row1 = Row(name='willie', number=1)
row2 = Row(name='bob', number=1)
row3 = Row(name='bob', number=3)
row4 = Row(name='willie', number=6)
row5 = Row(name='willie', number=9)
row6 = Row(name='bob', number=12)
row7 = Row(name='willie', number=15)
row8 = Row(name='jon', number=16)
row9 = Row(name='jon', number=17)

df = sqlc.createDataFrame([row1, row2, row3, row4, row5, row6, row7, row8, row9])

This then dies with the following error:

df.count()

Any pointers on how to troubleshoot would be appreciated!

An error occurred while calling o68.count.
: java.lang.IllegalStateException: SparkContext has been shutdown
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1854)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1875)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1888)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1959)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:166)
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1514)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1514)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:53)
at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2101)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1513)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1520)
at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1530)
at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1529)
at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2114)
at org.apache.spark.sql.DataFrame.count(DataFrame.scala:1529)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 269, in count
return int(self._jdf.count())
File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o68.count.
: java.lang.IllegalStateException: SparkContext has been shutdown
Labels:
- Apache Spark
04-03-2017
02:23 PM
I tried this and while my job is still running, it looks like it has gotten farther than it has in the past. Thanks!
03-31-2017
09:45 PM
I am getting garbage collection errors: "java.lang.OutOfMemoryError: GC overhead limit exceeded". Everything that I have read points to heap size. I have upped all the heap-related parameters that I see in my YARN configuration options. When I try to run spark-submit with the argument --driver-java-options "-Xmx2048m", I get the error "Initial heap size set to a larger value than the maximum heap size". I am not sure why it thinks the maximum heap size is smaller than 2G. I am not sure what else to look at. Thanks!
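A possible explanation (hedged; exact behavior depends on the Spark version) is that the launcher already derives the driver's heap settings from --driver-memory, so a bare -Xmx passed through --driver-java-options can land below the initial heap the launcher sets, producing exactly that message. The usual way to size the heaps is through Spark's own flags, sketched here with illustrative values and script name:

```shell
# Size driver and executor heaps via Spark's flags rather than raw -Xmx:
spark-submit \
  --master yarn \
  --driver-memory 4g \
  --executor-memory 4g \
  my_job.py
```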
Labels:
- Apache Spark
- Apache YARN
03-08-2017
02:55 PM
A bit more info... (and this is cross-posted to the Project Jupyter list). I think that messaging is getting screwed up between PySpark and Livy. When the last cell is executed, I see this on the client side:

2017-03-08 22:24:48,505 INFO EventsHandler InstanceId: 0e1c8fd2-047e-4337-b264-5b64ba74de5a, EventName: notebookStatementExecutionStart, Timestamp: 2017-03-08 22:24:48.504920, SessionGuid: 03d14478-6adc-4b34-abef-b9b6fd400543, LivyKind: pyspark, SessionId: 8, StatementGuid: f1933b11-b767-4a18-b311-c48901ad8369
2017-03-08 22:24:48,788 DEBUG Command Status of statement 8 is running.
2017-03-08 22:24:50,920 DEBUG Command Status of statement 8 is running.

...and it never comes back. On the Livy end, I see:

17/03/08 17:26:26 INFO ContextLauncher: 17/03/08 17:26:26 INFO scheduler.DAGScheduler: ResultStage 17 (collect at <stdin>:5) finished in 1.521 s
17/03/08 17:26:26 INFO ContextLauncher: 17/03/08 17:26:26 INFO scheduler.DAGScheduler: Job 8 finished: collect at <stdin>:5, took 3.729078 s
17/03/08 17:26:27 DEBUG RpcDispatcher: [ClientProtocol] Registered outstanding rpc 230 (com.cloudera.livy.rsc.BaseProtocol$GetReplJobResult).
17/03/08 17:26:27 DEBUG KryoMessageCodec: Encoded message of type com.cloudera.livy.rsc.rpc.Rpc$MessageHeader (6 bytes)
17/03/08 17:26:27 DEBUG KryoMessageCodec: Encoded message of type com.cloudera.livy.rsc.BaseProtocol$GetReplJobResult (91 bytes)
17/03/08 17:26:27 DEBUG KryoMessageCodec: Decoded message of type com.cloudera.livy.rsc.rpc.Rpc$MessageHeader (6 bytes)
17/03/08 17:26:27 DEBUG KryoMessageCodec: Decoded message of type com.cloudera.livy.rsc.rpc.Rpc$NullMessage (2 bytes)
17/03/08 17:26:27 DEBUG RpcDispatcher: [ClientProtocol] Received RPC message: type=REPLY id=230 payload=com.cloudera.livy.rsc.rpc.Rpc$NullMessage
17/03/08 17:26:28 DEBUG RpcDispatcher: [ClientProtocol] Registered outstanding rpc 231 (com.cloudera.livy.rsc.BaseProtocol$GetReplJobResult).

ad infinitum. So, with my limited knowledge, it looks to me like Livy thinks it has sent a result for a finished job, but PySpark hasn't received it. Anyone seen this before? Any thoughts?
03-07-2017
12:59 PM
Hello, I am running IPython -> Livy to send jobs to my CDH 5.9.0 cluster running Spark. My job runs through a few operations reading files from HDFS into DataFrames and then doing some operations on those DataFrames. The code then reaches a cell with a join and stops progressing. If I leave it alone for long enough, the session is eventually killed. I am not sure how to debug this. YARN shows the job as still running. Spark shows all jobs completed and no active or pending jobs. All the Spark jobs say that they succeeded, though some stages were skipped. If I go to the details for the last stage, all statuses say "Success." The logs for the executors all say "Finished task ###. #### bytes sent to driver." The thread dump for the driver shows a lot of waiting threads. If I run the job via pyspark, not through IPython/Livy, it works fine. But there are no errors in the livy log either. I'm not sure how to figure this out. Any thoughts? Thanks!
Labels:
- Apache Spark
- Apache YARN