Member since: 01-14-2017
Posts: 17
Kudos Received: 0
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 10454 | 05-19-2017 10:54 AM
 | 16510 | 05-17-2017 03:05 PM
 | 1668 | 05-15-2017 09:35 AM
05-19-2017
10:54 AM
For the next poor schlub who encounters this weird behavior, I figured out a workaround, which also helped me pinpoint the problem. It turns out the problem was with the SQLContext. My SparkContext could create and manipulate RDDs all day without a problem, but the SQLContext would not let me work with DataFrames without throwing an error. I found that if I stopped my SparkContext, created a new one, and then created a new SQLContext from that, everything worked fine. This leads me to believe something was off with the SparkContext being passed to me by Sparkmagic. I've since updated to Spark 2 and haven't seen any trouble with the SparkSession, so I doubt I will dig into this any further.
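For reference, a minimal sketch of the workaround (assuming `sc` is the context Sparkmagic handed to the session; reusing its SparkConf is simply what happened to work for me):

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

# Assumption: `sc` is the SparkContext that Sparkmagic/Livy handed to the notebook.
conf = sc.getConf()            # keep the original context's configuration
sc.stop()                      # shut down the misbehaving context

sc = SparkContext(conf=conf)   # start a fresh context with the same settings
sqlc = SQLContext(sc)          # a SQLContext built from the new context behaved normally

# DataFrame operations that previously failed then worked, e.g.:
df = sqlc.createDataFrame([("willie", 1), ("bob", 3)], ["name", "number"])
df.count()
```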
05-17-2017
03:05 PM
I was able to add it as a service after activating the parcel AND downloading the CSD jar. I didn't realize that I needed both; I thought it was one or the other.
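For anyone else who hits this, the CSD half looked roughly like the following on the Cloudera Manager host. This is only a sketch: the jar name is a placeholder for whichever Spark 2 CSD build you download.

```bash
# Copy the downloaded CSD jar into Cloudera Manager's CSD directory.
# SPARK2_ON_YARN-<version>.jar is a placeholder for the actual file name.
sudo cp SPARK2_ON_YARN-<version>.jar /opt/cloudera/csd/
sudo chown cloudera-scm:cloudera-scm /opt/cloudera/csd/SPARK2_ON_YARN-<version>.jar
sudo chmod 644 /opt/cloudera/csd/SPARK2_ON_YARN-<version>.jar

# Restart the Cloudera Manager server so it picks up the new CSD
# (the Cloudera Management Service may also need a restart from the CM UI),
# then distribute and activate the SPARK2 parcel from the Parcels page as usual.
sudo service cloudera-scm-server restart
```

With both the parcel activated and the CSD jar in place, SPARK2 showed up under "Add a Service".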
05-16-2017
02:08 PM
Wondering if anyone has any thoughts on this? I am stumped. Someone suggested it might be the driver running out of memory, so I boosted driver memory to 4G, but that made no difference. I'm also still unable to find any logs that pinpoint the issue. I assume it must be the driver generating the error, because YARN and Spark both consider the process incomplete until it times out.
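For reference, one place the driver memory can be set for a Livy-backed session is at session creation through Livy's REST API. This is only a sketch: the host and port are Livy's defaults, and the 4g value just mirrors what I tried.

```python
import requests

livy_url = "http://localhost:8998"  # assumption: Livy on its default port

# Ask Livy for a PySpark session with a larger driver heap.
payload = {
    "kind": "pyspark",
    "driverMemory": "4g",
}
resp = requests.post(livy_url + "/sessions", json=payload)
print(resp.json())  # the new session's id and state
```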
05-15-2017
02:32 PM
Thanks Bill, I just built this cluster using CDH 5.11.0. I installed Spark 1.6.0 through the wizard along with YARN, ZooKeeper, and HDFS. I verified that Spark 1.6.0 worked and later added Hive as well. I added the parcel configuration for Spark 2, then downloaded it, distributed it, and activated it. It appears as distributed and activated under Parcels. I then expected to see it in the list of services that I could add under the cluster's "Add Service" option, but I don't. I turned off "Validate Parcel Relations" to see if that would cause it to appear, but it didn't.
05-15-2017
02:06 PM
Went there and activated it, but I still don't see it as a choice when I go to Add Service.
05-15-2017
01:27 PM
I figured out part of this. I needed to set livy.spark.master = yarn. With that set, the job does appear in YARN. It is still dying prematurely when I run it through Livy, and the YARN logs look happy, so I am not sure what is going on there. But at least that is something.
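The change itself is one line in Livy's configuration. A sketch is below; the path depends on where Livy was installed, and the commented deploy-mode line is only an optional assumption about running the driver inside the cluster.

```
# conf/livy.conf (restart the Livy server after editing)
livy.spark.master = yarn
# Optional: run the Spark driver inside the YARN cluster instead of on the Livy host
# livy.spark.deploy-mode = cluster
```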
05-15-2017
09:35 AM
I figured out what was causing this. One of the repo sites that I had configured was being blocked by my proxy. Once I opened up the proxy to that repo site, the error went away. --Willie
05-12-2017
01:23 PM
Hello,
I have a cluster that is not able to use its proxy (That's a separate post). In order to install SPARK2, I have attempted to follow the instructions here: https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html
I put the SPARK2 parcel into the /opt/cloudera/csd directory. After restarting the scm-server, I was able to distribute and activate the CSD. However, SPARK2 does not appear as an option under "Add a Service".
The only related error I see is one in the scm-server log indicating that it failed to load the CSD from /opt/cloudera/csd because it has no jar extension. Although I think this is a spurious error, since the package has already been distributed and activated, I did rename it with a .jar extension. Upon restarting, I got an error indicating a ZipException when it tried to uncompress it. The file does appear to be a valid gzipped tar archive.
Any ideas? Thanks!
Labels:
- Apache Spark
- Cloudera Manager
05-12-2017
09:17 AM
Hello, I'm running CDH 5.11.0. I have my proxy set in the network settings; however, I am getting the error below whenever I try to check for new parcels.
2017-05-12 12:14:08,077 WARN ParcelUpdateService:com.cloudera.cmf.persist.ReadWriteDatabaseTaskCallable: Error while executing CmfEntityManager task
java.util.MissingFormatArgumentException: Format specifier 's'
at java.util.Formatter.format(Formatter.java:2487)
at java.util.Formatter.format(Formatter.java:2423)
at java.lang.String.format(String.java:2790)
at com.cloudera.parcel.components.ParcelDownloaderImpl.syncRemoteRepos(ParcelDownloaderImpl.java:359)
at com.cloudera.parcel.components.ParcelDownloaderImpl$1.run(ParcelDownloaderImpl.java:439)
at com.cloudera.parcel.components.ParcelDownloaderImpl$1.run(ParcelDownloaderImpl.java:434)
at com.cloudera.cmf.persist.ReadWriteDatabaseTaskCallable.call(ReadWriteDatabaseTaskCallable.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Labels:
- Cloudera Manager
05-11-2017
02:43 PM
I just brought up a new CDH 5 cluster and compiled and installed Livy. I can run jobs using spark-submit and they run via YARN and Spark normally. Jobs submitted through Livy create the SparkContext (according to Jupyter), and I can assign things and run transformations, but the jobs die as soon as I try to execute an action. I get an error in Jupyter that the SparkContext has been shut down. The job itself is registered in the Spark History Server as having an executor driver added, nothing else. There is no mention of the job in the list of YARN applications. I don't see anything telling in livy.log or the Spark History Server log. Without any entry in the YARN applications, I am not sure where to look to see why it is dying. This all runs fine:
from pyspark.sql import Row
from pyspark.sql import SQLContext
from pyspark.sql.window import Window
import pyspark.sql.functions as func
sqlc = SQLContext(sc)
row1 = Row(name='willie', number=1)
row2 = Row(name='bob', number=1)
row3 = Row(name='bob', number=3)
row4 = Row(name='willie', number=6)
row5 = Row(name='willie', number=9)
row6 = Row(name='bob', number=12)
row7 = Row(name='willie', number=15)
row8 = Row(name='jon', number=16)
row9 = Row(name='jon', number=17)
df = sqlc.createDataFrame([row1, row2, row3, row4, row5, row6, row7, row8, row9])
This then dies with the following error:
df.count()
Any pointers on how to troubleshoot would be appreciated!
An error occurred while calling o68.count.
: java.lang.IllegalStateException: SparkContext has been shutdown
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1854)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1875)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1888)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1959)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:166)
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1514)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1514)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:53)
at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2101)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1513)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1520)
at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1530)
at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1529)
at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2114)
at org.apache.spark.sql.DataFrame.count(DataFrame.scala:1529)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 269, in count
return int(self._jdf.count())
File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/opt/cloudera/parcels/CDH-5.11.0-1.cdh5.11.0.p0.34/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o68.count.
: java.lang.IllegalStateException: SparkContext has been shutdown
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1854)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1875)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1888)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1959)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:166)
at org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1514)
at org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1514)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:53)
at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2101)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1513)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1520)
at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1530)
at org.apache.spark.sql.DataFrame$$anonfun$count$1.apply(DataFrame.scala:1529)
at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2114)
at org.apache.spark.sql.DataFrame.count(DataFrame.scala:1529)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
Labels:
- Apache Spark