
cdh5.4 spark-sql gets ClassNotFoundException: SparkSQLCLIDriver

Explorer

I recompiled Cloudera Spark cdh5-1.3.0_5.4.0 with

-Phive-thriftserver -Pyarn

and copied the new assembly jar to all the nodes.

When I run spark-sql, I first get a "JAVA_HOME not set" error. After exporting JAVA_HOME manually, I still get this other error:

ClassNotFoundException: org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver

What is Cloudera's position on supporting the spark-sql command line? Is there a way to make it work?
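
One way to narrow down a ClassNotFoundException like this is to check whether the assembly jar on a node actually contains the class. The following is a minimal Python sketch, not something shipped with CDH or Spark. The function name jar_contains_class is made up for illustration, and the sketch relies only on the fact that a jar is a zip archive with one entry per compiled class.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar has a .class entry for the fully qualified class_name.

    jar_contains_class is a hypothetical helper name, not a Spark or CDH tool.
    """
    # A class such as a.b.C is stored in the jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Calling it with the path to the assembly jar and "org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver" would show whether that particular jar carries the CLI driver. A False result would suggest the jar was built without the Thrift server module, or that a different jar is the one on the classpath.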


Re: cdh5.4 spark-sql gets ClassNotFoundException: SparkSQLCLIDriver

Master Collaborator

I believe spark-sql just works, or it did for me last time I tried it. It sounds like something may be funny with how you are running this, if you find JAVA_HOME is not set. You should never have to rebuild things and shouldn't be modifying the CDH installation. If you've been doing that, that's probably part of the problem.

Re: cdh5.4 spark-sql gets ClassNotFoundException: SparkSQLCLIDriver

Explorer

No, the new assembly jar did not cause the problem. These errors existed before I replaced the jar, and they still show up after I put the original jar back.

JAVA_HOME is now set from the command line using export, so that complaint is gone.

Let's make sure we are talking about the same thing. I am referring to the executable at

/opt/cloudera/parcels/CDH-5.4.0-1.cdh5.4.0.p0.27/lib/spark/bin/spark-sql

Re: cdh5.4 spark-sql gets ClassNotFoundException: SparkSQLCLIDriver

Master Collaborator

I think that's right, since Spark currently only works with Hive 0.12 / 0.13. For this reason I think the Hive integration does not work, but that is not the same thing as Spark SQL in general, nor the same thing as Hive on Spark. You can 'enable' it by just including the Hive jars on the classpath, and a lot of it should work, but not 100%.

Re: cdh5.4 spark-sql gets ClassNotFoundException: SparkSQLCLIDriver

Explorer

You know, I imagine you have a CDH 5.4 cluster readily available. It would really help if you could fire up spark-sql and see if it comes up.

If you have to include extra jars, I would like to know which jars and how you included them using Cloudera Manager.

Re: cdh5.4 spark-sql gets ClassNotFoundException: SparkSQLCLIDriver

Master Collaborator

Hm, no, you are correct. The spark-sql script itself does show this error in the VM. The script is there, but it is not something you're intended to execute (it's not on the path), for this reason I assume. Spark SQL in general does function; you can find the sqlContext in spark-shell, for example. The HiveContext et al. won't work unless you add the local Hive jars to the app classpath.

You're trying to run the Hive Thrift Server. You can try just building the Spark sql/hive-thriftserver module and including its jar on your classpath. You'd have to build against the latest Hive version it supports, which is only 0.13.1, and then take your chances against the Hive 1.1 jars installed locally, but it might work. I have never tried that.
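
For the "add the local Hive jars to the app classpath" step, a classpath string can be assembled from a directory of jars. This is only a sketch under assumptions: build_classpath is a made-up helper name, and the directory you pass in (wherever the Hive jars live on your nodes) is yours to supply. Nothing here is confirmed to make HiveContext work on CDH 5.4.

```python
import glob
import os


def build_classpath(jar_dir):
    """Join every .jar file directly under jar_dir into one classpath string.

    build_classpath is a hypothetical helper; jar_dir is whatever directory
    holds the jars you want to expose, for example a local Hive lib directory.
    """
    # Sort for a stable order; only top-level *.jar files are picked up
    jars = sorted(glob.glob(os.path.join(jar_dir, "*.jar")))
    # os.pathsep is ":" on Linux, the separator the JVM expects there
    return os.pathsep.join(jars)
```

The resulting string could then be passed to spark-shell through its --driver-class-path option. Given the Hive version mismatch described above, treat the outcome as an experiment and not as a supported configuration.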

Re: cdh5.4 spark-sql gets ClassNotFoundException: SparkSQLCLIDriver

Explorer

Thanks for trying. I really want to know the answer.

As long as the bundled Hive is not compatible, I don't think I want to continue trying to get spark-sql working. Even getting Spark SQL to work with the Hive metastore requires rebuilding from source. I already tried building it many ways over many hours, and each attempt ended in an error. It is not for the faint of heart.