Support Questions
Find answers, ask questions, and share your expertise

SparkThrift with hive: errors connecting via standalone java client


Hi,


I have a Spark Thrift server up and running (with YARN and Hive support).

Some background:

1) Hive is up and running, and table X is available in the default database.

2) I can see the tables through beeline as well.

3) The Hive metastore is backed by an Oracle database.


I am using a JavaHiveContext created with the code below in my Java client:


JavaSparkContext sc = new JavaSparkContext("yarn-client", "default"); // or "yarn-cluster"
JavaHiveContext sqlContext = new JavaHiveContext(sc);
JavaSchemaRDD resultsRDD = sqlContext.sql("select col1 from X");


Question 1: Which JDBC jars should be on the classpath of the Spark Thrift server and the Java client? Which Oracle-specific jars are needed? By default my program looks for org.apache.derby.jdbc.EmbeddedDriver.
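For reference, my understanding is that an Oracle-backed metastore is pointed at through the standard javax.jdo connection properties in hive-site.xml, roughly like the sketch below (the host, SID, and credentials here are placeholders, not my actual config):

```xml
<!-- hive-site.xml: metastore JDBC settings for an Oracle backend (placeholder values) -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:oracle:thin:@//dbhost:1521/ORCL</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>oracle.jdbc.OracleDriver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>
```

If that is right, I assume the Oracle JDBC driver jar matching my database version would also need to be on the classpath, and the fact that Derby's EmbeddedDriver is being picked up suggests my client is not seeing this configuration at all.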


I added the derby*.jar to see what other setup issues I might encounter, and I ended up with these:

a) When using yarn-client as master, the program hangs while trying to connect. How can this be resolved to the right address? Are there any additional configs to be specified?

b) When using yarn-cluster as master, the program terminates with the message "default.X table not found".

    This is probably because it is looking for the table in Derby?
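To check which metastore the context actually sees, I was thinking of a quick diagnostic along these lines (just a sketch against the Spark 1.x Java API):

```java
// List the tables visible to the HiveContext; if X is missing, the context is
// probably pointed at a fresh local Derby metastore rather than the Oracle one.
JavaSchemaRDD tables = sqlContext.sql("SHOW TABLES");
for (org.apache.spark.sql.api.java.Row row : tables.collect()) {
    System.out.println(row.getString(0));
}
```

If this prints nothing (or only tables I never created), that would confirm the client is not talking to the same metastore that beeline sees.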


Question 2: Is the above snippet OK in terms of creating the contexts?
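For comparison, the other way I have seen contexts created in the Spark 1.x Java API is via SparkConf; the application name below is arbitrary, not something from my setup:

```java
// Alternative construction via SparkConf (Spark 1.x API).
SparkConf conf = new SparkConf()
        .setMaster("yarn-client")      // my understanding is that yarn-cluster cannot
                                       // be used when the SparkContext is created
                                       // directly inside a client program
        .setAppName("MyHiveClient");   // arbitrary placeholder name
JavaSparkContext sc = new JavaSparkContext(conf);
JavaHiveContext sqlContext = new JavaHiveContext(sc);
```

If yarn-cluster really is not applicable here, that might also explain the behavior I see in point (b) above, but I would like confirmation.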


It appears that the above issues are primarily classpath-related, but maybe I am wrong.


I would appreciate any help in resolving this.