
Spark Thrift with Hive: errors connecting via standalone Java client


Hi,

 

I have a Spark Thrift Server up and running (with YARN and Hive support).

Some background:

1) Hive is up and running, and table X is available in the default database.

2) I can see the tables through Beeline as well.

3) The Hive metastore is backed by an Oracle database.
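
For reference, the metastore connection in my hive-site.xml is along these lines (host, port, and database name below are placeholders, not my real values):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:oracle:thin:@//dbhost:1521/hivedb</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>oracle.jdbc.OracleDriver</value>
</property>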

 

I am using a JavaHiveContext created with the code below in my Java client:

 

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;

JavaSparkContext sc = new JavaSparkContext("yarn-client", "default"); // or "yarn-cluster"
JavaHiveContext sqlContext = new JavaHiveContext(sc);
JavaSchemaRDD resultsRDD = sqlContext.sql("select col1 from X");

 

Question 1: What JDBC jars should be on the classpath of the Spark Thrift Server and of the Java client? What Oracle-specific jars should be on the classpath? By default my program looks for org.apache.derby.jdbc.EmbeddedDriver.
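
To make the question concrete, this is roughly how I would expect the Oracle JDBC driver to be passed (the paths and the class/jar names are placeholders for my actual client):

spark-submit --master yarn-client \
  --jars /path/to/ojdbc6.jar \
  --driver-class-path /path/to/ojdbc6.jar \
  --class com.example.MyClient my-client.jar

Is that the right approach, or does the driver jar also need to be on the Thrift Server side (e.g. via spark.executor.extraClassPath)?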

 

I added the derby*.jar to see what other setup issues I might encounter, and I ended up with these:

a) When using yarn-client as the master, the program hangs trying to connect to 0.0.0.0:8032. How can this be resolved to the right address? Are there additional configs to be specified? (My guess is sketched after item b below.)

b) When using yarn-cluster as the master, the program terminates with the message "default.X table not found".

    This is probably because it is looking for the table in Derby?
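
For a), my understanding is that 0.0.0.0:8032 is the default YARN ResourceManager address, which suggests the client is not picking up the cluster's yarn-site.xml. Purely a guess for my setup, but I assume something like

export HADOOP_CONF_DIR=/etc/hadoop/conf

is needed before launching, so that yarn.resourcemanager.address resolves to the real host (the path is a placeholder for wherever the cluster configs live).

For b), my guess is that in yarn-cluster mode the driver runs on a cluster node that cannot see my local hive-site.xml, so the HiveContext falls back to an embedded Derby metastore; perhaps adding --files /etc/hive/conf/hive-site.xml to the spark-submit invocation sketched under Question 1 is what is missing?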

 

Question 2: Is the above snippet OK in terms of how the contexts are created?
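
For comparison, this is the SparkConf-based form of the same setup that I believe is equivalent (the app name is arbitrary):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

// Build the context from a SparkConf instead of the (master, appName) constructor.
SparkConf conf = new SparkConf()
    .setMaster("yarn-client")  // or "yarn-cluster"
    .setAppName("default");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaHiveContext sqlContext = new JavaHiveContext(sc);

Do the two forms behave the same on YARN?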

 

It appears that the above issues are primarily classpath-related, but maybe I'm wrong.

 

I'd appreciate any help in resolving this.

 

Thanks