I have a Spark Thrift Server up and running (with YARN and Hive support).
1) Hive is up and running, and table X is available in the default database.
2) I can see the tables through Beeline as well.
3) The Hive metastore is backed by an Oracle database.
I am using a JavaHiveContext, created with the code below, in my Java client:

    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.api.java.JavaSchemaRDD;
    import org.apache.spark.sql.hive.api.java.JavaHiveContext;

    JavaSparkContext sc = new JavaSparkContext("yarn-client", "default"); // or "yarn-cluster"
    JavaHiveContext sqlContext = new JavaHiveContext(sc);
    JavaSchemaRDD resultsRDD = sqlContext.sql("select col1 from X");
Question 1: Which JDBC jars should be on the classpath of the Spark Thrift Server and of the Java client? Which Oracle-specific jars are needed? By default my program looks for org.apache.derby.jdbc.EmbeddedDriver.
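For context, here is a sketch of how I imagine the Oracle driver would be put on the classpath; the jar name (ojdbc6.jar) and paths are placeholders, not my actual setup:

```shell
# Placeholder paths -- the Oracle JDBC driver (e.g. ojdbc6.jar) has to be
# visible to whichever JVM actually talks to the Oracle-backed metastore.
export SPARK_CLASSPATH=/path/to/ojdbc6.jar
# or, when starting the thrift server, pass it as an extra jar:
# ./sbin/start-thriftserver.sh --jars /path/to/ojdbc6.jar
```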
I added the derby*.jar to see what other setup issues I might run into, and I ended up with these:
a) When using yarn-client as master, the program hangs trying to connect to 0.0.0.0:8032. How can this be resolved to the right address? Are there additional configs to be specified?
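For reference, my understanding of where that 0.0.0.0 comes from: Spark's YARN client reads yarn-site.xml from HADOOP_CONF_DIR (or YARN_CONF_DIR), and without it falls back to the default ResourceManager address 0.0.0.0:8032. A sketch, with placeholder path and hostname:

```shell
# Placeholder path/hostname. Point Spark at the Hadoop client configs so the
# YARN client can find the real ResourceManager instead of 0.0.0.0:8032.
export HADOOP_CONF_DIR=/etc/hadoop/conf
# yarn-site.xml in that directory should define something like:
#   <property>
#     <name>yarn.resourcemanager.address</name>
#     <value>rm-host.example.com:8032</value>
#   </property>
```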
b) When using yarn-cluster as master, the program terminates with the message: default.X table not found.
This is probably because it is looking for the table in Derby?
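If it helps, my guess at what is happening: in yarn-cluster mode the driver runs on a cluster node, and if hive-site.xml is not shipped with the job, the HiveContext falls back to a local embedded Derby metastore, where default.X of course does not exist. A sketch of the submission I have in mind (class name, jar, and paths are placeholders):

```shell
# Placeholder names/paths. Ship hive-site.xml with the job so the driver,
# running inside the cluster, points at the Oracle-backed metastore rather
# than spinning up an empty local Derby one.
spark-submit \
  --master yarn-cluster \
  --files /etc/hive/conf/hive-site.xml \
  --class com.example.MyHiveQuery \
  myapp.jar
```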
Question 2: Is the above snippet OK in terms of creating the contexts?
It appears these problems are primarily classpath-related, but maybe I'm wrong.