Created 12-17-2016 08:20 PM
Creating a Spark application with SparkSQL inside:
package SparkSamplePackage

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql._
import org.apache.spark.sql.hive._

object SparkSampleClass {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Spark Sample App")
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    conf.set("spark.speculation", "true")
    val sc = new SparkContext(conf)

    // HiveContext is required to query Hive tables such as hr.managers
    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    import sqlContext.implicits._

    val sampleDF = sqlContext.sql("select code, salary from hr.managers limit 10")
    sampleDF.collect.foreach(println)

    sc.stop()
  }
}
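For reference, the same lookup can also be expressed with the DataFrame API instead of a raw SQL string. This is only a minimal sketch, assuming the same hr.managers Hive table and the sqlContext created above:

// Minimal sketch (assumes the hr.managers table and the sqlContext defined above)
val managersDF = sqlContext.table("hr.managers")  // load the Hive table as a DataFrame
  .select("code", "salary")                       // keep only the two columns of interest
  .limit(10)                                      // same limit as the SQL version
managersDF.show()                                 // print the rows in the driver log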
The spark-submit command looks like this. The cluster runs the latest HDP 2.5.3, and Spark is 1.6.2:
spark-submit \
  --class SparkSamplePackage.SparkSampleClass \
  --master yarn-cluster \
  --num-executors 2 \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
  --files /usr/hdp/current/spark-client/conf/hive-site.xml \
  target/SparkSample-1.0-SNAPSHOT.jar
I am getting the following error complaining that the Hive metastore client cannot be instantiated:
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: User class threw exception: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Please advise on how to address the issue.
Created 12-17-2016 08:30 PM
Hi @Sherry Noah
Can you try passing the DataNucleus jars via --jars? In yarn-cluster mode the Hive metastore client needs them on the classpath of the driver running in the cluster:
spark-submit \
  --class SparkSamplePackage.SparkSampleClass \
  --master yarn-cluster \
  --num-executors 2 \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
  --files /usr/hdp/current/spark-client/conf/hive-site.xml \
  --jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar \
  target/SparkSample-1.0-SNAPSHOT.jar
Created 12-17-2016 08:35 PM
@Ryan Cicak Yes, it works. Thx