Created 12-18-2017 09:15 PM
I have the following Java Spark Hive example, which can be found in the official apache/spark GitHub repository. I have spent a lot of time trying to work out how to run the example in my Hortonworks Hadoop Sandbox, without success.
Currently, I am doing the following:
Setting the SparkSession master to local, and replacing the spark.sql.warehouse.dir setting with hive.metastore.uris, set to thrift://localhost:9083 (the value I can see in the Hive config in Ambari) in place of warehouseLocation.
SparkSession spark = SparkSession.builder()
    .appName("Java Spark Hive Example")
    .master("local[*]")
    .config("hive.metastore.uris", "thrift://localhost:9083")
    .enableHiveSupport()
    .getOrCreate();
Then I replace spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src");
with a path on HDFS, where I have uploaded kv1.txt:
spark.sql("LOAD DATA LOCAL INPATH 'hdfs:///tmp/kv1.txt' INTO TABLE src");
The last step is to build the JAR by running mvn package against the pom.xml. It builds without errors and gives me original-spark-examples_2.11-2.3.0-SNAPSHOT.jar
I copy the JAR over to the Hadoop Sandbox:
scp -P 2222 ./target/original-spark-examples_2.11-2.3.0-SNAPSHOT.jar root@sandbox.hortonworks.com:/root
and use spark-submit to run the code:
/usr/hdp/current/spark2-client/bin/spark-submit --class "JavaSparkHiveExample" --master local ./original-spark-examples_2.11-2.3.0-SNAPSHOT.jar
which returns the following error:
[root@sandbox-hdp ~]# /usr/hdp/current/spark2-client/bin/spark-submit --class "JavaSparkHiveExample" --master local ./original-spark-examples_2.11-2.3.0-SNAPSHOT.jar
java.lang.ClassNotFoundException: JavaSparkHiveExample
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:230)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:739)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[root@sandbox-hdp ~]#
And here I am totally stuck. I am probably missing some steps needed to prepare the code to run.
I would be very happy if I could get some help to get this code to run on my Hadoop Sandbox. I was able to run the JavaWordCount.java Spark example just fine but with this one I am totally stuck. Thanks 🙂
Complete JavaSparkHiveExample.java
Created 12-19-2017 01:28 AM
Hi @Eric H,
Could you please try passing the fully qualified class name, including the package name:
--class "org.apache.spark.examples.sql.hive.JavaSparkHiveExample"
The class is declared inside a package, so it cannot be referenced by its simple name alone.
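The stack trace in the question shows spark-submit resolving the --class value through Class.forName, which only accepts fully qualified names. Here is a minimal sketch of that behaviour using a JDK class, so it runs without Spark; the class name FqcnDemo and the helper canLoad are made up for illustration and are not part of the Spark examples.

```java
public class FqcnDemo {

    // Hypothetical helper: returns true if the JVM can load a class by this name.
    static boolean canLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // Same exception type that spark-submit reported for the simple name.
            return false;
        }
    }

    public static void main(String[] args) {
        // Simple name only: looked up in the default package, so it is not found.
        System.out.println("ArrayList           -> " + canLoad("ArrayList"));
        // Package plus class name: found.
        System.out.println("java.util.ArrayList -> " + canLoad("java.util.ArrayList"));
    }
}
```

In the same way, JavaSparkHiveExample on its own is not found, while the name prefixed with its package can be loaded from the JAR.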
Hope this helps !!
Created 12-19-2017 08:11 AM
Hi @bkosaraju,
That solved the problem. Many thanks for your help!