01-21-2018
10:47 PM
I have the following Java code that reads a JSON file from HDFS and exposes it as a Hive view using Spark.
package org.apache.spark.examples.sql.hive;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JavaSparkHiveExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession
      .builder()
      .appName("Java Spark Hive Example")
      .master("local[*]")
      .config("hive.metastore.uris", "thrift://localhost:9083")
      .enableHiveSupport()
      .getOrCreate();

    Dataset<Row> jsonTest = spark.read().json("/tmp/testJSON.json");
    jsonTest.createOrReplaceTempView("jsonTest");

    Dataset<Row> showAll = spark.sql("SELECT * FROM jsonTest");
    showAll.show();

    spark.stop();
  }
}
I would like to change this so the JSON file is read from the local filesystem instead of HDFS (for instance, from the same directory where the program is executed). Furthermore, how could I rework it to INSERT the JSON into table test1 instead of just creating a view out of it? Help is much appreciated!
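A minimal sketch of both changes, for reference. The table name test1 and the file name testJSON.json come from the question; the file: URI scheme, SaveMode, and saveAsTable are standard Spark 2.x APIs, but whether Append or Overwrite is wanted is an assumption:

// Sketch only: assumes a SparkSession built with enableHiveSupport(),
// as in the example above. Needs: import org.apache.spark.sql.SaveMode;

// "file://" forces the local filesystem; System.getProperty("user.dir")
// is the driver's working directory, i.e. where the program was launched.
Dataset<Row> jsonTest = spark.read()
    .json("file://" + System.getProperty("user.dir") + "/testJSON.json");

// Write into a Hive table instead of a temp view. SaveMode.Append adds
// rows to test1 if it exists; saveAsTable creates the table from the
// JSON schema if it does not exist yet.
jsonTest.write()
    .mode(SaveMode.Append)
    .saveAsTable("test1");

With --master local, the driver runs on the machine where spark-submit is invoked, so a relative path without the file:// prefix can also resolve against the working directory, depending on the defaultFS configuration.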
Labels:
- Apache Hive
- Apache Spark
12-19-2017
08:11 AM
Hi @bkosaraju, that solved the problem. Many thanks for your help!
12-18-2017
09:15 PM
I have the following Java Spark Hive Example, as can be found in the official apache/spark GitHub repository. I have spent a lot of time trying to understand how to run the example in my Hortonworks Hadoop Sandbox, without success. Currently, I am doing the following:
I import the apache/spark examples as a Maven project. This works fine and I am not getting any dependency issues, so no problem here, I guess. The next step is to prepare the code to run in my Hadoop Sandbox, and this is where the trouble starts; I am probably setting something up wrong to begin with. This is what I am doing:

1. I set the SparkSession master to local and replace the spark.sql.warehouse.dir setting with hive.metastore.uris, set to thrift://localhost:9083 (as I can see in the Hive config in Ambari):

SparkSession spark = SparkSession
  .builder()
  .appName("Java Spark Hive Example")
  .master("local[*]")
  .config("hive.metastore.uris", "thrift://localhost:9083")
  .enableHiveSupport()
  .getOrCreate();

2. Then I replace

spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src");

with a path to HDFS where I have uploaded kv1.txt:

spark.sql("LOAD DATA LOCAL INPATH 'hdfs:///tmp/kv1.txt' INTO TABLE src");

3. The last step is to build the JAR with mvn package on the pom.xml. It builds without errors and gives me original-spark-examples_2.11-2.3.0-SNAPSHOT.jar.

4. I copy the assembly over to the Hadoop Sandbox:

scp -P 2222 ./target/original-spark-examples_2.11-2.3.0-SNAPSHOT.jar root@sandbox.hortonworks.com:/root

5. Finally, I use spark-submit to run the code:

/usr/hdp/current/spark2-client/bin/spark-submit --class "JavaSparkHiveExample" --master local ./original-spark-examples_2.11-2.3.0-SNAPSHOT.jar

This returns the following error:

[root@sandbox-hdp ~]# /usr/hdp/current/spark2-client/bin/spark-submit --class "JavaSparkHiveExample" --master local ./original-spark-examples_2.11-2.3.0-SNAPSHOT.jar
java.lang.ClassNotFoundException: JavaSparkHiveExample
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:230)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:739)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[root@sandbox-hdp ~]#

...and here I am totally stuck. Probably I am missing some steps to prepare the code to run, and so on. I would be very happy if I could get some help getting this code to run on my Hadoop Sandbox. I was able to run the JavaWordCount.java Spark example just fine, but with this one I am totally stuck. Thanks 🙂

Complete JavaSparkHiveExample.java
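For reference: spark-submit resolves --class against the fully qualified class name, including the package. Since the example source declares package org.apache.spark.examples.sql.hive, a likely fix is a sketch like the following (paths taken from the post above; the exact cause of the ClassNotFoundException is an assumption):

/usr/hdp/current/spark2-client/bin/spark-submit \
  --class org.apache.spark.examples.sql.hive.JavaSparkHiveExample \
  --master local \
  ./original-spark-examples_2.11-2.3.0-SNAPSHOT.jar

You can confirm the name the JVM expects with jar tf original-spark-examples_2.11-2.3.0-SNAPSHOT.jar, which lists the entry as org/apache/spark/examples/sql/hive/JavaSparkHiveExample.class.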
Labels:
- Apache Hadoop
- Apache Hive
- Apache Spark