01-21-2018
10:47 PM
I have the following Java code that reads a JSON file from HDFS and exposes it as a Hive view using Spark.
package org.apache.spark.examples.sql.hive;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JavaSparkHiveExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession
      .builder()
      .appName("Java Spark Hive Example")
      .master("local[*]")
      .config("hive.metastore.uris", "thrift://localhost:9083")
      .enableHiveSupport()
      .getOrCreate();

    Dataset<Row> jsonTest = spark.read().json("/tmp/testJSON.json");
    jsonTest.createOrReplaceTempView("jsonTest");

    Dataset<Row> showAll = spark.sql("SELECT * FROM jsonTest");
    showAll.show();

    spark.stop();
  }
}
I would like to change this so the JSON file is read from the local filesystem instead of HDFS (for instance, from the same directory where the program is executed). Furthermore, how could I rework it to INSERT the JSON into table test1 instead of just creating a view out of it? Help is much appreciated!
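A minimal sketch of both changes, for reference. The table name test1 and the file name testJSON.json come from the question; the file: URI scheme, SaveMode, and saveAsTable are standard Spark 2.x APIs, but whether Append or Overwrite is wanted is an assumption:

// Sketch only: assumes a SparkSession built with enableHiveSupport(),
// as in the example above. Needs: import org.apache.spark.sql.SaveMode;

// "file://" forces the local filesystem; System.getProperty("user.dir")
// is the driver's working directory, i.e. where the program was launched.
Dataset<Row> jsonTest = spark.read()
    .json("file://" + System.getProperty("user.dir") + "/testJSON.json");

// Write into a Hive table instead of a temp view. SaveMode.Append adds
// rows to test1 if it exists; saveAsTable creates the table from the
// JSON schema if it does not exist yet.
jsonTest.write()
    .mode(SaveMode.Append)
    .saveAsTable("test1");

With --master local, the driver runs on the machine where spark-submit is invoked, so a relative path without the file:// prefix can also resolve against the working directory, depending on the defaultFS configuration.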
Labels:
- Apache Hive
- Apache Spark
12-19-2017
08:11 AM
Hi @bkosaraju, that solved the problem. Many thanks for your help!
12-18-2017
09:15 PM
I have the following Java Spark Hive Example, as can be found in the official apache/spark GitHub repository. I have spent a lot of time trying to understand how to run the example in my Hortonworks Hadoop Sandbox, without success. Currently, I am doing the following:
I import the apache/spark examples as a Maven project. This works fine and I am not getting any dependency issues, so no problem here, I guess. The next step is to prepare the code to run in my Hadoop Sandbox, and this is where the trouble starts; I am probably setting something up wrong to begin with. This is what I am doing:

1. I set the SparkSession master to local and replace the spark.sql.warehouse.dir setting with hive.metastore.uris, set to thrift://localhost:9083 (as I can see in the Hive config in Ambari):

SparkSession spark = SparkSession
  .builder()
  .appName("Java Spark Hive Example")
  .master("local[*]")
  .config("hive.metastore.uris", "thrift://localhost:9083")
  .enableHiveSupport()
  .getOrCreate();

2. Then I replace

spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src");

with a path to HDFS where I have uploaded kv1.txt:

spark.sql("LOAD DATA LOCAL INPATH 'hdfs:///tmp/kv1.txt' INTO TABLE src");

3. The last step is to build the JAR with mvn package on the pom.xml. It builds without errors and gives me original-spark-examples_2.11-2.3.0-SNAPSHOT.jar.

4. I copy the assembly over to the Hadoop Sandbox:

scp -P 2222 ./target/original-spark-examples_2.11-2.3.0-SNAPSHOT.jar root@sandbox.hortonworks.com:/root

5. Finally, I use spark-submit to run the code:

/usr/hdp/current/spark2-client/bin/spark-submit --class "JavaSparkHiveExample" --master local ./original-spark-examples_2.11-2.3.0-SNAPSHOT.jar

This returns the following error:

[root@sandbox-hdp ~]# /usr/hdp/current/spark2-client/bin/spark-submit --class "JavaSparkHiveExample" --master local ./original-spark-examples_2.11-2.3.0-SNAPSHOT.jar
java.lang.ClassNotFoundException: JavaSparkHiveExample
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:230)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:739)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[root@sandbox-hdp ~]#

...and here I am totally stuck. Probably I am missing some steps to prepare the code to run, and so on. I would be very happy if I could get some help getting this code to run on my Hadoop Sandbox. I was able to run the JavaWordCount.java Spark example just fine, but with this one I am totally stuck. Thanks 🙂

Complete JavaSparkHiveExample.java
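For reference: spark-submit resolves --class against the fully qualified class name, including the package. Since the example source declares package org.apache.spark.examples.sql.hive, a likely fix is a sketch like the following (paths taken from the post above; the exact cause of the ClassNotFoundException is an assumption):

/usr/hdp/current/spark2-client/bin/spark-submit \
  --class org.apache.spark.examples.sql.hive.JavaSparkHiveExample \
  --master local \
  ./original-spark-examples_2.11-2.3.0-SNAPSHOT.jar

You can confirm the name the JVM expects with jar tf original-spark-examples_2.11-2.3.0-SNAPSHOT.jar, which lists the entry as org/apache/spark/examples/sql/hive/JavaSparkHiveExample.class.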
Labels:
- Apache Hadoop
- Apache Hive
- Apache Spark