Member since 12-18-2017
4 Posts | 0 Kudos Received | 0 Solutions

01-21-2018 10:47 PM

I have the following Java code that reads a JSON file from HDFS and outputs it as a Hive view using Spark.
 package org.apache.spark.examples.sql.hive;
import java.io.File;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
// $example off:spark_hive$
public class JavaSparkHiveExample {
  public static void main(String[] args) {
    // $example on:spark_hive$
    SparkSession spark = SparkSession
      .builder()
      .appName("Java Spark Hive Example")
      .master("local[*]")
      .config("hive.metastore.uris", "thrift://localhost:9083")
      .enableHiveSupport()
      .getOrCreate();
    Dataset<Row> jsonTest = spark.read().json("/tmp/testJSON.json");
    jsonTest.createOrReplaceTempView("jsonTest");
    Dataset<Row> showAll = spark.sql("SELECT * FROM jsonTest");
    showAll.show();
    spark.stop();
  }
}
I would like to change it so that the JSON file is read from the local file system instead of HDFS (for instance, from the same location where the program is executed). Furthermore, how could I rework it to INSERT the JSON data into the table test1 instead of just making a view out of it? Help is much appreciated!
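One possible approach to both parts, sketched here without the Spark dependency so that it compiles on its own: give Spark an explicit file: URI so the path is not resolved against the default file system (HDFS on the sandbox), and copy the temp view into the table with INSERT INTO ... SELECT. The class name LocalJsonToHive and the helpers localFileUri and insertFromView are hypothetical names for illustration. The sketch assumes that testJSON.json sits in the working directory and that test1 already exists with columns matching the JSON schema.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalJsonToHive {

    // Turns a path relative to the working directory into a file: URI,
    // so that Spark would read it from the local file system instead of HDFS.
    public static String localFileUri(String relativePath) {
        Path absolute = Paths.get(relativePath).toAbsolutePath().normalize();
        return absolute.toUri().toString();
    }

    // Builds an INSERT ... SELECT statement that copies a temp view
    // into an existing Hive table (assumption: the column layouts match).
    public static String insertFromView(String table, String view) {
        return "INSERT INTO TABLE " + table + " SELECT * FROM " + view;
    }

    public static void main(String[] args) {
        String uri = localFileUri("testJSON.json");
        String insertSql = insertFromView("test1", "jsonTest");
        System.out.println(uri);
        System.out.println(insertSql);
        // In the program above, these values would be used roughly as:
        //   Dataset<Row> jsonTest = spark.read().json(uri);
        //   jsonTest.createOrReplaceTempView("jsonTest");
        //   spark.sql(insertSql);
    }
}
```

Instead of the SQL statement, jsonTest.write().mode("append").insertInto("test1") should give a similar result through the DataFrameWriter API. Note that insertInto matches columns by position, not by name. With master local[*] the local file only needs to exist on the machine that runs the driver.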
						
					
Labels: Apache Hive, Apache Spark

			
    
	
		
		
12-19-2017 08:11 AM

							 Hi @bkosaraju,  That solved the problem. Many thanks for your help! 
						
					
		
			
    
	
		
		
12-18-2017 09:15 PM

I have the following Java Spark Hive example, which can be found in the official apache/spark GitHub repository. I have spent a lot of time trying to work out how to run the example in my Hortonworks Hadoop Sandbox, without success. Currently, I am doing the following:

Importing the apache/spark examples as a Maven project. This works fine and I am not getting any issues with dependencies, so I guess there is no problem here.

The next step is to prepare the code to run in my Hadoop Sandbox. This is where the issue starts; I am probably setting something wrong to begin with. This is what I am doing:

Setting the SparkSession master to local, changing spark.sql.warehouse.dir to hive.metastore.uris, and setting thrift://localhost:9083 (as I can see in the Hive config in Ambari) as warehouseLocation:

    SparkSession spark = SparkSession.builder().appName("Java Spark Hive Example").master("local[*]").config("hive.metastore.uris", "thrift://localhost:9083").enableHiveSupport().getOrCreate();

Then I replace

    spark.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src");

with a path to HDFS, where I have uploaded kv1.txt:

    spark.sql("LOAD DATA LOCAL INPATH 'hdfs:///tmp/kv1.txt' INTO TABLE src");

The last step is to build the JAR by running mvn package against the pom.xml. It builds without errors and gives me original-spark-examples_2.11-2.3.0-SNAPSHOT.jar.

I copy the JAR over to the Hadoop Sandbox

    scp -P 2222 ./target/original-spark-examples_2.11-2.3.0-SNAPSHOT.jar root@sandbox.hortonworks.com:/root

and use spark-submit to run the code

    /usr/hdp/current/spark2-client/bin/spark-submit --class "JavaSparkHiveExample" --master local ./original-spark-examples_2.11-2.3.0-SNAPSHOT.jar

which returns the following error:

[root@sandbox-hdp ~]# /usr/hdp/current/spark2-client/bin/spark-submit --class "JavaSparkHiveExample" --master local ./original-spark-examples_2.11-2.3.0-SNAPSHOT.jar
java.lang.ClassNotFoundException: JavaSparkHiveExample
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:230)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:739)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[root@sandbox-hdp ~]#

...and here I am totally stuck. I am probably missing some steps needed to prepare the code to run. I would be very happy to get some help running this code on my Hadoop Sandbox. I was able to run the JavaWordCount.java Spark example just fine, but with this one I am totally stuck. Thanks 🙂

Complete JavaSparkHiveExample.java
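A possible cause, judging from the stack trace: spark-submit passes the --class value to Class.forName, which expects a fully qualified class name. The example in the apache/spark repository is declared in the package org.apache.spark.examples.sql.hive, so, assuming the package declaration was left unchanged, the value to try would be org.apache.spark.examples.sql.hive.JavaSparkHiveExample rather than JavaSparkHiveExample. The sketch below reproduces the same lookup behaviour with a JDK class standing in for the example class; ClassNameLookup and canLoad are hypothetical names used only for illustration.

```java
public class ClassNameLookup {

    // Returns true if the class can be loaded by the given name,
    // using the same Class.forName lookup that appears in the stack trace.
    public static boolean canLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Simple name only: the lookup fails, as in the spark-submit run above.
        System.out.println(canLoad("ArrayList"));
        // Fully qualified name (package + class): the lookup succeeds.
        System.out.println(canLoad("java.util.ArrayList"));
    }
}
```

The same rule would apply to the --class argument: the package prefix has to be part of the name, whether or not the value is quoted.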
						
					
Labels: Apache Hadoop, Apache Hive, Apache Spark