Support Questions
Find answers, ask questions, and share your expertise

How to write DataFrame data into a Hive table with Spark Scala


New Contributor

I am trying to push data into an existing Hive table. I have already created the ORC table in Hive, but I am not able to push data into it. This code works if I copy-paste it into the Spark console, but it fails when run via spark-submit.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object TestCode {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("first example").setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    // Sample values; the for loop stands in for the business logic
    // that produces the rows to push into the table.
    for (i <- 0 to 100 - 1) {
      var fstring = "fstring" + i
      var cmd = "cmd" + i
      var idpath = "idpath" + i
      import sqlContext.implicits._
      val sDF = Seq((fstring, cmd, idpath)).toDF("t_als_s_path", "t_als_s_cmd", "t_als_s_pd")
      sDF.write.insertInto("l_sequence") // sDF.write.format("orc").saveAsTable("l_sequence")
      println("write data ==> " + i)
    }
  }
}

It gives the following error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: l_sequence;
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$lookupTableFromCatalog(Analyzer.scala:449)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$anonfun$apply$8.applyOrElse(Analyzer.scala:455)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$anonfun$apply$8.applyOrElse(Analyzer.scala:453)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$anonfun$resolveOperators$1.apply(LogicalPlan.scala:61)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:60)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:453)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:443)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$execute$1$anonfun$apply$1.apply(RuleExecutor.scala:85)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$execute$1$anonfun$apply$1.apply(RuleExecutor.scala:82)
        at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
        at scala.collection.immutable.List.foldLeft(List.scala:84)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$execute$1.apply(RuleExecutor.scala:82)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor$anonfun$execute$1.apply(RuleExecutor.scala:74)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
        at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:65)
        at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:63)
        at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:51)
        at org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:69)
        at org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:68)
        at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:74)
        at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:74)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:78)
        at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:76)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:83)
        at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:83)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
        at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:259)
        at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:239)
        at com.hq.bds.Helloword$anonfun$main$1.apply$mcVI$sp(Helloword.scala:16)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
        at com.hq.bds.Helloword$.main(Helloword.scala:10)
        at com.hq.bds.Helloword.main(Helloword.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:729)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2 REPLIES

Re: How to write DataFrame data into a Hive table with Spark Scala

Expert Contributor

1. Use SparkSession instead of the deprecated SQLContext/SparkContext setup; a plain SQLContext is not Hive-aware (spark-shell gives you a Hive-aware context automatically, which is why the same code works on the console):

import org.apache.spark.sql.SparkSession

2. Check that you have hive-site.xml in the "/usr/lib/spark/conf" directory.

You can also try passing it explicitly with spark-submit:

 --files /usr/hdp/current/spark-client/conf/hive-site.xml 
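For example, a full submit might look like this (the class name com.hq.bds.Helloword is taken from your stack trace; the jar path is a placeholder for your own build artifact):

 spark-submit \
   --class com.hq.bds.Helloword \
   --files /usr/hdp/current/spark-client/conf/hive-site.xml \
   /path/to/your-app.jar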

Re: How to write DataFrame data into a Hive table with Spark Scala

Expert Contributor

Try this (HCC has buggy code formatting, so ignore that; note the import and the spark val):

import org.apache.spark.sql.SparkSession

object TestCode {

  def main(args: Array[String]): Unit = {

    // Build a SparkSession with Hive support so the job can see
    // tables registered in the Hive metastore.
    val spark = SparkSession
      .builder()
      .appName("SilverTailParser")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .enableHiveSupport()
      .getOrCreate()

    // Sample values; the for loop stands in for the business logic
    // that produces the rows to push into the table.
    for (i <- 0 to 100 - 1) {
      var fstring = "fstring" + i
      var cmd = "cmd" + i
      var idpath = "idpath" + i

      import spark.implicits._ // NOTE: implicits now come from the SparkSession

      val sDF = Seq((fstring, cmd, idpath)).toDF("t_als_s_path", "t_als_s_cmd", "t_als_s_pd")
      sDF.write.insertInto("l_sequence")
      println("write data ==> " + i)
    }
  }
}
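One more thing to watch: insertInto matches DataFrame columns to the Hive table by position, not by name, so the column order above must line up with the table definition. A minimal sketch of a matching DDL follows; the original CREATE TABLE isn't shown in this thread, so the column types here are assumptions to adjust to your actual l_sequence definition:

// Assumed DDL for l_sequence, guessed from the DataFrame columns
// above. insertInto is positional, so the column order here must
// match the order of the DataFrame columns being written.
spark.sql(
  """CREATE TABLE IF NOT EXISTS l_sequence (
    |  t_als_s_path STRING,
    |  t_als_s_cmd  STRING,
    |  t_als_s_pd   STRING
    |) STORED AS ORC""".stripMargin)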