
Cannot write DataFrame from Spark to Hive using Hive Warehouse Connector

New Contributor

I'm trying to work with Spark and Hive on HDP 3.0. As I've seen in several articles, we now have to use the Hive Warehouse Connector. Everything works fine except for one problem.

I can't write a DataFrame directly from Spark to Hive using the Hive Warehouse Connector.


I have the following code:


    val spark = SparkSession
      .builder()
      .appName("Spark Hive Test Job")
      .config("spark.sql.hive.hiveserver2.jdbc.url", "jdbc:hive2://s01.ndtstu.local:2181,m01.ndtstu.local:2181,m02.ndtstu.local:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;user=hive;password=hive?")
      .getOrCreate()

    // Open a HiveWarehouseSession on top of the SparkSession
    val hive = HiveWarehouseSession.session(spark).build()
    hive.createDatabase("testADD", true)
    hive.setDatabase("testADD")
    hive.createTable("tabletest").ifNotExists()
      .column("k", "int")
      .column("value", "int")
      .create()

    import spark.implicits._

    val x = Seq((4, 4), (5, 5)).toDF("k", "value")

    // Append the DataFrame to the table created above via the HWC data source
    x.write
      .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
      .mode("append")
      .option("database", "testADD")
      .option("table", "tabletest")
      .save()


And I get the following stack trace:


19/04/16 13:09:37 ERROR WriteToDataSourceV2Exec: Data source writer com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceWriter@7057dbda is aborting.
19/04/16 13:09:37 ERROR HiveWarehouseDataSourceWriter: Aborted DataWriter job 20190416130935-a22961fa-7dee-4eed-a41c-7f67bbf3bdae
19/04/16 13:09:37 ERROR WriteToDataSourceV2Exec: Data source writer com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceWriter@7057dbda aborted.
Exception in thread "main" org.apache.spark.SparkException: Writing job aborted.
        at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2.scala:112)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:256)
        at test.Test2$.main(Test2.scala:40)
        at test.Test2.main(Test2.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, s05.ndtstu.local, executor 1): java.lang.AbstractMethodError: com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataWriterFactory.createDataWriter(II)Lorg/apache/spark/sql/sources/v2/writer/DataWriter;
        at org.apache.spark.sql.execution.datasources.v2.InternalRowDataWriterFactory.createDataWriter(WriteToDataSourceV2.scala:191)
        at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2.scala:129)
        at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$2.apply(WriteToDataSourceV2.scala:79)
        at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$2.apply(WriteToDataSourceV2.scala:78)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1651)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1639)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1638)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1638)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1872)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1821)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1810)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
        at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2.scala:82)
        ... 25 more
Caused by: java.lang.AbstractMethodError: com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataWriterFactory.createDataWriter(II)Lorg/apache/spark/sql/sources/v2/writer/DataWriter;
        at org.apache.spark.sql.execution.datasources.v2.InternalRowDataWriterFactory.createDataWriter(WriteToDataSourceV2.scala:191)
        at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2.scala:129)
        at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$2.apply(WriteToDataSourceV2.scala:79)
        at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$2.apply(WriteToDataSourceV2.scala:78)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)



I can't find any information regarding this error.

1 ACCEPTED SOLUTION

New Contributor

Finally I found the solution. The problem was an incorrect version of the HWC jar.

We have to take into consideration that the HWC jar has to be compatible with the HDP version: a connector built against a different Spark/HDP build does not implement the DataWriter factory method signature this Spark runtime expects, which is exactly what the AbstractMethodError is reporting. In my case I was using Maven with Eclipse to build the project.
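
For reference, here is a minimal pom.xml sketch of what "compatible" means in practice. The repository URL, coordinates, and version shown are assumptions; replace them with the ones matching your exact HDP release:

    <!-- Hortonworks repository hosting HDP-versioned artifacts (assumed URL, verify for your setup) -->
    <repositories>
      <repository>
        <id>hortonworks-releases</id>
        <url>http://repo.hortonworks.com/content/repositories/releases/</url>
      </repository>
    </repositories>

    <dependencies>
      <!-- The HWC version must match the HDP build running on the cluster; -->
      <!-- 1.0.0.3.0.1.0-187 (an HDP 3.0.1 build) is only an example value. -->
      <dependency>
        <groupId>com.hortonworks.hive</groupId>
        <artifactId>hive-warehouse-connector_2.11</artifactId>
        <version>1.0.0.3.0.1.0-187</version>
        <scope>provided</scope>
      </dependency>
    </dependencies>

Declaring the dependency as provided keeps your build-time copy out of the application jar, so at runtime Spark picks up the connector shipped with the cluster, e.g. by submitting with --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar.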
