Created 04-16-2019 03:54 PM
I'm trying to use Spark with Hive on HDP 3.0. As I've read in several articles, we now have to use the Hive Warehouse Connector (HWC). Everything works fine except for one problem:
I can't write a DataFrame directly from Spark to Hive using the Hive Warehouse Connector.
I have the following code:
import org.apache.spark.sql.SparkSession
import com.hortonworks.hwc.HiveWarehouseSession

val spark = SparkSession
  .builder()
  .appName("Spark Hive Test Job")
  .config("spark.sql.hive.hiveserver2.jdbc.url",
    "jdbc:hive2://s01.ndtstu.local:2181,m01.ndtstu.local:2181,m02.ndtstu.local:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;user=hive;password=hive?")
  .getOrCreate()

// Build the HWC session on top of the SparkSession
val hive = HiveWarehouseSession.session(spark).build()

hive.createDatabase("testADD", true)
hive.setDatabase("testADD")
hive.createTable("tabletest").ifNotExists()
  .column("k", "int")
  .column("value", "int")
  .create()

import spark.implicits._
val x = Seq((4, 4), (5, 5)).toDF("k", "value")

// Write the DataFrame to the Hive table through the HWC data source
x.write
  .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .mode("append")
  .option("database", "testADD")
  .option("table", "tabletest")
  .save()
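For reference, I submit the job roughly like this. This is just a sketch: the assembly jar name below is an example, the actual file under /usr/hdp/current/hive_warehouse_connector/ depends on your HDP build, and spark-hive-test.jar stands in for my application jar:

spark-submit \
  --class test.Test2 \
  --master yarn \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar \
  spark-hive-test.jar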
When I run it, I get the following stack trace:
19/04/16 13:09:37 ERROR WriteToDataSourceV2Exec: Data source writer com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceWriter@7057dbda is aborting.
19/04/16 13:09:37 ERROR HiveWarehouseDataSourceWriter: Aborted DataWriter job 20190416130935-a22961fa-7dee-4eed-a41c-7f67bbf3bdae
19/04/16 13:09:37 ERROR WriteToDataSourceV2Exec: Data source writer com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceWriter@7057dbda aborted.
Exception in thread "main" org.apache.spark.SparkException: Writing job aborted.
    at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2.scala:112)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
    at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:256)
    at test.Test2$.main(Test2.scala:40)
    at test.Test2.main(Test2.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, s05.ndtstu.local, executor 1): java.lang.AbstractMethodError: com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataWriterFactory.createDataWriter(II)Lorg/apache/spark/sql/sources/v2/writer/DataWriter;
    at org.apache.spark.sql.execution.datasources.v2.InternalRowDataWriterFactory.createDataWriter(WriteToDataSourceV2.scala:191)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2.scala:129)
    at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$2.apply(WriteToDataSourceV2.scala:79)
    at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$2.apply(WriteToDataSourceV2.scala:78)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1651)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1639)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1638)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1638)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1872)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1821)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1810)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
    at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2.scala:82)
    ... 25 more
Caused by: java.lang.AbstractMethodError: com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataWriterFactory.createDataWriter(II)Lorg/apache/spark/sql/sources/v2/writer/DataWriter;
    at org.apache.spark.sql.execution.datasources.v2.InternalRowDataWriterFactory.createDataWriter(WriteToDataSourceV2.scala:191)
    at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2.scala:129)
    at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$2.apply(WriteToDataSourceV2.scala:79)
    at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$2.apply(WriteToDataSourceV2.scala:78)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
I can't find any information about this error.
Created 04-17-2019 09:40 AM
Finally I found the solution. The problem was an incorrect version of the HWC jar: a java.lang.AbstractMethodError indicates a binary incompatibility, meaning the connector was compiled against a different version of Spark's DataSource V2 API than the one running on the cluster.
Keep in mind that the HWC jar has to be compatible with the HDP version. In my case I was building the project with Maven in Eclipse, and the HWC dependency in my pom.xml did not match the HWC build shipped with my HDP release.
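For anyone hitting the same issue, this is a minimal sketch of what the corrected dependency looks like in my pom.xml. The version string below is only an example for an HDP 3.1.0 build; replace it with the HWC version that matches your cluster (you can check the assembly jar name under /usr/hdp/current/hive_warehouse_connector/ on any cluster node):

<repositories>
  <!-- Hortonworks public repo that hosts the HDP-aligned HWC artifacts -->
  <repository>
    <id>hortonworks</id>
    <url>https://repo.hortonworks.com/content/repositories/releases/</url>
  </repository>
</repositories>

<dependencies>
  <!-- The HWC version must match the HDP stack version;
       1.0.0.3.1.0.0-78 is an example value, not necessarily yours -->
  <dependency>
    <groupId>com.hortonworks.hive</groupId>
    <artifactId>hive-warehouse-connector_2.11</artifactId>
    <version>1.0.0.3.1.0.0-78</version>
    <!-- provided: the cluster supplies the jar at runtime via --jars -->
    <scope>provided</scope>
  </dependency>
</dependencies>

With the scope set to provided, the jar you compile against is only used at build time, and the cluster's own HWC assembly (passed to spark-submit with --jars) is used at runtime, so the two no longer conflict.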