04-17-2019 09:40 AM
Finally I found the solution. The problem was an incorrect version of the HWC jar: the HWC jar has to be compatible with the HDP version. In my case I was using Maven with Eclipse to build the project.
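For anyone hitting the same issue, a minimal sketch of how the HWC dependency might be pinned in the Maven pom.xml. The groupId, artifactId and version below are illustrative assumptions; the authoritative version is that of the hive-warehouse-connector-assembly jar shipped with your HDP release, so replace it with the one matching your cluster:

<!-- Illustrative coordinates only: the version string must match the
     HWC assembly jar shipped with your HDP release. -->
<dependency>
  <groupId>com.hortonworks.hive</groupId>
  <artifactId>hive-warehouse-connector_2.11</artifactId>
  <version>1.0.0.3.1.0.0-78</version>
  <!-- provided: the cluster supplies the jar at runtime (e.g. via spark-submit --jars) -->
  <scope>provided</scope>
</dependency>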
04-16-2019 03:54 PM
I'm trying to work with Spark and Hive on HDP 3.0. As I have seen in several articles, we now have to use the Hive Warehouse Connector. Everything works fine except for one problem: I can't write a DataFrame directly from Spark to Hive using the Hive Warehouse Connector. I have the following code:

import org.apache.spark.sql.SparkSession
import com.hortonworks.hwc.HiveWarehouseSession

val spark = SparkSession
.builder()
.appName("Spark Hive Test Job")
.config("spark.sql.hive.hiveserver2.jdbc.url", "jdbc:hive2://s01.ndtstu.local:2181,m01.ndtstu.local:2181,m02.ndtstu.local:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;user=hive;password=hive?")
.getOrCreate()
val hive = HiveWarehouseSession.session(spark).build()
hive.createDatabase("testADD", true)
hive.setDatabase("testADD")
hive.createTable("tabletest").ifNotExists()
.column("k", "int")
.column("value", "int")
.create()
import spark.implicits._
val x = Seq((4,4),(5,5)).toDF("k","value")
x.write
.format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
.mode("append")
.option("database", "testADD")
.option("table", "tabletest2")
.save()

And I have the following error stack trace:

19/04/16 13:09:37 ERROR WriteToDataSourceV2Exec: Data source writer com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceWriter@7057dbda is aborting.
19/04/16 13:09:37 ERROR HiveWarehouseDataSourceWriter: Aborted DataWriter job 20190416130935-a22961fa-7dee-4eed-a41c-7f67bbf3bdae
19/04/16 13:09:37 ERROR WriteToDataSourceV2Exec: Data source writer com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceWriter@7057dbda aborted.
Exception in thread "main" org.apache.spark.SparkException: Writing job aborted.
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2.scala:112)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:656)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:656)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:256)
at test.Test2$.main(Test2.scala:40)
at test.Test2.main(Test2.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, s05.ndtstu.local, executor 1): java.lang.AbstractMethodError: com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataWriterFactory.createDataWriter(II)Lorg/apache/spark/sql/sources/v2/writer/DataWriter;
at org.apache.spark.sql.execution.datasources.v2.InternalRowDataWriterFactory.createDataWriter(WriteToDataSourceV2.scala:191)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2.scala:129)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$2.apply(WriteToDataSourceV2.scala:79)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$2.apply(WriteToDataSourceV2.scala:78)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1651)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1639)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1638)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1638)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1872)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1821)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1810)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2.scala:82)
... 25 more
Caused by: java.lang.AbstractMethodError: com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataWriterFactory.createDataWriter(II)Lorg/apache/spark/sql/sources/v2/writer/DataWriter;
at org.apache.spark.sql.execution.datasources.v2.InternalRowDataWriterFactory.createDataWriter(WriteToDataSourceV2.scala:191)
at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2.scala:129)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$2.apply(WriteToDataSourceV2.scala:79)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$2.apply(WriteToDataSourceV2.scala:78)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I can't find any information regarding this error.
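For reference, the connector class name can also be taken from the HWC API instead of being hard-coded; a minimal sketch, assuming the same spark, hive and x values as in the code above:

import com.hortonworks.hwc.HiveWarehouseSession

// HIVE_WAREHOUSE_CONNECTOR holds the connector class name
// "com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector",
// so this write is equivalent to the format(...) call above.
x.write
  .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
  .mode("append")
  .option("database", "testADD")
  .option("table", "tabletest2")
  .save()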