
Unable to saveAsTable into Druid with Spark


When creating a Hive table with the Druid storage handler

CREATE TABLE MyDruidTable (
  `__time` timestamp
  , `dim1` string
  , `meas` int)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'

and executing this Spark code

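import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.col

// Build the `__time` column from `crdt` (divided by 1000, so presumably epoch milliseconds)
val my_df = spark.read.parquet("<source>").withColumn("__time", (col("crdt") / 1000).cast("timestamp")).limit(2)

my_df.write.mode(SaveMode.Overwrite).saveAsTable("MyDruidTable")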

I get the following error stack trace:

java.util.NoSuchElementException: None.get
  at scala.None$.get(Option.scala:347)
  at scala.None$.get(Option.scala:345)
  at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:110)
  at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:75)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
  at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:75)
  at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:71)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
  at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
  at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
  at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:48)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
  at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
  at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
  at org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:73)
  at org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:72)
  at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:78)
  at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:78)
  at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84)
  at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
  at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:419)
  at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:398)
  at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:354)
  ... 50 elided

Does Spark not support writing to Druid, or is there a way to troubleshoot this?

We're using Spark with LLAP based on https://community.hortonworks.com/articles/101181/rowcolumn-level-security-in-sql-for-apache-spark-2...


Re: Unable to saveAsTable into Druid with Spark


I tried the following, and it works.

df3.write.mode("append").insertInto("my_druid_table")
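
One thing to watch with insertInto: unlike saveAsTable, it ignores column names and matches columns by position. In the question's code, withColumn appends `__time` as the last column, while the table declares it first, so the DataFrame probably needs to be reordered before writing. A minimal sketch, assuming the column list from the CREATE TABLE statement above; the helper name `alignToTable` is made up for illustration:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Column order copied from the CREATE TABLE statement in the question.
val druidTableColumns = Seq("__time", "dim1", "meas")

// Hypothetical helper: project the DataFrame onto the table's columns,
// in the table's order, because insertInto resolves columns by position.
def alignToTable(df: DataFrame, tableColumns: Seq[String]): DataFrame =
  df.select(tableColumns.map(c => col(c)): _*)
```

With the DataFrame from the question this would be used as alignToTable(my_df, druidTableColumns).write.mode("append").insertInto("MyDruidTable"). Extra source columns such as `crdt` are dropped by the projection.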

Re: Unable to saveAsTable into Druid with Spark


Hello @Johann Voppichler, could you please describe what you did to create the Hive table with the Druid storage handler?

We tried many options to integrate Hive 1.1.0 with Druid 0.12 using hive-druid-handler-2.3.0.jar (or 3.0.0), hive-metastore-2.3.0.jar (or 3.0.0), and hive-exec-2.3.0.jar (or 3.0.0).

The only output when creating the table via beeline is an error about a missing class. All the classes named in the errors are included, so there is probably a conflict between the supplied .jar files.

Is it possible to integrate Hive and Druid with our versions? Which combinations of versions are supported?

Thank you!

Errors:

java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.session.SessionState$LogHelper.<init>(Lorg/slf4j/Logger;)V (state=,code=0)

java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.druid.DruidStorageHandler

java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/StorageHandlerInfo
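
A NoSuchMethodError generally means that the calling code was compiled against a different version of a class than the one loaded at runtime, so it can help to check which .jar each class in the errors above is actually loaded from. A minimal sketch, assuming it is run in a Scala REPL started with the same classpath as the Hive service; the helper name `jarOf` is made up for illustration:

```scala
// Hypothetical helper: report the jar (or directory) a class is loaded from.
// The class is loaded without running its static initializers, so a class
// whose initialization fails can still be located.
def jarOf(className: String): Option[String] =
  try {
    val cls = Class.forName(className, false, Thread.currentThread.getContextClassLoader)
    for {
      source   <- Option(cls.getProtectionDomain.getCodeSource) // may be null for JDK classes
      location <- Option(source.getLocation)
    } yield location.toString
  } catch {
    // The class, or something it depends on, is not on the classpath.
    case _: ClassNotFoundException | _: NoClassDefFoundError => None
  }

// Classes named in the errors above.
Seq(
  "org.apache.hadoop.hive.ql.session.SessionState",
  "org.apache.hadoop.hive.druid.DruidStorageHandler",
  "org.apache.hadoop.hive.ql.metadata.StorageHandlerInfo"
).foreach(name => println(s"$name -> ${jarOf(name).getOrElse("not loadable")}"))
```

If SessionState resolves to a Hive 1.1.0 jar while DruidStorageHandler comes from a 2.3.0 or 3.0.0 jar, that would be consistent with the constructor mismatch in the first error.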