Created on 01-10-2018 09:03 AM - edited 09-16-2022 05:43 AM
When creating a Hive table with the Druid storage handler:
CREATE TABLE MyDruidTable (
  `__time` timestamp,
  `dim1` string,
  `meas` int
)
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
and executing this Spark code:
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.col

val my_df = spark.read.parquet("<source>")
  .withColumn("__time", (col("crdt") / 1000).cast("timestamp"))
  .limit(2)

my_df.write.mode(SaveMode.Overwrite).saveAsTable("MyDruidTable")
I get the following stack trace:
java.util.NoSuchElementException: None.get
  at scala.None$.get(Option.scala:347)
  at scala.None$.get(Option.scala:345)
  at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:110)
  at org.apache.spark.sql.execution.datasources.PreprocessTableCreation$$anonfun$apply$2.applyOrElse(rules.scala:75)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256)
  at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:75)
  at org.apache.spark.sql.execution.datasources.PreprocessTableCreation.apply(rules.scala:71)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
  at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
  at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
  at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:48)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
  at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
  at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
  at org.apache.spark.sql.execution.QueryExecution.withCachedData$lzycompute(QueryExecution.scala:73)
  at org.apache.spark.sql.execution.QueryExecution.withCachedData(QueryExecution.scala:72)
  at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:78)
  at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:78)
  at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84)
  at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:609)
  at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:419)
  at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:398)
  at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:354)
  ... 50 elided
Doesn't Spark support Druid? Or is there a way to troubleshoot?
We're using Spark with LLAP based on https://community.hortonworks.com/articles/101181/rowcolumn-level-security-in-sql-for-apache-spark-2...
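Edit: for anyone else debugging this, one quick sanity check (a minimal sketch; MyDruidTable is the table from the DDL above) is to dump the table definition as Spark sees it from the metastore; for a storage-handler table the storage section should show the Druid classes:

spark.sql("DESCRIBE FORMATTED MyDruidTable").show(100, truncate = false)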
Created 05-04-2018 02:00 PM
I tried the following, and it works.
df3.write.mode("append").insertInto("my_druid_table")
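Note that insertInto resolves columns by position rather than by name, so the DataFrame's column order must match the Hive table definition. A minimal sketch of aligning the columns first (column names taken from the CREATE TABLE above; df3 is the DataFrame being written):

import org.apache.spark.sql.functions.col

// insertInto is position-based: select the columns in the same
// order as the table definition (__time, dim1, meas) before writing.
val aligned = df3.select(col("__time"), col("dim1"), col("meas"))
aligned.write.mode("append").insertInto("my_druid_table")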
Created 05-29-2018 01:26 PM
Hello @Johann Voppichler, could you please describe what you did to create the Hive table with the Druid storage handler?
We have tried many options to integrate Hive 1.1.0 with Druid 0.12 via hive-druid-handler-2.3.0.jar (or 3.0.0), hive-metastore-2.3.0.jar (or 3.0.0), and hive-exec-2.3.0.jar (or 3.0.0).
The only output when creating the table via Beeline is a missing-class error. All of the classes named in the errors are present on the classpath, so there is probably a conflict between the supplied jars.
Is it possible to integrate Hive and Druid with our versions? Which version combinations are supported?
Thank you!
Errors:
java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.session.SessionState$LogHelper.<init>(Lorg/slf4j/Logger;)V (state=,code=0)
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.druid.DruidStorageHandler
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/StorageHandlerInfo
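For what it's worth, a NoSuchMethodError for a class that is clearly on the classpath usually means conflicting jar versions rather than a genuinely missing class. One generic way to narrow it down (a JVM diagnostic sketch, not specific to Hive or Druid) is to ask, from a Scala REPL started with the same classpath, which jar a conflicting class is actually loaded from:

// Prints the location (jar) the class was loaded from; if it is not
// the hive-exec/hive-druid-handler version you expect, the jars conflict.
val cls = Class.forName("org.apache.hadoop.hive.ql.session.SessionState$LogHelper")
println(cls.getProtectionDomain.getCodeSource.getLocation)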