Read ORC table from Spark
Labels: Apache Hive, Apache Spark
Created ‎02-28-2017 06:43 AM
I am trying to read a Hive ORC table from Spark SQL, but it fails with the following error:
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: delta_0067044_0067143 does not start with base_
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
    ... 104 more
Caused by: java.lang.IllegalArgumentException: delta_0067044_0067143 does not start with base_
    at org.apache.hadoop.hive.ql.io.AcidUtils.parseBase(AcidUtils.java:144)
Created ‎03-06-2017 10:01 PM
From the error message, it seems you are trying to read an ACID ORC table from Spark SQL. There are known limitations when reading this type of table from Spark SQL.
You can find more details in these JIRAs:
https://issues.apache.org/jira/browse/SPARK-16996
https://issues.apache.org/jira/browse/HIVE-15189
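To make the error message easier to interpret: Hive ACID tables keep compacted data in directories named base_<txnId> and uncompacted changes in directories named delta_<minTxn>_<maxTxn>. The message in the question is what you would expect when a reader that only understands base_ directories meets a delta_ directory. The following is a minimal sketch of that naming check. The object and method names are hypothetical and this is not Hive's actual AcidUtils code, only an illustration of the check implied by the message.

```scala
// Illustrative only: mimics the prefix check implied by the error message.
// AcidDirNames, parseBase and parseDelta are hypothetical names, not Hive's API.
object AcidDirNames {
  val BasePrefix = "base_"
  val DeltaPrefix = "delta_"

  // A reader that only handles compacted data expects "base_<txnId>".
  def parseBase(dirName: String): Long = {
    if (dirName.startsWith(BasePrefix)) dirName.substring(BasePrefix.length).toLong
    else throw new IllegalArgumentException(s"$dirName does not start with $BasePrefix")
  }

  // Uncompacted changes live in "delta_<minTxn>_<maxTxn>" directories.
  // Returns None for any name that does not have exactly that shape.
  def parseDelta(dirName: String): Option[(Long, Long)] = {
    if (!dirName.startsWith(DeltaPrefix)) None
    else dirName.substring(DeltaPrefix.length).split("_") match {
      case Array(min, max) => Some((min.toLong, max.toLong))
      case _               => None
    }
  }
}
```

With the directory from the stack trace, AcidDirNames.parseDelta("delta_0067044_0067143") gives the transaction range (67044, 67143), while AcidDirNames.parseBase on the same name throws the IllegalArgumentException shown in the question.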
To work around this issue, you can force a compaction by running an "alter table compact" query in Hive before reading the data from Spark SQL.
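As a rough sketch of what that query looks like, the helper below only builds the HiveQL text. The helper name, the table name orc_table (taken from the question) and the partition example are assumptions for illustration. The resulting statement is meant to be run in Hive itself, for example from beeline, not through sqlContext.sql.

```scala
// Hypothetical helper: builds the HiveQL text only, it does not execute anything.
object CompactionSql {
  // kind must be "major" or "minor"; a major compaction rewrites deltas into a new base.
  // partitionSpec is an optional partition filter such as "ds='2017-03-07'" (made-up example).
  def compact(table: String,
              kind: String = "major",
              partitionSpec: Option[String] = None): String = {
    require(kind == "major" || kind == "minor", s"unknown compaction type: $kind")
    val partition = partitionSpec.map(p => s" PARTITION ($p)").getOrElse("")
    s"ALTER TABLE $table$partition COMPACT '$kind'"
  }
}
```

CompactionSql.compact("orc_table") produces ALTER TABLE orc_table COMPACT 'major'. Hive queues the compaction and runs it in the background, so it is worth checking its progress with SHOW COMPACTIONS and waiting for it to finish before retrying the read from Spark.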
Created ‎03-02-2017 04:22 PM
Please show your code.
Created ‎03-07-2017 08:13 AM
scala> val df = sqlContext.sql("SELECT * FROM orc_table limit 1")
17/03/07 13:41:03 INFO ParseDriver: Parsing command: SELECT * FROM Orc_table limit 1
17/03/07 13:41:03 INFO ParseDriver: Parse Completed
java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:165)
    at org.apache.spark.sql.execution.datasources.LogicalRelation$anonfun$1.apply(LogicalRelation.scala:39)
    at org.apache.spark.sql.execution.datasources.LogicalRelation$anonfun$1.apply(LogicalRelation.scala:38)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:38)
    at org.apache.spark.sql.execution.datasources.LogicalRelation.copy(LogicalRelation.scala:31)
    at org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$convertToOrcRelation(HiveMetastoreCatalog.scala:588)
    at org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$anonfun$apply$2.applyOrElse(HiveMetastoreCatalog.scala:647)
    at org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$anonfun$apply$2.applyOrElse(HiveMetastoreCatalog.scala:643)
    at org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$transformUp$1.apply(TreeNode.scala:335)
    at org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$transformUp$1.apply(TreeNode.scala:335)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
    at org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$5.apply(TreeNode.scala:332)
    at org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$5.apply(TreeNode.scala:332)
    at org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$4.apply(TreeNode.scala:281)
    at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:321)
Created ‎03-06-2017 09:27 PM
Can you share more about which version of Spark you are using?
HDP added support for ORC in Spark 1.4. Please see the following article:
https://hortonworks.com/blog/bringing-orc-support-into-apache-spark/
Here is a bit of code that shows how this works:
// Create a HiveContext from the existing SparkContext (sc in spark-shell) to query Hive tables
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
