
read orc table from spark

Solved

New Contributor

I am trying to read a Hive ORC table from Spark SQL, but it shows me the following error:

Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: delta_0067044_0067143 does not start with base_
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
	... 104 more
Caused by: java.lang.IllegalArgumentException: delta_0067044_0067143 does not start with base_
	at org.apache.hadoop.hive.ql.io.AcidUtils.parseBase(AcidUtils.java:144)

1 ACCEPTED SOLUTION

Re: read orc table from spark

New Contributor

From the error message, it seems you are trying to read a Hive ACID (transactional) ORC table from Spark SQL. There are certain limitations when reading this type of table from Spark SQL: Spark's ORC reader expects compacted base_* directories and fails on the delta_* directories that ACID writes produce.

You can find more details in these JIRAs:

https://issues.apache.org/jira/browse/SPARK-16996

https://issues.apache.org/jira/browse/HIVE-15189

As a workaround, you can force compaction by running an "ALTER TABLE ... COMPACT" statement before reading the data from Spark SQL.
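As a rough sketch of that workaround (the table name my_acid_table is a placeholder, and these statements are run through Hive, e.g. beeline, not through Spark):

```sql
-- Trigger a major compaction so all delta_* directories are merged
-- into a base_* directory that Spark's ORC reader can handle.
ALTER TABLE my_acid_table COMPACT 'major';

-- Compaction runs asynchronously; check its state and wait until
-- the request shows as succeeded before reading from Spark SQL.
SHOW COMPACTIONS;
```

Note that new writes to the table will create fresh delta directories, so the read from Spark should happen before further transactions land.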

4 REPLIES

Re: read orc table from spark

Expert Contributor

Please show your code.

Re: read orc table from spark

New Contributor

scala> val df = sqlContext.sql("SELECT * FROM orc_table limit 1")
17/03/07 13:41:03 INFO ParseDriver: Parsing command: SELECT * FROM Orc_table limit 1
17/03/07 13:41:03 INFO ParseDriver: Parse Completed
java.lang.AssertionError: assertion failed
	at scala.Predef$.assert(Predef.scala:165)
	at org.apache.spark.sql.execution.datasources.LogicalRelation$anonfun$1.apply(LogicalRelation.scala:39)
	at org.apache.spark.sql.execution.datasources.LogicalRelation$anonfun$1.apply(LogicalRelation.scala:38)
	at scala.Option.map(Option.scala:145)
	at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:38)
	at org.apache.spark.sql.execution.datasources.LogicalRelation.copy(LogicalRelation.scala:31)
	at org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$convertToOrcRelation(HiveMetastoreCatalog.scala:588)
	at org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$anonfun$apply$2.applyOrElse(HiveMetastoreCatalog.scala:647)
	at org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$anonfun$apply$2.applyOrElse(HiveMetastoreCatalog.scala:643)
	at org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$transformUp$1.apply(TreeNode.scala:335)
	at org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$transformUp$1.apply(TreeNode.scala:335)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
	at org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$5.apply(TreeNode.scala:332)
	at org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$5.apply(TreeNode.scala:332)
	at org.apache.spark.sql.catalyst.trees.TreeNode$anonfun$4.apply(TreeNode.scala:281)
	at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
	at scala.collection.AbstractIterator.to(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:321)

Re: read orc table from spark

Contributor

Can you share more detail about which version of Spark you are using?

HDP added support for ORC in Spark 1.4. Please see the following article:

https://hortonworks.com/blog/bringing-orc-support-into-apache-spark/

Here is a bit of code that shows how this works:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
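To expand that one-liner into a slightly fuller sketch (the table name sample_orc and the warehouse path are placeholders; this assumes a Spark 1.4+ shell on HDP, where sc is the pre-created SparkContext):

```scala
// Create a HiveContext so Spark SQL can see tables in the Hive metastore.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Query a (non-ACID) Hive ORC table registered in the metastore...
val df = sqlContext.sql("SELECT * FROM sample_orc LIMIT 10")
df.show()

// ...or read ORC files directly from their HDFS location.
val dfFromPath = sqlContext.read.format("orc").load("/apps/hive/warehouse/sample_orc")
```

Note this path works for plain ORC tables; as discussed in the accepted solution, ACID ORC tables hit the delta_* limitation.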

