Archives of Support Questions (Read Only)

This is an archived board kept for historical reference. Information and links may no longer be available or relevant.
To ask a new question, please post a new topic on the appropriate active board.

Can you create a hive table in ORC Format from SparkSQL directly

Master Guru

I've done it with sqlContext.sql("create table ...") followed by sqlContext.sql("insert into ..."),

but dataframe.write.orc produces ORC files that Hive cannot see as a table.

What are all the ways to work with ORC from Spark?
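For reference, the create-then-insert approach described above can be sketched as follows. This is a minimal sketch assuming Spark 1.x with Hive support; the database, table, and column names (db1.test1, sample, id, name) are illustrative, not from the original post.

```scala
// Sketch of the SQL-based approach, assuming Spark 1.x with Hive support.
// Table and column names here are illustrative assumptions.
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)  // sc: an existing SparkContext

// Create a Hive table stored as ORC, then populate it from an existing table
sqlContext.sql("CREATE TABLE IF NOT EXISTS db1.test1 (id INT, name STRING) STORED AS ORC")
sqlContext.sql("INSERT INTO TABLE db1.test1 SELECT id, name FROM sample")
```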

1 ACCEPTED SOLUTION

Super Guru

@Timothy Spann

Did you try this syntax?

import org.apache.spark.sql.SaveMode

// HiveContext.sql already returns a DataFrame, so toDF() is unnecessary
val dfTable = objHiveContext.sql("select * from sample")
dfTable.write.format("orc").mode(SaveMode.Overwrite).saveAsTable("db1.test1")
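Once saveAsTable has run, the table is registered in the Hive metastore and can be queried like any other Hive table. A quick way to confirm Hive sees it (the table name follows the example above and is an assumption):

```scala
// Query the saved table back through the HiveContext to confirm it is visible
objHiveContext.sql("SELECT * FROM db1.test1").show()

// The table's storage details, including its ORC input/output formats,
// can be inspected with DESCRIBE FORMATTED
objHiveContext.sql("DESCRIBE FORMATTED db1.test1").show(100, false)
```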


4 REPLIES


Master Guru

@Jitendra Yadav That worked for me in Zeppelin, and the data looks good.


In answer to your question, "What are all the ways to work with ORC from Spark?":

I am using spark-sql and have created ORC tables as well as tables in other formats without any issues.
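Besides metastore tables, Spark can also read and write ORC data directly as files on a path. A minimal sketch, assuming Spark 1.4+ with a HiveContext (here named objHiveContext, as in the accepted answer); the source table name and /tmp paths are illustrative assumptions:

```scala
// Write a DataFrame out as ORC files, then read them back by path.
// The table name and /tmp paths are illustrative assumptions.
val df = objHiveContext.sql("select * from sample")
df.write.format("orc").save("/tmp/sample_orc")

val orcDf = objHiveContext.read.format("orc").load("/tmp/sample_orc")
orcDf.show()
```

Note that in Spark 1.x, ORC support requires a HiveContext rather than a plain SQLContext.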


The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data.

An ORC file stores groups of rows called stripes, along with auxiliary information in a file footer. It is purely a storage format and is independent of Spark itself.

[Image: ORC file layout]