
Can you create a Hive table in ORC format directly from Spark SQL?

Master Guru

I've done so with sqlContext.sql("create table ...") followed by sqlContext.sql("insert into ..."), but dataframe.write.orc produces ORC files that Hive cannot see as a table.

What are all the ways to work with ORC from Spark?
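
For context, a minimal sketch of the create-then-insert approach described above, assuming a HiveContext bound to sqlContext and an existing source table named sample (the target table and column names are made up):

// Create the ORC table first, then populate it via HiveQL.
// Table and column names here are illustrative.
sqlContext.sql("create table db1.test_orc (id int, name string) stored as orc")
sqlContext.sql("insert into table db1.test_orc select id, name from sample")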

1 ACCEPTED SOLUTION

Super Guru

@Timothy Spann

Did you try this syntax?

import org.apache.spark.sql.SaveMode

// objHiveContext.sql already returns a DataFrame, so no toDF() is needed
val dfTable = objHiveContext.sql("select * from sample")
dfTable.write.format("orc").mode(SaveMode.Overwrite).saveAsTable("db1.test1")
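
saveAsTable registers db1.test1 in the Hive metastore, so the table should be visible from Hive as well. A quick sanity check from the same session:

// Read the saved table back through the metastore to confirm
// it was registered and the data is intact.
objHiveContext.sql("select count(*) from db1.test1").show()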


4 REPLIES


Master Guru

@Jitendra Yadav That worked for me in Zeppelin, and the data looks good.


To answer your question, "What are all the ways to work with ORC from Spark?":

I am using spark-sql and have created ORC tables as well as tables in other formats without any issue.
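
For example, an ORC table can be created with plain DDL through the HiveContext; the database, table, and column names below are illustrative:

// Create an ORC-backed Hive table directly with DDL.
objHiveContext.sql("create table if not exists db1.events_orc (id int, payload string) stored as orc")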


The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data.

An ORC file stores groups of rows called stripes, along with auxiliary information in a file footer. It is just a storage format and is not tied to Spark itself.

[Figure: ORC file layout]
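
Because ORC is just a file format, Spark can also read ORC files directly from a path without going through a Hive table (Spark 1.4+; the path below is illustrative):

// Load ORC files straight from HDFS; no Hive table required.
val df = objHiveContext.read.format("orc").load("/apps/hive/warehouse/db1.db/test1")
df.printSchema()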