Support Questions
Find answers, ask questions, and share your expertise

How do I create an ORC Hive table from Spark?

SOLVED


Rising Star

I'm currently using Spark 1.4 and I'm loading some data into a DataFrame using jdbc:

val jdbcDF = sqlContext.load("jdbc", options)

How can I save the jdbcDF DataFrame to a Hive table using the ORC file format?


12 REPLIES


Re: How do I create an ORC Hive table from Spark?

Rising Star

Thanks for the helpful links! Should I create the Hive table ahead of time, or could I do everything within Spark?

Re: How do I create an ORC Hive table from Spark?

@Kit Menke

If you want to access your table from Hive, you have two options:

1- Create the table ahead of time and write into it with df.write.format("orc").

2- Use Brandon's suggestion here: register the DataFrame as a temp table and do a CREATE TABLE ... AS SELECT from the temp table.

See code examples here:

https://community.hortonworks.com/questions/6023/orgapachesparksparkexception-task-failed-while-wri....

If you use the saveAsTable function on its own, it will create a table in the Hive metastore, but Hive won't be able to query it. Only Spark can read a table created with this method.
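Option 1 might look like the following sketch (Spark 1.4, with a HiveContext named sqlContext; the table name, columns, and DataFrame jdbcDF are hypothetical placeholders):

```scala
// Hypothetical sketch for option 1: pre-create an ORC-backed Hive table,
// then insert the DataFrame's rows into it.
// Assumes sqlContext is a HiveContext and jdbcDF matches the table schema.
sqlContext.sql(
  "CREATE TABLE IF NOT EXISTS people (name STRING, age INT) STORED AS ORC")

// insertInto writes into the existing table; the ORC format comes from
// the table definition, so Hive can query the result.
jdbcDF.write.mode("append").insertInto("people")
```

Because the table is defined in the Hive metastore first, both Hive and Spark can read it afterwards.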

Re: How do I create an ORC Hive table from Spark?

You can just write out the DF as ORC and the underlying directory will be created. LMK, if this doesn't work.
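For reference, writing the DataFrame out as ORC is a one-liner in Spark 1.4 (the output path below is hypothetical, and ORC support requires a HiveContext):

```scala
// Writes ORC files under the given directory, creating it if needed.
// The path is a made-up example; adjust to your HDFS/local location.
jdbcDF.write.format("orc").save("/tmp/orc_output")
```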

Re: How do I create an ORC Hive table from Spark?

Rising Star

Yep, the ORC directory is created but a Hive table is not.

Re: How do I create an ORC Hive table from Spark?

Rising Star
@vshukla

I am also facing the same issue. I saved the data in ORC format from a DataFrame and created an external Hive table over it. When I do SHOW TABLES in the HiveContext in Spark it shows me the table, but I can't see any table in my Hive warehouse when I query the external table from Hive. Yet when I just create a Hive table (no DataFrame, no data processing) using the HiveContext, the table gets created and I am able to query it too. Unable to understand this strange behaviour. Am I missing something?

for ex : hiveContext.sql("CREATE TABLE IF NOT EXISTS TestTable (name STRING, age STRING)")

shows me the table in Hive as well.
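For comparison, the external-table version of that experiment might look like this sketch (the path, table name, and columns are hypothetical; it assumes Spark wrote ORC files to the LOCATION directory):

```scala
// Hypothetical sketch: expose ORC files already written by Spark to Hive
// by defining an external table over the directory Spark saved to.
hiveContext.sql(
  """CREATE EXTERNAL TABLE IF NOT EXISTS people_orc (name STRING, age STRING)
    |STORED AS ORC
    |LOCATION '/tmp/orc_output'""".stripMargin)
```

If the table only shows up from Spark's HiveContext and not from Hive itself, it is worth checking that both are pointed at the same metastore (hive-site.xml on the Spark classpath) rather than Spark falling back to a local embedded metastore.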

Re: How do I create an ORC Hive table from Spark?

The way I have done this is to first register a temp table in Spark and then leverage the sql method of the HiveContext to create a new table in Hive using the data from the temp table. For example, if I have a DataFrame df and a HiveContext hc, the general process is:

df.registerTempTable("my_temp_table")
hc.sql("CREATE TABLE new_table_name STORED AS ORC AS SELECT * FROM my_temp_table")

Re: How do I create an ORC Hive table from Spark?

Rising Star

Very interesting! I will try this out!