
How do I create an ORC Hive table from Spark?

Solved


Rising Star

I'm currently using Spark 1.4 and I'm loading some data into a DataFrame over JDBC:

val jdbcDF = sqlContext.load("jdbc", options)

How can I save the jdbcDF DataFrame to a Hive table using the ORC file format?


12 REPLIES

Re: How do I create an ORC Hive table from Spark?

Re: How do I create an ORC Hive table from Spark?

Rising Star

Thanks for the helpful links! Should I create the Hive table ahead of time, or could I do everything within Spark?

Re: How do I create an ORC Hive table from Spark?

Re: How do I create an ORC Hive table from Spark?

@Kit Menke

If you want to access your table from Hive, you have two options:

1. Create the table ahead of time and use df.write.format("orc") to write into it (see the sketch at the end of this reply).

2. Use Brandon's suggestion here: register df as a temp table and do a CREATE TABLE ... AS SELECT from the temp table.

See code examples here:

https://community.hortonworks.com/questions/6023/orgapachesparksparkexception-task-failed-while-wri....

If you use the saveAsTable function, it will create a table in the Hive metastore, but Hive won't be able to query it; only Spark can use a table created this way.
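
For option 1, a minimal sketch, assuming sqlContext is actually a HiveContext (Spark 1.4 needs one for ORC support); the table name, schema, and location below are hypothetical:

// Create the Hive table ahead of time, pointing at the directory Spark will write to.
// The column list must match the DataFrame's schema.
sqlContext.sql(
  """CREATE EXTERNAL TABLE IF NOT EXISTS jdbc_import (id INT, name STRING)
    |STORED AS ORC
    |LOCATION '/apps/hive/warehouse/jdbc_import'""".stripMargin)

// Write the DataFrame's rows as ORC files into that location.
jdbcDF.write.format("orc").mode("overwrite").save("/apps/hive/warehouse/jdbc_import")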

Re: How do I create an ORC Hive table from Spark?

You can just write out the DataFrame as ORC and the underlying directory will be created. Let me know if this doesn't work.
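
For example, a minimal sketch (the output path here is hypothetical):

// Write the DataFrame out as ORC; Spark creates the target directory if it does not exist.
jdbcDF.write.format("orc").save("/tmp/jdbc_orc_output")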

Re: How do I create an ORC Hive table from Spark?

Rising Star

Yep, the ORC directory is created but a Hive table is not.

Re: How do I create an ORC Hive table from Spark?

Rising Star
@vshukla

I am also facing the same issue. I saved the data in ORC format from the DataFrame and created an external Hive table. When I do show tables in the HiveContext in Spark, it shows me the table, but I can't see any table in my Hive warehouse when I query the Hive external table. When I just create the Hive table (no DataFrame, no data processing) using the HiveContext, the table gets created and I am able to query it as well. I'm unable to understand this strange behaviour. Am I missing something?

For example: hiveContext.sql("CREATE TABLE IF NOT EXISTS TestTable (name STRING, age STRING)")

shows me the table in Hive as well.

Re: How do I create an ORC Hive table from Spark?

The way I have done this is to first register a temp table in Spark and then leverage the sql method of the HiveContext to create a new table in Hive using the data from the temp table. For example, if I have a DataFrame df and a HiveContext hc, the general process is:

df.registerTempTable("my_temp_table")
hc.sql("CREATE TABLE new_table_name STORED AS ORC AS SELECT * FROM my_temp_table")

Re: How do I create an ORC Hive table from Spark?

Rising Star

Very interesting! I will try this out!