Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

How can I specify table output directory using SparkSQL?

avatar

I'm using SparkSQL (local mode) in Zeppelin for development work. As I am not running on a cluster, I do not have /user/hive/warehouse directories. If I'm using strictly SQL, is there a way I can specify the directory of my newly created tables? How about setting the default output directories?

Failing Example:

%sql
create table pings as
select
  split(time, " ")[0] as month,
  split(time, " ")[2] as year,
  split(split(time, " ")[3], ":")[0] as hour,
  split(split(time, " ")[3], ":")[1] as minute,
  split(split(split(time, " ")[3], ":")[2], "\\.")[0] as second,
  substr(split(split(split(time, " ")[3], ":")[2], "\\.")[1],0, 3) as ms,
  *
from pings_raw
MetaException(message:file:/user/hive/warehouse/pings is not a directory or unable to create one)
set zeppelin.spark.sql.stacktrace = true to see full stacktrace
1 ACCEPTED SOLUTION

avatar
Guru

Exactly as you would in a hive CTAS:

%spark 
sc.parallelize(1 to 10).toDF.registerTempTable("pings_raw")

%sql 
create table pings 
   location '/path/to/pings'
as select * from pings_raw

View solution in original post

2 REPLIES 2

avatar
Guru

Exactly as you would in a hive CTAS:

%spark 
sc.parallelize(1 to 10).toDF.registerTempTable("pings_raw")

%sql 
create table pings 
   location '/path/to/pings'
as select * from pings_raw

avatar

Thanks, manually specifying the full path worked. Is there a way to set a default directory?

I've tried "set hive.metastore.warehouse.dir=my_dir" but it had no effect.