Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Creating permanent EXTERNAL table from partitioned directory in spark-sql

Creating permanent EXTERNAL table from partitioned directory in spark-sql

New Contributor

Two expose data to tableau I have to create external tables from spark sql which points to partitioned parquet file directory which look like below.

data

/key=2017_12_15

/gender=male

/gender=female

*.parquet

/key=2017_12_16

/gender=male

/gender=female

*.parquet

Now i have to create three external table one which have all data, one which only have male data and third with female data. I tried below create commands but it's not working. I think issue is with * . Can someone please let me know how can i achieve it through spark-sql. Just for every one information * for with sqlContext object but I cannot use it as it don't have feature to create external table.

CREATE EXTERNAL TABLE IF NOT EXISTS all (name STRING, address STRING, date DATE) STORED AS PARQUET LOCATION 'data/key=*';

CREATE EXTERNAL TABLE IF NOT EXISTS male (name STRING, address STRING, date DATE) STORED AS PARQUET LOCATION 'data/key=*/gender=male';

CREATE EXTERNAL TABLE IF NOT EXISTS female (name STRING, address STRING, date DATE) STORED AS PARQUET LOCATION 'data/key=*/gender=female';

1 REPLY 1

Re: Creating permanent EXTERNAL table from partitioned directory in spark-sql

Super Collaborator

Hi @Rishi Saraf,

there are two ways you can achieve this,

By reversing the partition order which means date followed by gender, implies gender get stored before the key so that you can create two tables for each gender based out of the date partition. and one table with parent directory as location will cover two of them.

Other approach is to to create one table and three different views on top of that table, As the query in view automatically eliminate the data of different category(CBO eliminates by partition pruning ), while retrieving different category partition.

Hope this helps !!