Creating permanent EXTERNAL table from partitioned directory in spark-sql

New Contributor

To expose data to Tableau I have to create external tables from Spark SQL that point to a partitioned Parquet directory which looks like below.

data
  /key=2017_12_15
    /gender=male
      *.parquet
    /gender=female
      *.parquet
  /key=2017_12_16
    /gender=male
      *.parquet
    /gender=female
      *.parquet

Now I have to create three external tables: one which has all the data, one which has only male data, and a third with female data. I tried the create commands below but they are not working. I think the issue is with the *. Can someone please let me know how I can achieve this through spark-sql? Just for everyone's information, * works with the sqlContext object, but I cannot use that because it doesn't have a feature to create external tables.

CREATE EXTERNAL TABLE IF NOT EXISTS all (name STRING, address STRING, date DATE) STORED AS PARQUET LOCATION 'data/key=*';

CREATE EXTERNAL TABLE IF NOT EXISTS male (name STRING, address STRING, date DATE) STORED AS PARQUET LOCATION 'data/key=*/gender=male';

CREATE EXTERNAL TABLE IF NOT EXISTS female (name STRING, address STRING, date DATE) STORED AS PARQUET LOCATION 'data/key=*/gender=female';


Super Collaborator

Hi @Rishi Saraf,

There are two ways you can achieve this.

One is to reverse the partition order, i.e. put gender before the date key, so that the gender directory sits above the key directory. You can then create one external table per gender pointing at each gender directory, and one table with the parent directory as its location to cover both.
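A sketch of the reversed-layout tables, assuming the data has been rewritten under data/gender=.../key=.../ (column names follow the question; `date` is backquoted because it is a reserved word):

```sql
-- Assumes the layout was reversed to data/gender=.../key=.../*.parquet
CREATE EXTERNAL TABLE IF NOT EXISTS male (name STRING, address STRING, `date` DATE)
PARTITIONED BY (`key` STRING)
STORED AS PARQUET
LOCATION 'data/gender=male';

-- Register the existing key=... subdirectories as partitions
MSCK REPAIR TABLE male;
```

The female table is the same with LOCATION 'data/gender=female', and the all-data table points at LOCATION 'data' with both gender and key declared as partition columns.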

The other approach is to create one table and three different views on top of that table. The predicate in each view automatically eliminates the data of the other category when you query it (the optimizer eliminates it by partition pruning).
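A sketch of the single-table-plus-views approach over the existing layout (the table name `people` and the view names are placeholders I've chosen for illustration):

```sql
-- One external table over the existing directory; key and gender are partition columns
CREATE EXTERNAL TABLE IF NOT EXISTS people (name STRING, address STRING, `date` DATE)
PARTITIONED BY (`key` STRING, gender STRING)
STORED AS PARQUET
LOCATION 'data';

-- Discover the existing key=.../gender=... partition directories
MSCK REPAIR TABLE people;

-- Views per category; the WHERE clause lets the optimizer prune partitions
CREATE VIEW IF NOT EXISTS all_people AS SELECT * FROM people;
CREATE VIEW IF NOT EXISTS male     AS SELECT * FROM people WHERE gender = 'male';
CREATE VIEW IF NOT EXISTS female   AS SELECT * FROM people WHERE gender = 'female';
```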

Hope this helps!