Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Hive - Add statically new layer of partitioning with dynamic partitioning by other columns

Hive - Add statically new layer of partitioning with dynamic partitioning by other columns

New Contributor

On hadoop, we have files structure like

 

hdfs://nameserviceid/foo/bar/tableLocation/partitionedByFolder1=value/partitionedByFolder2=value/file1.parquet

hdfs://nameserviceid/foo/bar/tableLocation/partitionedByFolder1=value/partitionedByFolder2=value/file2.parquet

 

Hive table definition is

  

CREATE EXTERNAL TABLE IF NOT EXISTS foo (...)

PARTITIONED BY (
partitionedBy1 DATE,
partitionedBy2 STRING
)
STORED AS PARQUETFILE LOCATION "hdfs://nameserviceid/foo/bar/tableLocation/"

Partitions (partitionedBy1, partitionedBy2) are added dynamically with table repair.

 

And is like

 

 '[partition:partitionedBy1=value1/partitionedBy2=value2]'
'[partition:partitionedBy1=value2/partitionedBy2=value3]'

Now to the existing table, we need to add external location outside of the root table dir (with exactly the same subdirs structure) as a new layer of partitioning.

 

hdfs://nameserviceid/foo/another_bar/tableLocation/partitionedByFolder1=value/partitionedByFolder2=value/file1.parquet

hdfs://nameserviceid/foo/another_bar/tableLocation/partitionedByFolder1=value/partitionedByFolder2=value/file2.parquet

 

So partitions should be like

 

'[partition:bar=bar/partitionedBy1=value1/partitionedBy2=value2]'
'[partition:bar=another_bar/partitionedBy1=value1/partitionedBy2=value2]'

'[partition:bar=bar/partitionedBy1=value2/partitionedBy2=value3]'
'[partition:bar=another_bar/partitionedBy1=value2/partitionedBy2=value3]'

 

I'm trying to achieve

 

 CREATE EXTERNAL TABLE IF NOT EXISTS foo (...)

PARTITIONED BY (
bar STRING,
partitionedBy1 DATE,
partitionedBy2 STRING
)

STORED AS PARQUETFILE

 

ALTER TABLE foo ADD PARTITION (bar='bar') " +

"LOCATION 'hdfs://nameserviceid/bar/tablelocation' " 
ALTER TABLE foo ADD PARTITION (bar='another_bar') " +

"LOCATION 'hdfs://nameserviceid/another_bar/tablelocation' "

In this case, I am getting an exception

 

partition spec doesn't contain all (3) partition columns

 

If I define all 3 partition columns and proper location, it works.

 

ALTER TABLE foo ADD PARTITION (bar='another_bar', partitionedBy1=value1, partitionedBy2=value2) " +
"LOCATION 'hdfs://nameserviceid/another_bar/tablelocation/partitionedBy1=value1/partitionedBy2=value2' " 

But there are lots of values for partitionedBy1, partitionedBy2 columns and it is hard to include all of them statically.

 

Is it possible to specify an exact location for one of partition column (bar, another_bar) and left other partition columns to be dynamic?

1 REPLY 1
Highlighted

Re: Hive - Add statically new layer of partitioning with dynamic partitioning by other columns

Master Guru
There is not a currently available way to do this with ALTER TABLE ADD PARTITION. Its implementation requires explicit definition of all partitions when providing the partition_spec [1].

Please file a feature request at https://issues.apache.org/jira/browse/HIVE for the Apache Hive team to consider feasibility of adding this in future.

In the mean time, if you are able to script out your statements, you can consider using SHOW PARTITIONS and a HDFS listing to dynamically create all the explicit statements perhaps?

[1] - https://github.com/apache/hive/blob/rel/release-3.1.1/ql/src/java/org/apache/hadoop/hive/ql/parse/DD... (final arg to call) and https://github.com/apache/hive/blob/rel/release-3.1.1/ql/src/java/org/apache/hadoop/hive/ql/metadata...