Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive partition - partition column as part of the data?

avatar
Rising Star

i have a table with two string columns and one datetime column (which is also defined as string datatype).

I want to partition the table on monthly basis ie month(the datetime column).

So I did the below :

create table test_part_tbl (id string, cd string, dttm string) partitioned by (mth string);

insert into test_part_tbl partition(mth) select id, cd, create_dt, month(create_dt) from real_table;

hive> select * from test_part_tbl ;

OK

test_part_tbl.id test_part_tbl.cd test_part_tbl.dttm test_part_tbl.mth

id1 cd1 2018-10-24 10

id2 cd1 2018-10-24 10

Time taken: 0.13 seconds, Fetched: 2 row(s)

So the month ie "10" is actually appearing as part of the table data. Is that correct?

Is it possible to partition the table as above and not have the partition column/value as part of the table data? ie when querying can't I use "month(dttm)" and search based on month?

Appreciate the insights.

1 ACCEPTED SOLUTION

avatar
Master Guru
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login
8 REPLIES 8

avatar
Master Guru
hide-solution

This problem has been solved!

Want to get a detailed solution you have to login/registered on the community

Register/Login

avatar
Rising Star

The below hive performance parameter - is it usually set within a map reduce program to be set at the time of execution :

SET hive.exec.parallel=true

Or can it be set at the global level in Ambari?

Appreciate the feedback.

avatar
Explorer

login to hive server and run the above command

avatar
Master Guru

@n c

Once you white list the param in ambari then you are able to set the parameter in hive cli.

avatar
Rising Star

actually I am working with cloudera now and i dont see hive.exec.parallel as a configurable option in cloudera manager.

avatar
Rising Star

if i partition a table by year - can i further bucket it by month?

so the idea is the year will be the top level and months will be at a level beneath it.

so the directory structure would be :

2018 -> 1, 2, 3 ... 12

2019 -> 1, 2, 3 ... 12

Is this what bucketing is about? Or should i be doing this someway with partitions itself?

Appreciate the insights.

avatar
Rising Star

I tried this but wouldn't work :

create table test_part_bkt_tbl (id string, cd string, dttm string) partitioned by (yr string) clustered by (month(dttm)) into 12 buckets;

avatar
Rising Star

ok, this can be done simply as : partitioned by (yr string, mth string).

tks.