Created 12-24-2015 04:42 PM
I am trying to concatenate smaller ORC files into larger ones using Hive's concatenate feature but am having some issues with certain partitions. I get this error:
source table default.vsatlmtd_prod_orc is partitioned but partition not found.
Namely, partitions whose day=X part has an X value of 06, 07, etc, fail but it works when the value is NOT zero padded, like 10, 11, 12 work fine. The type of day partition is "int". When I use it in where clause, I specify the day part of the partition as day=6 and that works.
0: jdbc:hive2://vmwhaddev01:10000> select device_id from vsatlmtd_prod_orc where year=2015 and month=10 and day=6 limit 5; +------------------+--+ | device_id | +------------------+--+ | DSN001000327769 | | DSN001000327769 | | DSN001000327769 | | DSN001000327769 | | DSN001000327769 | +------------------+--+
This is very non-intuitive behavior wrt the concatenate NOT working the same way. Here's the specifics on the table.
0: jdbc:hive2://vmwhaddev01:10000> show create table default.vsatlmtd_prod_orc; +--------------------------------------------------------------------+--+ | createtab_stmt | +--------------------------------------------------------------------+--+ | CREATE TABLE `default.vsatlmtd_prod_orc`( | | `device_id` string, | | `esn` int, | | `collection_start_time` bigint, | | `collection_end_time` bigint, | | `up_time` int, | | `reboot_count` int, | >>> Lots of other int/string rows | `sw_profile_name` string, | | `service_plan` string) | | PARTITIONED BY ( | | `year` int, | | `month` int, | | `day` int) | | ROW FORMAT DELIMITED | | FIELDS TERMINATED BY ',' | | STORED AS INPUTFORMAT | | 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' | | OUTPUTFORMAT | | 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' | | LOCATION | | 'hdfs://vmwhaddev01:8020/apps/hive/warehouse/vsatlmtd_prod_orc' | | TBLPROPERTIES ( | | 'last_modified_by'='hdfs', | | 'last_modified_time'='1445532988', | | 'orc.compress'='SNAPPY', | | 'transient_lastDdlTime'='1445532988') | 0: jdbc:hive2://vmwhaddev01:10000> show partitions vsatlmtd_prod_orc; +----------------------------+--+ | partition | +----------------------------+--+ | year=2015/month=1/day=1 | | year=2015/month=10/day=06 | | year=2015/month=10/day=07 | | year=2015/month=10/day=08 | | year=2015/month=10/day=09 | | year=2015/month=10/day=10 | | year=2015/month=10/day=11 | | year=2015/month=10/day=12 | +----------------------------+--+ 8 rows selected (0.084 seconds) 0: jdbc:hive2://vmwhaddev01:10000> alter table vsatlmtd_prod_orc partition (year=2015, month=10, day=6) concatenate; Error: Error while compiling statement: FAILED: SemanticException org.apache.hadoop.hive.ql.parse.SemanticException: source table default.vsatlmtd_prod_orc is partitioned but partition not found. (state=42000,code=40000)
But, rows where the day part of the partition key is NOT zero-padded work, like day=10?
0: jdbc:hive2://vmwhaddev01:10000> alter table vsatlmtd_prod_orc partition (year=2015, month=10, day=10) concatenate; INFO : Tez session hasn't been created yet. Opening session INFO : INFO : Status: Running (Executing on YARN cluster with App id application_1445910583372_0119) INFO : File Merge: -/- INFO : File Merge: 0/4 INFO : File Merge: 0(+1)/4 INFO : File Merge: 0(+2)/4 INFO : File Merge: 0(+3)/4 INFO : File Merge: 1(+2)/4 INFO : File Merge: 2(+1)/4 INFO : File Merge: 2(+1)/4 INFO : File Merge: 3(+0)/4 INFO : File Merge: 4/4 INFO : Loading data to table default.vsatlmtd_prod_orc partition (year=2015, month=10, day=10) from hdfs://vmwhaddev01:8020/apps/hive/warehouse/vsatlmtd_prod_orc/year=2015/month=10/day=10/.hive-staging_hive_2015-12-24_10-50-17_360_8089986534053843520-8/-ext-10000 INFO : Partition default.vsatlmtd_prod_orc{year=2015, month=10, day=10} stats: [numFiles=4, numRows=0, totalSize=31032689, rawDataSize=0]
Note that this table's partitions were auto-created when the data was inserted due to these settings:
set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict;
The partitions show up like this when looking at the actual paths in HDFS:
/apps/hive/warehouse/vsatlmtd_prod/year=2015/month=10/day=06/000000_0
Hmmm, I think I just solved my own problem...So that made think this was a disconnect between how Hive interprets partitions in a 'where' clause as opposed to how it interprets them in the partition statement. To test this theory, I did this:
alter table vsatlmtd_prod_orc partition (year=2015, month=10, day='06') rename to partition (year=2015, month=10, day=6);
And when I ran the concatenate command on this partition, it now worked with day=6. Problem solved. I need to NOT create my partition values using "MM" or "dd" for time formats and instead just use "M" or "d" to avoid the zero padding.
Created 12-24-2015 04:52 PM
So, when you read the problem you will see that I noted that I solved the problem in course of typing up and thinking through the question. But, I decided to post it anyway because, (a) it might help someone else and (b) IMHO, I think Hive should treat partition key values that are typed as some number type (in my case, year, month, and day are all 'int' types) as actual numbers. Integer.ParseInt("06") returns 6, as expected. In my example, a where clause like, "...where year=2015 and month=10 and day=6..." works as expected and figures out the right partition even though the partition path on HDFS is "year=2015/month=10/day=06" (note zero padded day). But, the concatenate operation fails to recognize the partition. To me, this is inconsistent and non-intuitive but I would appreciate hearing something from an expert related to my comment.
One other point, even this does not work:
alter table vsatlmtd_prod_orc partition (year=2015, month=10, day='06') concatenate;
So, I could find no way to specify this partition unless I renamed it to day=6. That smells fishy to me.
Created 12-24-2015 04:52 PM
So, when you read the problem you will see that I noted that I solved the problem in course of typing up and thinking through the question. But, I decided to post it anyway because, (a) it might help someone else and (b) IMHO, I think Hive should treat partition key values that are typed as some number type (in my case, year, month, and day are all 'int' types) as actual numbers. Integer.ParseInt("06") returns 6, as expected. In my example, a where clause like, "...where year=2015 and month=10 and day=6..." works as expected and figures out the right partition even though the partition path on HDFS is "year=2015/month=10/day=06" (note zero padded day). But, the concatenate operation fails to recognize the partition. To me, this is inconsistent and non-intuitive but I would appreciate hearing something from an expert related to my comment.
One other point, even this does not work:
alter table vsatlmtd_prod_orc partition (year=2015, month=10, day='06') concatenate;
So, I could find no way to specify this partition unless I renamed it to day=6. That smells fishy to me.
Created 12-25-2015 04:49 PM
@Mark Petronic So ..the issue is with the partition starts with 0 ..correct? Your solution is to rename the partition starting with 0 to a non zero integer i,e rename 06 to 6
I am checking the bug database to see if it was reported or if it's an expected behavior. @gopal @Alan Gates
Created 12-25-2015 06:53 PM
I was creating the partitions using Joda time formater with "MM" and "dd" for the month and day. That will zero-pad the single digit month and day numbers. I will just switch to using "M" and "d" format instead. So, yes, removing the leading zero solves the issue.
Created 12-25-2015 06:54 PM
@Mark Petronic Good to know and thanks for sharing the details in the initial post as it helps a lot to understand the issue. Happy Holidays!
Created 12-28-2015 06:46 PM
@Neeraj Sabharwal this isn't expected behavior, but it's not surprising. There are many issues with non-string partition types. If this isn't in the bug base you should file a ticket for it. I suspect if affects all alter table partition clauses.
Created on 08-25-2016 07:13 PM - edited 08-19-2019 05:23 AM
It is not working for me. Can you let me know if i'm doing anything wrong??
test4 is a table partitioned on lname and is ORC format. the partition I'm trying to merge has just 2 small files.
ALTER TABLE test4 PARTITION (lname='vr') CONCATENATE;
Created 12-25-2015 04:43 PM