Support Questions

Find answers, ask questions, and share your expertise
Celebrating as our community reaches 100,000 members! Thank you!

NiFi PutHive3Streaming - Hive compactor looking for wrong partitions


Cluster information: HDP 3.0, NiFi 1.7.0, 3 NiFi nodes

Context: We are using NiFi PutHive3Streaming processor to ingest data into a partitioned ORC Hive table (5 level partitioning). The input FlowFile is in JSON format. Here is the processor configuration:


Problem: The data ingestion works fine (data is written to HDFS in the right path and readable from Hive requests), but when the Hive compactor is triggered, it looks for wrong partitions (all the partitions values are lowercased when some should be uppercase). For example, we have the partition dev.table1.partition1=MH/partition2=P2/partition3=0025/year=2018/month=01, and we get the following logs in hivemetastore.log:

2018-11-20T09:42:34,701 INFO  [Thread-10]: compactor.Initiator ( - Checking to see if we should compact dev.table1.partition1=mh/partition2=p2/partition3=0025/year=2018/month=01
2018-11-20T09:42:34,710 INFO  [Thread-10]: compactor.Initiator ( - Can't find partition dev.table1.partition1=mh/partition2=p2/partition3=0025/year=2018/month=01, assuming it has been dropped and moving on.

It looks like Hive has wrong information about the table's partitions, yet when running a DESCRIBE FORMATTED on the partition:

  • With lowercased partitions' values:
describe formatted dev.table1(partition1='mh',partition2='p2',partition3='0025',year=2018,month=01);
Error: Error while compiling statement: FAILED: SemanticException [Error 10006]: Partition not found {partition1=mh, partition2=p2, partition3=0025, year=2018, month=01} (state=42000,code=10006)
  • With right partitions' values:
describe formatted dev.table1 partition(partition1='MH',partition2='P2',partition3='0025',year=2018,month=01);
| # Detailed Partition Information  | NULL                         | NULL             |
| Partition Value:                  | [MH, P2, 0025, 2019, 01]     | NULL             |
| Database:                         | dev                          | NULL             |
| Table:                            | table1                       | NULL             |
| CreateTime:                       | Wed Nov 07 15:16:15 CET 2018 | NULL             |
| LastAccessTime:                   | UNKNOWN                      | NULL             |
| Location:                         | hdfs://nnhdfs/warehouse/tablespace/managed/hive/dev.db/table1/partition1=MH/partition2=P2/partition3=0025/year=2019/month=01 | NULL                  |
| Partition Parameters:             | NULL                         | NULL             |
|                                   | transient_lastDdlTime        | 1541600175       |

Here we see that Hive has the good metadata about the partition.

We don't get why when looking for partitions to compact the partitions values are turned to lowercase. Is this a normal behavior, and if not do you have any idea where this problem comes from?

Our temporary workaround is to use lowercase partition values, which is not very satisfying.



We solved this issue by using dynamic partitioning instead of specifying the "Partition values" field.

View solution in original post



We solved this issue by using dynamic partitioning instead of specifying the "Partition values" field.