Member since
03-27-2018
5
Posts
1
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
1742 | 02-19-2019 04:32 PM |
02-19-2019
04:32 PM
We solved this issue by using dynamic partitioning instead of specifying the "Partition values" field.
... View more
11-20-2018
11:23 PM
Cluster information: HDP 3.0, NiFi 1.7.0, 3 NiFi nodes
Context: We are using NiFi PutHive3Streaming processor to ingest data into a partitioned ORC Hive table (5 level partitioning). The input FlowFile is in JSON format. Here is the processor configuration:
Problem: The data ingestion works fine (data is written to HDFS in the right path and readable from Hive requests), but when the Hive compactor is triggered, it looks for wrong partitions (all the partitions values are lowercased when some should be uppercase). For example, we have the partition dev.table1.partition1=MH/partition2=P2/partition3=0025/year=2018/month=01, and we get the following logs in hivemetastore.log:
2018-11-20T09:42:34,701 INFO [Thread-10]: compactor.Initiator (Initiator.java:run(97)) - Checking to see if we should compact dev.table1.partition1=mh/partition2=p2/partition3=0025/year=2018/month=01
2018-11-20T09:42:34,710 INFO [Thread-10]: compactor.Initiator (Initiator.java:run(142)) - Can't find partition dev.table1.partition1=mh/partition2=p2/partition3=0025/year=2018/month=01, assuming it has been dropped and moving on.
It looks like Hive has wrong information about the table's partitions, yet when running a DESCRIBE FORMATTED on the partition:
With lowercased partitions' values:
describe formatted dev.table1(partition1='mh',partition2='p2',partition3='0025',year=2018,month=01);
Error: Error while compiling statement: FAILED: SemanticException [Error 10006]: Partition not found {partition1=mh, partition2=p2, partition3=0025, year=2018, month=01} (state=42000,code=10006)
With right partitions' values:
describe formatted dev.table1 partition(partition1='MH',partition2='P2',partition3='0025',year=2018,month=01);
[...]
| # Detailed Partition Information | NULL | NULL |
| Partition Value: | [MH, P2, 0025, 2019, 01] | NULL |
| Database: | dev | NULL |
| Table: | table1 | NULL |
| CreateTime: | Wed Nov 07 15:16:15 CET 2018 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Location: | hdfs://nnhdfs/warehouse/tablespace/managed/hive/dev.db/table1/partition1=MH/partition2=P2/partition3=0025/year=2019/month=01 | NULL |
| Partition Parameters: | NULL | NULL |
| | transient_lastDdlTime | 1541600175 |
Here we see that Hive has the good metadata about the partition.
We don't get why when looking for partitions to compact the partitions values are turned to lowercase. Is this a normal behavior, and if not do you have any idea where this problem comes from?
Our temporary workaround is to use lowercase partition values, which is not very satisfying.
... View more
Labels:
- Labels:
-
Apache Hive
-
Apache NiFi
03-27-2018
12:57 PM
1 Kudo
@ke chen Starting from Ambari 2.6, the way of specifying repo versions has changed as you can see here. You have to register a VDF file like this one with the version_definitions API (in the VDF file, replace the repository-info.os.repo.baseurl by your local repo url). curl -u user:pwd -H "X-Requested-By: ambari" -X POST
http://ambari.server:8080/api/v1/version_definitions \
-d '{
"VersionDefinition": {
"version_url": "your.vdf.file.xml"
}
}' You will get an id for the version and you will have to reference it when creating the cluster : {
"blueprint": "<blueprint name>",
"repository_version_id": 1,
"default_password": "password",
"host_groups": [ ... ]
} More details in the release note: https://docs.hortonworks.com/HDPDocuments/Ambari-2.6.0.0/bk_ambari-release-notes/content/ambari_relnotes-2.6.0.0-behavioral-changes.html
... View more