Community Articles

Find and share helpful community-sourced technical articles.
Welcome to the upgraded Community! Read this blog to see What’s New!
Labels (2)
Expert Contributor

With the arrival of the Hive3Streaming processors, performance has never been better between NiFi and Hive3. Below, we'll be taking a look at the PutHive3Streaming (separate from the PutHiveStreaming processor) processor and how it can fit into a basic Change Data Capture workflow. We will be performing only inserts on the source Hive table and then carrying over those inserts into a destination Hive table. Updates are also supported through this process with the addition of a 'last_updated_datetime' timestamp, but that is out of scope for this article. This is meant to simulate copying data from one HDP cluster to another - we're using the same cluster as the source and destination here however.

Here is the completed flow. Take note that most of this flow is just keeping track of the latest ID that we've seen, so that we can pull that back out of HDFS periodically and query Hive for records that were added beyond that latest ID.


These last two processors, SelectHive3QL and PutHive3Streaming, are the ones doing the heavy lifting. They are the ones first getting the data from the source table (based on the pre-determined latest ID) and then inserting that retrieved data into the destination Hive table.


Here's the configuration for the SelectHive3QL processor. Note the ${curr_id} variable used in the HiveQL Select Query field. That ensures our query will be dynamic.


This is the configuration for the PutHive3Streaming processor. Nothing special here - we've configured the Hive Configuration Resources with the hive-site.xml file and used the Avro format (above) to retrieve data and (below) to write it back out.


Here's the state of the table as we first check the destination table:


And then insert a record into the source table:


Here's the new state of the table:


Here is the full NiFi flow:


In conclusion, we've shown one of the ways you can utilize the new PutHive3Streaming processors to perform quick inserts into Hive and at a broader level perform Change Data Capture.

Version history
Last update:
‎08-17-2019 05:45 AM
Updated by: