I am receiving around 300k-400k messages per day in NiFi version Nifi-18.104.22.168.0.0.0-453. The messages arrive as JSON, and the goal is to put them into a Hive table in near real time. I have built the flow attached as an image in this post. The flow works and does everything it is supposed to; the issue is that it is not writing through PutHiveStreaming fast enough, so messages start queuing up in the flow. I have been adjusting the Transactions per Batch and Records per Transaction configuration properties on PutHiveStreaming, but I am unable to get the processor to write any faster, and messages are arriving much faster than we can write them. Is there a way to configure the PutHiveStreaming processor so that it can keep up with this load of 300k-400k messages per day (around 3 to 4 messages per second)? Any insight on this issue will be deeply appreciated.
I would check whether a lot of delta files are being created. Use ReplaceText or any similar processor to run a major compaction on the table every now and then. That should speed it up a bit.
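As a rough diagnostic, assuming you have a Hive shell connected to the same metastore, the following statements show whether compactions are keeping up; a long list of initiated-but-not-completed compactions would point to delta-file buildup:

```sql
-- List compaction requests known to the metastore,
-- including their state (initiated / working / ready for cleaning)
SHOW COMPACTIONS;

-- Show open and aborted transactions, which can also hold up readers
SHOW TRANSACTIONS;
```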
Hi @Adda Fuentes - we are facing the same issue. Were you able to get this to run quickly? In our case we also see constant queuing, because records arrive much faster than the Put Hive Streaming processor can write them.
The ReplaceText processor together with PutHiveQL will pass the command for a major or minor compaction: put the compact-major statement in the Replacement Value of the ReplaceText processor, and then route that into PutHiveQL.
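As a sketch, the Replacement Value in ReplaceText would be a HiveQL statement like the following (the table name is a placeholder), which PutHiveQL then submits against the Hive connection pool:

```sql
-- Request a major compaction, merging all delta files into a new base file
-- (my_db.my_table is a placeholder; add a PARTITION (...) clause
--  if the table is partitioned, since compaction runs per partition)
ALTER TABLE my_db.my_table COMPACT 'major';
```

You would typically trigger this flow on a timer (e.g. a GenerateFlowFile processor on a schedule feeding ReplaceText) rather than per record, since compaction is a heavyweight background job.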