NiFi PutHive3Streaming and Hive's minor compaction

Data inserted into Hive with the INSERT INTO command gets minor-compacted.
But data inserted into Hive using NiFi's PutHive3Streaming does not get minor-compacted, even when there are enough delta files.
Is it possible to make minor compaction work for data written by NiFi?

1 ACCEPTED SOLUTION

Master Guru
@Kei Miyauchi

If you want to trigger minor/major compactions from NiFi, connect the success relationship of the PutHiveStreaming processor to a ReplaceText processor and configure ReplaceText as follows:

Replacement Strategy: Always Replace

Replacement Value:

alter table <db_name>.<table_name> compact 'minor';

Then use a PutHiveQL processor to execute the minor compaction statement.

Flow:

--other processors
--> PutHiveStreaming
--> ReplaceText processor
--> PutHiveQL

This way, the minor compaction is initiated from NiFi.
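
Before wiring the statement into the flow, it may help to run it by hand (for example from Beeline) and confirm that Hive accepts and queues the request. The sketch below assumes a hypothetical table sales_db.events and a hypothetical partition column ds; neither name comes from this thread.

```sql
-- Hypothetical names (sales_db.events, ds): substitute your own.
-- Queue a minor compaction; this is the same statement PutHiveQL would send.
ALTER TABLE sales_db.events COMPACT 'minor';

-- For a partitioned table, the partition to compact has to be named.
ALTER TABLE sales_db.events PARTITION (ds = '2019-01-01') COMPACT 'minor';

-- List compaction requests and their current state
-- (for example initiated, working, or succeeded).
SHOW COMPACTIONS;
```

If the request never leaves the initiated state, the compactor settings on the Hive side are the likely place to look rather than the NiFi flow.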

Take a look at this SupportKB article about minor compactions not working in Hive, and set the recommended global configs so that minor compactions work.

@Shu

Thank you, I'll try that.
The link isn't working. Did you mean this page?
https://community.hortonworks.com/content/supportkb/193756/automatic-minor-compaction-on-hive-is-not...

Also, do you have any idea how to trigger ReplaceText and PutHiveQL only after a certain number of flowfiles have passed through PutHiveStreaming?
I think invoking a minor compaction for each flowfile is too much when a lot of flowfiles arrive.

Master Guru

@Kei Miyauchi

Yes, I meant this page: https://community.hortonworks.com/content/supportkb/193756/automatic-minor-compaction-on-hive-is-not...

Use a MergeContent processor after PutHiveStreaming and configure it to wait for a minimum of 10 flowfiles (or some other number) and merge them into one. Then feed the merged relationship to the ReplaceText processor. This way, the flow waits for at least 10 flowfiles before triggering a minor compaction.

Flow:

--other processors
--> PutHiveStreaming
--> MergeContent
--> ReplaceText processor
--> PutHiveQL

@Shu

That works!

Thank you for all your replies.