Created 11-20-2018 09:50 AM
Data inserted into Hive with the INSERT INTO command gets minor compaction.
But data inserted into Hive with NiFi's PutHive3Streaming processor does not get minor compaction, even when there are enough delta files.
Is it possible to make minor compaction work for data written by NiFi?
Created 11-20-2018 02:22 PM
If you want to trigger minor/major compactions from NiFi, feed the success relationship of the PutHiveStreaming processor into a ReplaceText processor and configure ReplaceText with the settings below:
Replacement Strategy
Always Replace
and Replacement Value set to
alter table <db_name>.<table_name> compact 'minor';
Then use a PutHiveQL processor to execute the minor compaction.
Flow:
--other processors --> PutHiveStreaming --> ReplaceText --> PutHiveQL
This way, the minor compaction is initiated from NiFi.
Also take a look at this SupportKB article about minor compactions not working in Hive, and set the recommended global configs so that minor compactions work.
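For illustration, here is a minimal Python sketch of the statement that ReplaceText is meant to produce as the flowfile content. The function name `compaction_statement` and the identifier check are my own assumptions, not part of NiFi or Hive; only the `alter table ... compact` statement itself comes from the config above.

```python
import re

# Hypothetical helper: only plain identifiers are accepted so that the
# generated HiveQL cannot be altered by unexpected characters.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def compaction_statement(db_name, table_name, kind="minor"):
    """Build the HiveQL text that ReplaceText would write into the flowfile."""
    if kind not in ("minor", "major"):
        raise ValueError("kind must be 'minor' or 'major'")
    for name in (db_name, table_name):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    return f"alter table {db_name}.{table_name} compact '{kind}';"


if __name__ == "__main__":
    # Example table names are placeholders.
    print(compaction_statement("my_db", "my_table"))
```

In the actual flow this string is simply typed into the Replacement Value property; the sketch only shows what PutHiveQL ends up receiving.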
Created 11-21-2018 04:37 AM
Thank you, I'll try that.
The link isn't working. Did you mean this page?
https://community.hortonworks.com/content/supportkb/193756/automatic-minor-compaction-on-hive-is-not...
Also, do you have any idea how to trigger ReplaceText and PutHiveQL only after a certain number of flowfiles have passed through PutHiveStreaming?
I think invoking a minor compaction for every flowfile is too much when a lot of flowfiles arrive.
Created 11-21-2018 02:03 PM
Yes, I meant this page: https://community.hortonworks.com/content/supportkb/193756/automatic-minor-compaction-on-hive-is-not...
Use a MergeContent processor after PutHiveStreaming and configure it to wait for a minimum of 10 flowfiles (or some other number) and merge them into one, then feed the merged relationship to the ReplaceText processor. With MergeContent in place, the flow waits for at least 10 flowfiles before triggering a minor compaction.
Flow:
--other processors --> PutHiveStreaming --> MergeContent --> ReplaceText --> PutHiveQL
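To make the batching idea concrete, here is a small Python sketch of the counting behaviour that MergeContent provides in this flow: nothing is passed downstream until a minimum number of flowfiles has arrived. The class `CompactionTrigger` and its method are hypothetical names used only for this simulation; in NiFi the threshold is set in the MergeContent configuration, not in code.

```python
class CompactionTrigger:
    """Simulates waiting for a minimum number of flowfiles before firing."""

    def __init__(self, min_entries=10):
        if min_entries < 1:
            raise ValueError("min_entries must be at least 1")
        self.min_entries = min_entries
        self.pending = 0  # flowfiles seen since the last trigger

    def on_flowfile(self):
        """Record one flowfile; return True when a compaction should run."""
        self.pending += 1
        if self.pending >= self.min_entries:
            self.pending = 0  # start a new batch
            return True
        return False


if __name__ == "__main__":
    trigger = CompactionTrigger(min_entries=10)
    fired = sum(trigger.on_flowfile() for _ in range(25))
    print(f"compactions triggered: {fired}")
```

With a threshold of 10, a burst of 25 flowfiles leads to two compaction requests instead of 25, and the remaining 5 wait for the next batch.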