11-26-2018 05:58 AM - last edited on 11-26-2018 06:41 AM by cjervis
We have scheduled Hive replication jobs for all of our Hive databases. One of the jobs is taking a long time because of the small files issue. Can I use getmerge to address this?
If I implement getmerge, will it impact users? If a small file is later updated or receives inserts, how would those changes reach the merged file? And would merging fix the slow distcp step of the replication job?
Or does Cloudera offer any feature that handles the small files issue for replication jobs?