Cloudera Hive replication: one job is long-running due to the small files issue

We have scheduled Hive replication jobs for all of our Hive databases. One of these jobs is taking a long time because of the small files issue. Can I use getmerge to address this?


If I implement getmerge, will it affect users? If data is later updated in or inserted into one of the small files, how would that change be reflected in the merged file? And would it resolve the slow distcp step of the replication job?


Or are there any features in Cloudera that address the small files issue for replication jobs?