Cloudera Hive replication: one job is long-running due to a small-files issue

We have scheduled Hive replication jobs for all Hive databases. One of the jobs takes a long time because of a small-files issue. Can I use getmerge to address this?
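
To make the small-files problem measurable, here is a minimal sketch that counts files below a size threshold in the text output of `hdfs dfs -ls -R`. The function name `count_small_files`, the 128 MB threshold, and the sample paths are assumptions made up for illustration; they are not taken from the cluster in question.

```python
def count_small_files(ls_output, threshold_bytes=128 * 1024 * 1024):
    """Return (small_files, total_files) for `hdfs dfs -ls -R` text output.

    Assumes the usual 8-column listing layout:
    permissions, replication, owner, group, size, date, time, path.
    The 128 MB default threshold is an assumption; adjust it to the block size.
    """
    small = 0
    total = 0
    for line in ls_output.splitlines():
        fields = line.split(None, 7)
        # Skip summary rows such as "Found 3 items" and directory entries.
        if len(fields) < 8 or fields[0].startswith("d"):
            continue
        try:
            size = int(fields[4])
        except ValueError:
            continue
        total += 1
        if size < threshold_bytes:
            small += 1
    return small, total


# Hypothetical listing, for illustration only.
SAMPLE = """Found 4 items
drwxr-xr-x   - hive hive          0 2020-01-10 09:00 /warehouse/db1.db/t1
-rw-r--r--   3 hive hive       2048 2020-01-10 09:01 /warehouse/db1.db/t1/000000_0
-rw-r--r--   3 hive hive  268435456 2020-01-10 09:02 /warehouse/db1.db/t1/000001_0
-rw-r--r--   3 hive hive       4096 2020-01-10 09:03 /warehouse/db1.db/t1/000002_0
"""

small, total = count_small_files(SAMPLE)
```

Running this against the listing of each replicated database path would show which tables contribute most of the small files.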

If I implement getmerge, will it affect users? If an update or insert touches one of the small files, how would that change be reflected in the merged files? Will it resolve the slow distcp command for the replication job?

Or does Cloudera provide any features that address the small-files issue for replication jobs?