Need a script to automate a daily DistCp job that migrates data from Production to QA




I want to copy data daily from Production to QA. Could you please provide a script that keeps working when the NameNode changes in Production and QA?


Are you using Cloudera Manager? If so, and you have the same version of CM in prod and QA, you can try the steps below:


1. CM -> Backup menu -> Peers (add peer) - one time
2. CM -> Backup menu -> Replication schedule -> Create schedule (as many as you want)

Hi @public, do you have an HA NameNode?


Are you using Cloudera? Which version?


@saranvisa I assume these options are available only in the Enterprise version.


I will help and provide you with a script.
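
If both clusters run NameNode HA, one way to cope with a failover is to probe each candidate NameNode before the copy and use whichever one answers. The sketch below assumes that a standby NameNode rejects a plain `hdfs dfs -ls`, so a successful listing means the node is active. The function names, the script name, and the host names in the usage example are hypothetical placeholders, not anything taken from your clusters, and the script has not been run against a real cluster.

```shell
#!/bin/sh
# Daily Production -> QA copy with DistCp. A sketch under the assumptions
# stated above; host names and paths are placeholders.

# Command used to probe a NameNode (assumption: a standby NameNode rejects
# read operations, so listing "/" only succeeds on the active one).
PROBE_CMD=${PROBE_CMD:-"hdfs dfs -ls"}

# Print the first candidate (host:port) that answers the probe.
# Returns 1 if none of the candidates answers.
find_active_nn() {
  for nn in "$@"; do
    if $PROBE_CMD "hdfs://${nn}/" >/dev/null 2>&1; then
      echo "$nn"
      return 0
    fi
  done
  return 1
}

# Usage: daily_distcp.sh <path> "<prod nn list>" "<qa nn list>"
main() {
  path=$1
  # $2 and $3 are deliberately unquoted so each list splits into candidates.
  src=$(find_active_nn $2) || { echo "no active Production NameNode" >&2; return 1; }
  dst=$(find_active_nn $3) || { echo "no active QA NameNode" >&2; return 1; }
  # -update copies only files that are missing or differ on the target.
  hadoop distcp -update "hdfs://${src}${path}" "hdfs://${dst}${path}"
}

# Run the copy only when all three arguments are supplied.
if [ "$#" -eq 3 ]; then
  main "$@"
fi
```

To run it daily, a cron entry along these lines could be used (time, script location, and hosts are made up for illustration):

```shell
0 2 * * * /path/to/daily_distcp.sh /data/analysis "prod-nn1.example.com:8020 prod-nn2.example.com:8020" "qa-nn1.example.com:8020 qa-nn2.example.com:8020"
```

If the client configuration on the node running the job defines both nameservices, you can pass the nameservice IDs to `hadoop distcp` directly and skip the probe. On a Kerberized cluster the job also needs a valid ticket before it starts.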


Yes, we have HA, and we are using CDH 5.12.1.




I followed your steps. I am getting the error below:


$> dr/ ["-bandwidth","100","-i","-m","20","-prbugpa","-skipAclErr","-skipcrccheck","-skiplistingcrccheck","-update","-proxyuser","guc","-log","/user/PROXY_USER_PLACEHOLDER/.cm/distcp/2018-11-28_16856","-sourceconf","source-client-conf","-sourceprincipal","hdfs/","-sourcetktcache","source.tgt","-useSnapshots","distcp-39-1373113265","-ignoreSnapshotFailures","-strategy","dynamic","-filters","exclusion-filter.list","-scheduleId","39","-scheduleName","Teradata","/data/analysis/teradata/","/data/analysis/chekdata"]


Current working directory: /run/cloudera-scm-agent/process/5637-hdfs-distcp-7884b73f
Launching one-off process: /usr/lib64/cmf/service/dr/ -bandwidth 100 -i -m 20 -prbugpa -skipAclErr -update -proxyuser guc -log /user/PROXY_USER_PLACEHOLDER/.cm/distcp/2018-11-28_16855 -sourceconf source-client-conf -sourceprincipal hdfs/ -sourcetktcache source.tgt -useSnapshots distcp-38--383162628 -ignoreSnapshotFailures -strategy dynamic -filters exclusion-filter.list -scheduleId 38 -scheduleName Test /data/analysis/chekdata/check_summary_bgem_dgem /data/analysis/chekdata/
Wed Nov 28 06:08:09 CST 2018
Running on: (
using /usr/java/jdk1.8.0_144 as JAVA_HOME
using 5 as CDH_VERSION
using /run/cloudera-scm-agent/process/5637-hdfs-distcp-7884b73f as CONF_DIR
using hdfs/ as Kerberos principal
using /run/cloudera-scm-agent/process/5637-hdfs-distcp-7884b73f/krb5cc_993 as Kerberos ticket cache
using /opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/hadoop-mapreduce as CDH_MR2_HOME