Exporting and importing data between environment tiers such as production, QA, and development is a recurring task.
For security reasons, these environments cannot communicate with each other directly, so Amazon S3 storage is used as an intermediate staging point for transferring data between them. Automating this task is expected to save close to 4 hours of manual intervention per occurrence.
The code can also be reused for disaster recovery automation.
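At a high level the transfer is a two-hop copy through S3: data is exported from the source cluster to an S3 location, then imported from that location on the target cluster. As a rough illustration only (the actual internals of datamove.sh may differ, and the bucket name and prefix below are assumptions), an HDFS directory copy could look like:
# On cluster1: push the HDFS directory to S3 (bucket/prefix are hypothetical)
hadoop distcp /tmp/tomcatLog s3a://example-datacopy-bucket/dataCopy/tmp/tomcatLog
# On cluster2: pull the same data from S3 back into local HDFS
hadoop distcp s3a://example-datacopy-bucket/dataCopy/tmp/tomcatLog /tmp/tomcatLog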
Note: The configuration file name can differ for different S3 locations and is passed to the script as an argument, but the file must be placed in the conf folder under the /root/scripts/dataCopy directory.
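For illustration, a configuration file for one S3 location might contain entries along the lines of the following; the key names here are assumptions, not necessarily the settings datamove.sh actually reads:
# conf/conf_datamove_devs3.conf (hypothetical contents)
S3_BUCKET=dev-datacopy-bucket      # assumed key: target S3 bucket
S3_PREFIX=dataCopy                 # assumed key: prefix under which exports are staged
AWS_REGION=us-east-1               # assumed key: region of the bucket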
Usage:
Scenario 1: Exporting a database from cluster1 to cluster2
Example database name: testdb
In cluster1:
sudo su root
cd /root/scripts/dataCopy/
./datamove.sh export testdb db conf_datamove_devs3.conf
After the above execution finishes:
In cluster2:
sudo su root
cd /root/scripts/dataCopy/
./datamove.sh import testdb db conf_datamove_devs3.conf
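If the database is a Hive database, one way to spot-check the import on cluster2 is to list its tables (this assumes the standard Hive CLI and is not part of datamove.sh):
hive -e "SHOW TABLES IN testdb;"    # sanity check on cluster2 after the import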
Scenario 2: Exporting an HDFS directory from cluster1 to cluster2
Example directory name: /tmp/tomcatLog
In cluster1:
sudo su root
cd /root/scripts/dataCopy/
./datamove.sh export /tmp/tomcatLog dir conf_datamove_devs3.conf
After the above execution finishes:
In cluster2:
sudo su root
cd /root/scripts/dataCopy/
./datamove.sh import /tmp/tomcatLog dir conf_datamove_devs3.conf
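To spot-check the directory import, the copied size on cluster2 can be compared with the source (standard HDFS shell commands, not part of datamove.sh):
hdfs dfs -du -s -h /tmp/tomcatLog    # run on both clusters and compare the totals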
Note: The script can be run in the background (nohup ... &). Logs are written inside the script's folder structure under the database or directory name, with a timestamp.
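For example, to run an export in the background so it survives logout, redirecting console output to a file of your choice:
nohup ./datamove.sh export testdb db conf_datamove_devs3.conf > export_testdb.out 2>&1 &
tail -f export_testdb.out    # follow console output; detailed logs go to the script's timestamped log folder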