I'm currently working on a Cloudera test cluster. I want to reinstall the platform, and I need to save the data that is currently on it.
My question is: how can I do this backup?
Thanks in advance for your replies.
There are multiple options; you can choose whichever suits you (sometimes a combination of more than one). See the command sketch after this list:
a. If you have two clusters, move/copy the data between them, or
b. just copy from your cluster to local.
1. distcp: copies data between two clusters. (Suitable for option a; I have never tried it for option b.)
2. copyToLocal: suitable for option b.
3. Export Hive table: suitable for option b. You can export your Hive table to an HDFS path and then apply copyToLocal. Note: I have only tried this with non-partitioned tables; for partitioned tables, please export and import the same data back first to test it.
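To make these concrete, here is a minimal sketch of each option on the command line. The hostnames, paths, and the database/table name below are placeholders for illustration, not values from your cluster:

# 1. distcp: copy a directory tree between two clusters (option a).
#    source-nn and backup-nn are hypothetical NameNode hostnames.
hadoop distcp hdfs://source-nn:8020/user/data hdfs://backup-nn:8020/user/data_backup

# 2. copyToLocal: pull an HDFS directory down to the local filesystem (option b).
hadoop fs -copyToLocal /user/data /tmp/cluster_backup/data

# 3. Export a Hive table to an HDFS path, then copy that path to local (option b).
#    mydb.mytable and the export path are placeholders.
hive -e "EXPORT TABLE mydb.mytable TO '/tmp/hive_export/mytable';"
hadoop fs -copyToLocal /tmp/hive_export/mytable /tmp/cluster_backup/mytable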
hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst>
This command can do the job, but I have a question: where must I execute the command to be sure I copy all the data and not just a part of it? On a datanode or the namenode?
I don't think there is any single command available to copy all the cluster data to local. You can copy the parent directory belonging to the corresponding service and zip it locally (if you have enough space on your local machine).
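For example, a rough sketch of that approach, assuming /user/hive/warehouse is the service directory you want and /backup has enough free space locally (both paths are examples):

# Copy the whole directory tree from HDFS to local disk.
hadoop fs -copyToLocal /user/hive/warehouse /backup/warehouse
# Compress the local copy to save space.
tar -czf /backup/warehouse.tar.gz -C /backup warehouse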
It is neither the datanode nor the namenode (the data goes to your local machine). Since the HDFS client fetches every block through the filesystem layer, you can run the command from any node that has the Hadoop client configured and you will still get all the data.
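For instance, you could run it from an edge node or any host with the HDFS client configured, and then sanity-check that everything arrived by comparing byte counts (example paths again):

# Total bytes of the directory as HDFS sees it.
hadoop fs -du -s /user/data
# Total bytes of the local copy (GNU du).
du -sb /backup/data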