Member since 12-25-2019
9 Posts, 0 Kudos Received, 0 Solutions
01-28-2020
03:39 AM
1 Kudo
The best way is to join your nodes using the SSSD service. It will solve both the user directory creation problem and group mapping.
01-19-2020
08:07 AM
How much data did you delete? Did a checkpoint happen after you deleted the data? Also, please check whether any snapshots are present. The output of the HDFS CLI "du" command includes not only normal files but also files that have been deleted and still exist in snapshots (which is accurate in terms of real resource consumption). Please check the output using the -x flag, which excludes snapshots from the calculation:

hdfs dfs -du -x -s -h /path
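To see how much space is held only by snapshots, one option is to run "du" with and without -x and subtract. The following is a minimal sketch, not a tested procedure: the directory `/path` and the helper names `first_field` and `snapshot_only_bytes` are placeholders made up for illustration, and it assumes the first column of `hdfs dfs -du -s` output (without -h) is the size in bytes.

```shell
#!/usr/bin/env bash
# Hypothetical directory to inspect; replace with your own path.
DIR="${DIR:-/path}"

# Extract the first column (size in bytes) from `hdfs dfs -du -s` output.
first_field() { awk '{print $1}'; }

# Difference between the size including snapshots and the size excluding them.
snapshot_only_bytes() { echo $(( ${1:-0} - ${2:-0} )); }

# Only query the cluster if the hdfs client is available on this host.
if command -v hdfs >/dev/null 2>&1; then
  with_snap=$(hdfs dfs -du -s "$DIR" | first_field)        # includes snapshot data
  without_snap=$(hdfs dfs -du -s -x "$DIR" | first_field)  # -x excludes snapshots
  echo "Bytes held only by snapshots: $(snapshot_only_bytes "$with_snap" "$without_snap")"
fi
```

If the difference is large, the deleted data is most likely still referenced by a snapshot and will not be freed until that snapshot is removed.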
12-25-2019
09:44 AM
@kiranpune DistCp (distributed copy) is a tool used for large inter/intra-cluster copying. It uses MapReduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into the input to map tasks, each of which copies a partition of the files specified in the source list. That is the basic description, but you can use different command-line options when running DistCp; see the official DistCp documentation. Below are a few options for your different use cases.

OPTIONS
-append: Incremental copy of a file with the same name but a different length
-update: Overwrite if source and destination differ in size, block size, or checksum
-overwrite: Overwrite destination
-delete: Delete the files existing in the destination but not in the source

I think you can schedule or script a daily copy.
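A daily copy could be scripted roughly as follows. This is only a sketch under assumptions: the source and destination URIs, the `RUN` switch, and the helper name `build_distcp_cmd` are hypothetical, and the choice of -update with -delete (mirror the source, removing files that no longer exist there) may not match your use case.

```shell
#!/usr/bin/env bash
# Hypothetical source and destination; replace with your own cluster URIs.
SRC="${SRC:-hdfs://source-nn:8020/data}"
DST="${DST:-hdfs://dest-nn:8020/data}"

# Print the DistCp command line for a given source and destination.
# -update copies only files that differ; -delete removes files from the
# destination that are no longer present in the source.
build_distcp_cmd() {
  printf 'hadoop distcp -update -delete %s %s\n' "$1" "$2"
}

# Dry run by default: print the command. Set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
  hadoop distcp -update -delete "$SRC" "$DST"
else
  build_distcp_cmd "$SRC" "$DST"
fi
```

To run it every day, a crontab entry such as `0 2 * * * RUN=1 /path/to/daily_distcp.sh` would work (the script path is a placeholder). Check the dry-run output first, since -delete removes data on the destination.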