Created 12-14-2015 04:28 AM
When copying files from HDFS to a local file system:
hdfs dfs -copyToLocal <source> <dest>
you have the options -crc and -ignoreCrc to control checksum handling: -crc writes the .crc checksum files, and -ignoreCrc skips checksum verification.
I am merging and copying files out to the local file system using
hdfs dfs -getmerge <sourceDir> <destFile>
and end up with a hidden .destFile.crc file for each destFile.
Is there an equivalent way to turn this off, or otherwise to automatically remove .destFile.crc when the corresponding destFile is deleted from the local file system?
Thank you!
Created 12-15-2015 06:27 PM
Hello @Emily Sharpe.
There is currently no way to skip writing the CRC file when running the -getmerge command. I filed Apache JIRA HADOOP-12643 to propose an enhancement to the command that would allow skipping the write of the CRC file. In the meantime, the best option is probably to use a scripting workaround, such as the suggestion from @Neeraj Sabharwal.
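A minimal sketch of such a scripting workaround, assuming a POSIX shell and the hidden .<name>.crc naming described in the question (the checksum file sits in the same directory as the merged output). The function name getmerge_nocrc is hypothetical and not part of Hadoop; it simply runs -getmerge and then deletes the companion checksum file:

```shell
# Hypothetical helper (not part of Hadoop): run -getmerge, then delete the
# hidden checksum file that the local file system writes next to the output.
# Usage: getmerge_nocrc <sourceDir> <destFile>
getmerge_nocrc() {
  # $1 = HDFS source directory, $2 = local destination file.
  # If the merge fails, return its exit status and leave everything as-is.
  hdfs dfs -getmerge "$1" "$2" || return
  # The checksum file sits beside the output and is named ".<basename>.crc".
  rm -f "$(dirname "$2")/.$(basename "$2").crc"
}
```

Because the .crc file is removed right after the merge, there is nothing left to clean up later when destFile itself is deleted.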
Created 12-14-2015 11:47 AM
I don't see any option for this in -getmerge. You could write a shell script that removes the .crc files from a particular location, something like the following, and run it periodically from cron:
find . -type f -name '*.crc' -exec rm {} +
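The find command above deletes every .crc file under the current directory, including checksums whose data files still exist. A narrower sketch, assuming Hadoop-style checksum files named .<name>.crc in the same directory as <name>, removes only the orphaned ones, which is closer to what the original question asked. The function name cleanup_orphan_crc is hypothetical; note that it would also delete any unrelated hidden file matching .<name>.crc whose <name> is missing, and it does not handle file names containing newlines:

```shell
# Hypothetical helper: delete hidden checksum files (.<name>.crc) whose
# data file <name> no longer exists in the same directory.
# Usage: cleanup_orphan_crc [directory]   (defaults to the current directory)
# Limitation: file names containing newlines are not handled.
cleanup_orphan_crc() {
  dir="${1:-.}"
  find "$dir" -type f -name '.*.crc' | while IFS= read -r crc; do
    parent=$(dirname "$crc")
    base=$(basename "$crc")
    # Strip the leading "." and the trailing ".crc" to recover the data file name.
    name=${base#.}
    name=${name%.crc}
    # Only remove the checksum if its data file is gone.
    if [ ! -e "$parent/$name" ]; then
      rm -f "$crc"
    fi
  done
}
```

Scheduled from cron in the same way, cleanup_orphan_crc /path/to/dir leaves checksums for existing files alone and only sweeps up those left behind after a data file is deleted.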
Created 12-16-2015 12:18 AM
Hi @Neeraj Sabharwal, thank you for the script line. Looks like I will be adding that in!
Created 05-17-2016 01:30 PM
I am trying to save my output results in Spark using saveAsTextFile(""). This produces multiple part files (part-00000, part-00001, and so on) along with .crc files in the output directory. Is there a way to avoid creating the .crc files?
Created 12-16-2015 12:21 AM
Hi @Chris Nauroth, thanks for the confirmation, and great to know the option has been suggested 🙂