Is there an -ignoreCrc equivalent when using getmerge?

When copying files from HDFS to a local file system:

hdfs dfs -copyToLocal <source> <dest>

you have the options -crc and -ignoreCrc to control checksum handling (-crc also writes the checksum files, -ignoreCrc skips checksum verification).

I am merging/copying out to local using

hdfs dfs -getmerge <sourceDir> <destFile>

and end up with a hidden .destFile.crc file for each destFile.

Is there an equivalent way to turn this function off, or otherwise automatically remove the .destFile.crc if the corresponding destFile is deleted (from the local file system)?

Thank you!

1 ACCEPTED SOLUTION

Hello @Emily Sharpe.

There is currently no way to skip writing the CRC file when running the -getmerge command. I filed Apache JIRA HADOOP-12643 to propose an enhancement to the command that would allow skipping the write of the CRC file. In the meantime, the best option is probably to use a scripting workaround, such as the suggestion from @Neeraj Sabharwal.
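
One way such a scripting workaround might look is a small wrapper that runs the merge and then deletes the checksum file written next to the output. This is only a sketch: the function name getmerge_no_crc is hypothetical (not part of Hadoop), and it assumes the hdfs client is on PATH and that the checksum file is named .destFile.crc in the same directory as destFile, as described in the question.

```shell
# Hypothetical helper (not part of Hadoop): merge an HDFS directory into one
# local file, then delete the hidden checksum file written beside it.
# Usage: getmerge_no_crc <sourceDir> <destFile>
getmerge_no_crc() {
    # Assumes the `hdfs` client is installed and configured for the cluster.
    hdfs dfs -getmerge "$1" "$2" || return 1
    # The checksum sidecar is .<basename>.crc in the destination directory.
    rm -f -- "$(dirname -- "$2")/.$(basename -- "$2").crc"
}
```

Calling getmerge_no_crc with the same two arguments as -getmerge leaves only destFile behind. If the merge itself fails, the function returns non-zero and does not touch any existing files.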

REPLIES

@Emily Sharpe

I don't see any such option for -getmerge. You may want to write a shell script that removes the .crc files from a particular location, and run it from a cron job. Something like the following:

find . -type f -name '*.crc' -exec rm {} +
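
The command above removes every .crc file under the current directory. If the goal is only to clean up checksum files whose data file has already been deleted (the second half of the original question), a narrower variant could look like the sketch below. The function name remove_orphan_crcs is hypothetical, and it assumes the .destFile.crc naming described in the question and file names without embedded newlines.

```shell
# Hypothetical helper: delete only those hidden .crc files whose
# corresponding data file no longer exists.
# Usage: remove_orphan_crcs [directory]   (defaults to the current directory)
remove_orphan_crcs() (
    dir="${1:-.}"
    find "$dir" -type f -name '.*.crc' | while IFS= read -r crc; do
        name=$(basename -- "$crc")   # e.g. .destFile.crc
        name=${name#.}               # strip the leading dot: destFile.crc
        name=${name%.crc}            # strip the suffix: destFile
        data="$(dirname -- "$crc")/$name"
        # Keep the checksum if its data file is still present.
        [ -e "$data" ] || rm -f -- "$crc"
    done
)
```

Run from cron, this leaves the checksum files of existing outputs in place and removes only the orphans.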

Hi @Neeraj Sabharwal, thank you for the script line. Looks like I will be adding that in!

Hi @Neeraj Sabharwal,

I am trying to save my output in Spark using saveAsTextFile(""). The result is multiple part files (part-00000, part-00001, and so on) along with .crc files in the output directory. Do you have any idea how I can avoid creating the .crc files?

Hi @Chris Nauroth, thanks for the confirmation, and great to know the option has been suggested 🙂