
Is there an -ignoreCrc equivalent when using getmerge?

Contributor

When copying files from HDFS to a local file system:

hdfs dfs -copyToLocal <source> <dest>

you have the -crc and -ignoreCrc options to control checksum handling: -crc also copies the .crc checksum files, and -ignoreCrc skips checksum verification.

I am merging/copying out to local using

hdfs dfs -getmerge <sourceDir> <destFile>

and end up with a hidden .destFile.crc file for each destFile.

Is there an equivalent way to turn this off, or otherwise to automatically remove the .destFile.crc file when the corresponding destFile is deleted from the local file system?

Thank you!

Accepted Solution

Hello @Emily Sharpe.

There is currently no way to skip writing the CRC file when running the -getmerge command. I filed Apache JIRA HADOOP-12643 to propose an enhancement to the command that would allow skipping the write of the CRC file. In the meantime, the best option is probably to use a scripting workaround, such as the suggestion from @Neeraj Sabharwal.
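A minimal sketch of such a scripting workaround follows. It assumes, as described in the question, that -getmerge always writes the checksum as a hidden .<destFile>.crc file in the same directory as destFile. The function name getmerge_nocrc is hypothetical and is not part of Hadoop:

```shell
# getmerge_nocrc is a hypothetical helper, not a Hadoop command.
# Usage: getmerge_nocrc <sourceDir> <destFile>
# Assumes the local checksum file is named .<destFile>.crc and is
# written in the same directory as <destFile>.
getmerge_nocrc() {
  # Stop (and propagate the hdfs exit status) if the merge itself fails.
  hdfs dfs -getmerge "$1" "$2" || return
  # Remove the hidden checksum file written next to the merged file.
  rm -f -- "$(dirname "$2")/.$(basename "$2").crc"
}
```

For example, getmerge_nocrc /user/example/output ./merged.txt (placeholder paths) should leave only merged.txt behind. Note that the .crc file is still written during the merge; the wrapper only deletes it afterwards.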


Replies (5)

@Emily Sharpe

I don't see any such option for -getmerge. You could write a shell script that removes the .crc files from a particular location, something like the following, and run it periodically from cron:

# delete hidden Hadoop checksum files (.<name>.crc) under the current directory
find . -type f -name '.*.crc' -exec rm {} +
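If you only want to delete a checksum file once its data file is gone (the second part of the original question), a narrower sketch is below. It assumes the .<name>.crc naming described in the question, and remove_orphan_crc is a hypothetical function name. It runs an inline sh loop via find -exec so that file names containing spaces are handled safely:

```shell
# remove_orphan_crc is a hypothetical helper.
# Usage: remove_orphan_crc [directory]   (defaults to the current directory)
# Deletes .<name>.crc only when <name> no longer exists in the same directory.
remove_orphan_crc() {
  find "${1:-.}" -type f -name '.*.crc' -exec sh -c '
    for crc in "$@"; do
      dir=$(dirname "$crc")
      base=$(basename "$crc")
      # Strip the leading "." and the trailing ".crc" to get the data file name.
      data=${base#.}
      data=${data%.crc}
      # Keep the checksum if its data file is still there; otherwise delete it.
      [ -e "$dir/$data" ] || rm -f -- "$crc"
    done
  ' sh {} +
}
```

This could be called from the same cron job, e.g. remove_orphan_crc /path/to/exports (placeholder path), so that checksums for files you still have are left alone.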

Contributor

Hi @Neeraj Sabharwal, thank you for the script line. Looks like I will be adding that in!

Explorer

Hi @Neeraj Sabharwal,

I am trying to save my output results in Spark using saveAsTextFile(""). The result is multiple part files (part-00000, part-00001, and so on) along with .crc files in the output directory. Do you have any idea how I can avoid creating the .crc files?


Contributor

Hi @Chris Nauroth, thanks for the confirmation, and it's great to know the option has been suggested 🙂