Is there an -ignoreCrc equivalent when using getmerge?
Labels: Apache Hadoop
Created ‎12-14-2015 04:28 AM
When copying files from HDFS to a local file system:
hdfs dfs -copyToLocal <source> <dest>
you have options -crc and -ignoreCrc to turn the checksum files on/off.
I am merging/copying out to local using
hdfs dfs -getmerge <sourceDir> <destFile>
and end up with a hidden .destFile.crc file for each destFile.
Is there an equivalent way to turn this function off, or otherwise automatically remove the .destFile.crc if the corresponding destFile is deleted (from the local file system)?
Thank you!
Created ‎12-15-2015 06:27 PM
Hello @Emily Sharpe.
There is currently no way to skip writing the CRC file when running the -getmerge command. I filed Apache JIRA HADOOP-12643 to propose an enhancement to the command that would allow skipping the write of the CRC file. In the meantime, the best option is probably to use a scripting workaround, such as the suggestion from @Neeraj Sabharwal.
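For anyone landing here, one way such a scripting workaround might look is a small wrapper that runs the merge and then deletes the checksum sidecar right away. This is only a sketch: the function name getmerge_no_crc is made up for illustration, and it assumes the sidecar is always written next to the destination as .<destFile>.crc, as described in the question.

```shell
# Hypothetical wrapper (not part of Hadoop): merge an HDFS directory into one
# local file, then delete the hidden checksum file that -getmerge leaves behind.
getmerge_no_crc() {
    src_dir=$1
    dest_file=$2

    # Run the real merge; stop here if it fails so nothing is deleted by mistake.
    hdfs dfs -getmerge "$src_dir" "$dest_file" || return 1

    # Assumption: the sidecar sits beside the destination as ".<name>.crc".
    crc_file="$(dirname -- "$dest_file")/.$(basename -- "$dest_file").crc"
    rm -f -- "$crc_file"
}
```

It would be called the same way as the original command, for example getmerge_no_crc <sourceDir> <destFile>.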
Created ‎12-14-2015 11:47 AM
I don't see any such option for -getmerge. You may want to write a shell script that removes .crc files from a particular location, something like the following, and run it from cron:
find . -type f -name '*.crc' -exec rm {} +
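Note that the one-liner removes every .crc file under the current directory. If the goal is only to clean up checksum files whose data file has already been deleted, a narrower variant could look roughly like this. It is a sketch under assumptions: the function name remove_orphan_crc is invented for illustration, the sidecar naming is taken to be .<destFile>.crc as in the question, and file names are assumed not to contain newlines.

```shell
# Hypothetical helper: delete ".name.crc" files whose data file "name"
# no longer exists in the same directory. Defaults to the current directory.
remove_orphan_crc() {
    dir=${1:-.}
    find "$dir" -type f -name '.*.crc' | while IFS= read -r crc; do
        base=$(basename -- "$crc")   # e.g. ".destFile.crc"
        name=${base#.}               # strip the leading dot
        name=${name%.crc}            # strip the trailing ".crc"
        data="$(dirname -- "$crc")/$name"
        # Remove the checksum file only when its data file is gone.
        [ -e "$data" ] || rm -f -- "$crc"
    done
}
```

Running remove_orphan_crc /path/to/output from cron would then leave the checksum files of still-existing outputs alone.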
Created ‎12-16-2015 12:18 AM
Hi @Neeraj Sabharwal, thank you for the script line. Looks like I will be adding that in!
Created ‎05-17-2016 01:30 PM
I am trying to save my output in Spark using saveAsTextFile(""). The result is multiple part files (part-00000, part-00001, and so on) along with .crc files in the output directory. Do you have any idea how I can avoid creating the .crc files?
Created ‎12-16-2015 12:21 AM
Hi @Chris Nauroth thanks for the confirmation, and great to know the option has been suggested 🙂
