Does Hadoop Archive both reduce the number of files and compress them, or does it only reduce the number of files?
I'm asking because I have a use case where it would help to reduce the number of files without compressing them much.
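Hadoop Archive creates a HAR file from the input directories you specify. It reduces the number of files, and with it the metadata the NameNode has to keep in memory. It does not compress them, though. The archive stores the original file contents in its part files, so inputs that were already compressed stay compressed and uncompressed inputs stay uncompressed.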
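For reference, here is a minimal sketch of creating a HAR with the hadoop archive tool and then listing its contents. The parent directory, source directory, destination and archive name are hypothetical placeholders. By default the script only prints the commands, so it is safe to read or run outside a cluster. Set DRY_RUN=0 to actually execute them.

```shell
#!/usr/bin/env bash
# Sketch: pack a directory of small files into a HAR, then list it.
# All paths and names below are hypothetical; adjust them for your cluster.
set -euo pipefail

PARENT_DIR="/user/hadoop"          # hypothetical parent of the source dir
SRC_DIR="small-files"              # hypothetical source dir, relative to PARENT_DIR
DEST_DIR="/user/hadoop/archives"   # hypothetical output location
ARCHIVE_NAME="small-files.har"

# DRY_RUN=1 (the default) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run_or_print() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Launches a MapReduce job that packs SRC_DIR into part-* files
# plus _index/_masterindex metadata under DEST_DIR/ARCHIVE_NAME.
run_or_print hadoop archive -archiveName "$ARCHIVE_NAME" -p "$PARENT_DIR" "$SRC_DIR" "$DEST_DIR"

# The archived files stay addressable through the har:// URI scheme.
run_or_print hdfs dfs -ls -R "har://$DEST_DIR/$ARCHIVE_NAME"
```

The -p option sets the parent path, and the source directories are given relative to it. Because DEST_DIR starts with a slash, the listing URI comes out as har:///user/hadoop/archives/small-files.har.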
If you only need to reduce the file count (that is, merge small files) rather than compress them, I would recommend looking at the merge option. The following Hadoop Streaming command merges the files. The number of reduce tasks determines how many output files you get:
hadoop jar /usr/hdp/<HDP version>/hadoop-mapreduce/hadoop-streaming-<your version>.jar \
  -Dmapred.reduce.tasks=<NUMBER OF FILES YOU WANT> \
  -input "/hdfs/input/dir" \
  -output "/hdfs/output/dir" \
  -mapper cat \
  -reducer cat
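To see what this job does to the data, its data flow for a single reduce task can be approximated locally. The pipeline uses cat as the mapper, a sort standing in for Hadoop's shuffle, and cat as an identity reduce step. This is a sketch under stated assumptions, not a way to run the job. A byte-order sort (LC_ALL=C) is used as a stand-in for Hadoop's key sort, and the function name simulate_streaming_merge is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# simulate_streaming_merge FILE...   (hypothetical helper)
# Approximates ONE reduce task of the streaming merge job:
#   read input:  cat "$@"        (all input files, concatenated)
#   map:         cat             (identity mapper)
#   shuffle:     LC_ALL=C sort   (stand-in for Hadoop's sort by key)
#   reduce:      cat             (identity reducer)
simulate_streaming_merge() {
  cat "$@" | cat | LC_ALL=C sort | cat
}

# Demo on three hypothetical small files.
demo_dir="$(mktemp -d)"
printf 'c\na\n' > "$demo_dir/part1.txt"
printf 'b\n'    > "$demo_dir/part2.txt"
printf 'd\n'    > "$demo_dir/part3.txt"

simulate_streaming_merge "$demo_dir"/part*.txt   # prints a, b, c, d, one per line
rm -rf "$demo_dir"
```

The point it illustrates is that every input line ends up in the merged output, but in sorted order rather than in the original file order. With several reduce tasks, the lines are also spread across the part files by key. If line order matters for your use case, keep that in mind before using this approach.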
Let me know if that helps!