The Hadoop Archive tool creates a HAR file from the input directories you specify when creating the archive. It will reduce both
- Number of files
- Size of data
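For reference, here is a minimal sketch of creating the archive with the `hadoop archive` command. The archive name, the parent/source split of the input path, and the destination directory are all hypothetical placeholders, so adjust them to your layout. The script only prints the command by default; setting RUN=1 runs it on a cluster, where the archive is built by a MapReduce job.

```shell
#!/bin/sh
# Sketch only: all names and paths below are hypothetical placeholders.
set -eu

ARCHIVE_NAME="small-files.har"   # archive name, must end in .har
PARENT_DIR="/hdfs/input"         # parent directory of the sources
SRC_DIR="dir"                    # source directory, relative to PARENT_DIR
DEST_DIR="/hdfs/archive"         # HDFS directory that will hold the .har

# Print the archive command so it can be reviewed before running.
har_cmd() {
  echo "hadoop archive -archiveName $ARCHIVE_NAME -p $PARENT_DIR $SRC_DIR $DEST_DIR"
}

if [ "${RUN:-0}" = "1" ]; then
  # Build the archive, then list its contents through the har:// scheme.
  hadoop archive -archiveName "$ARCHIVE_NAME" -p "$PARENT_DIR" "$SRC_DIR" "$DEST_DIR"
  hdfs dfs -ls "har://$DEST_DIR/$ARCHIVE_NAME"
else
  har_cmd
fi
```

Note that the original files are not removed by archiving; you would delete them yourself once you have verified the archive.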
If your use case is only reducing the file count by merging small files, and not compression, I would recommend having a look at the merge option instead. Try the following Hadoop Streaming command to merge the files; the number of reduce tasks determines how many output files you get.
hadoop jar /usr/hdp/220.127.116.11-2950/hadoop-mapreduce/hadoop-streaming-<your version>.jar \
-Dmapred.reduce.tasks=<NUMBER OF FILES YOU WANT> \
-input "/hdfs/input/dir" \
-output "/hdfs/output/dir" \
-mapper cat
Let me know if that helps!