Cases where changing the Hadoop block size is not recommended?

Expert Contributor

Hi Team,

Consider a Hadoop cluster with a default block size of 64 MB. We have a use case in which we would like to use Hadoop to store historical data and retrieve it as needed.

The historical data would be in the form of archives containing many small files (millions), which is why we would like to reduce the default block size in Hadoop to 32 MB.

I also understand that changing the default block size to 32 MB may adversely affect the cluster if we later plan to use it for applications that store very large files.

Can anyone advise what to do in such situations?
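To make the trade-off concrete, here is a rough sketch of the block-count arithmetic. It assumes the usual HDFS behaviour that a file is split into blocks of at most the configured size, so a file smaller than one block still occupies exactly one block. The file sizes (100 KB and 10 GB) and the helper name `hdfs_block_count` are only illustrative, not taken from our cluster.

```python
MB = 1024 * 1024
GB = 1024 * MB


def hdfs_block_count(file_size_bytes, block_size_bytes):
    """Number of blocks one file occupies: ceil(size / block size), 0 for an empty file."""
    if file_size_bytes == 0:
        return 0
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


# Illustrative sizes only (assumptions, not measurements).
small_file = 100 * 1024   # one small file of about 100 KB
large_file = 10 * GB      # one huge file of 10 GB

for block_mb in (64, 32):
    block = block_mb * MB
    print(
        f"block size {block_mb} MB: "
        f"small file -> {hdfs_block_count(small_file, block)} block(s), "
        f"large file -> {hdfs_block_count(large_file, block)} block(s)"
    )
```

Under these assumptions a small file takes one block at either setting, while the 10 GB file goes from 160 blocks to 320 blocks when the block size is halved.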


Expert Contributor

Hi, thanks for your reply.

I appreciate your inputs.

Please advise on how to store and read archived data from Hive.

When storing data in Hive, should I save it as a .har file in HDFS?

Our application generates small XML files, which are stored on a NAS, and the metadata associated with each XML file is stored in a DB.

The plan is to extract the metadata from the DB into one file and compress the XML files into one large archive. Suppose each archive is 10 GB and the data is 3 months old.
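The packaging step could look roughly like the sketch below. It is only an illustration under assumptions: the XML documents are passed in as an in-memory mapping of name to bytes, the archive is a gzip-compressed tar built with Python's standard `tarfile` module, and the "manifest" is a stand-in for the metadata file (the real metadata would come from the DB). The function names `pack_xml_files` and `read_xml_file` are hypothetical.

```python
import io
import tarfile


def pack_xml_files(xml_docs):
    """Bundle small XML documents (name -> bytes) into one gzip-compressed tar.

    Returns the archive as bytes plus a simple manifest listing each
    member's name and size. The manifest stands in for the metadata file.
    """
    buf = io.BytesIO()
    manifest = []
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in sorted(xml_docs.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
            manifest.append({"name": name, "size_bytes": len(payload)})
    return buf.getvalue(), manifest


def read_xml_file(archive_bytes, name):
    """Read a single XML document back out of the archive by member name."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        member = tar.extractfile(name)  # raises KeyError if the name is absent
        return member.read()


# Hypothetical sample documents, for illustration only.
docs = {"order_1.xml": b"<order id='1'/>", "order_2.xml": b"<order id='2'/>"}
archive, manifest = pack_xml_files(docs)
print(len(manifest), "files packed")
print(read_xml_file(archive, "order_2.xml"))
```

This only covers bundling and reading back by name; it does not settle which Hadoop storage layer the archive should live in.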

So I wanted to know the best solution for storing and accessing this archived data in Hadoop: HDFS, Hive, or HBase.

Please advise which you think would be the better approach for reading this archived data.

Suppose I store this archived data in Hive; how do I retrieve it?

Please guide me on storing archived data in Hive, and also on retrieving/reading it from Hive when needed.