Created on 10-29-2018 09:16 AM - edited 08-17-2019 04:57 PM
We have uploaded 9 GB of data into HDFS on a 3-node cluster with the default block size of 128 MB. Since Hadoop replicates data across 3 nodes, uploading 9 GB should consume 9 GB x 3 = 27 GB in total.
However, as the attached screenshot shows, about 27 GB is being used on each DataNode. Can someone please help us understand what went wrong?
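For reference, with the default replication factor of 3 on a 3-node cluster every block lands on every node, so the 9 GB dataset alone should account for roughly 9 GB per DataNode and 9 GB x 3 = 27 GB across the whole cluster, not 27 GB on each node. A quick sanity check of what HDFS itself reports (the path below is a placeholder for the uploaded dataset; recent Hadoop releases also print the space consumed with all replicas alongside the logical size):
hadoop fs -du -h /path/to/uploaded/data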
Created 10-29-2018 02:28 PM
I believe there are other files in HDFS besides your 9 GB dataset, and those other files happen to add up to roughly 18 GB.
They typically include component libraries, Ambari data, user data and tmp data.
Run the command below to see which directories are taking up space:
hadoop fs -du -s -h /*
Then drill down by replacing * with specific paths until you find the other files that account for the remaining 18 GB.
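A minimal drill-down sketch, assuming a typical HDP-style layout (the paths below are examples and may differ on your cluster):
hadoop fs -du -h /             # usage per top-level directory
hadoop fs -du -h /tmp          # temporary and intermediate job data
hadoop fs -du -h /user         # per-user home directories
hadoop fs -du -h /app-logs     # aggregated YARN logs, if log aggregation is enabled
hadoop fs -du -h /hdp          # component tarballs and libraries, if present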
Created 10-29-2018 03:41 PM
@Soumitra Sulav Thank you for your help. Yes, other files such as libraries, Ambari data, user data and tmp data were taking up that much space.
It turns out some preprocessed data we were unaware of was also stored there. Sorry for the confusion, and thank you again.
Created 10-29-2018 04:21 PM
If it worked for you, please take a moment to log in and "Accept" the answer.