
Datanode replication issue

Explorer

[Attached screenshot: 93033-datanode.png]

We have uploaded 9 GB of data to HDFS on a 3-node cluster with the default block size of 128 MB. Since Hadoop replicates data across 3 nodes (replication factor 3), uploading 9 GB of data should consume 9 GB x 3 = 27 GB in total.

However, as shown in the attached screenshot, it is taking 27 GB on each datanode. Can someone please help me understand what went wrong?
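For reference, the per-datanode numbers in the screenshot can also be checked from the command line (assuming a standard HDFS installation):

hdfs dfsadmin -report

This lists Configured Capacity, DFS Used, and Non DFS Used for each datanode.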

1 ACCEPTED SOLUTION

Contributor

I believe there are other files alongside your 9 GB of data, and those other files happen to add up to 18 GB.

They typically consist of component libraries, Ambari data, user data, and tmp data.

Run the command below to find which files are taking up space:

hadoop fs -du -s -h /*


Drill down by replacing * with a specific path until you find which other files add up to the remaining 18 GB.
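For example (the paths below are only illustrative; your directory layout may differ):

hadoop fs -du -h /
hadoop fs -du -h /user
hadoop fs -du -h /tmp

Comparing these sizes against the 9 GB you uploaded should show where the remaining space is going.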


REPLIES


Explorer

@Soumitra Sulav Thank you for your help. Yes, other files such as libraries, Ambari data, user data, and tmp data were taking up that much space.

There was actually some preprocessed data stored that we were unaware of. Sorry for the confusion, and thank you again.

Contributor

If it worked for you, please take a moment to log in and "Accept" the answer.