
Datanode replication issue

Explorer

[Attached screenshot: 93033-datanode.png]

We have uploaded 9 GB of data to HDFS on a 3-node cluster with the default block size of 128 MB. Since Hadoop replicates data across 3 nodes (replication factor 3), uploading 9 GB of data should consume 9 GB x 3 = 27 GB in total.

However, as shown in the attached screenshot, it is taking 27 GB on each datanode. Can someone please help me understand what went wrong?
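For reference, the per-datanode numbers in the screenshot can also be checked from the command line (assuming a standard HDFS installation):

hdfs dfsadmin -report

This lists Configured Capacity, DFS Used, and Non DFS Used for each datanode.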

1 ACCEPTED SOLUTION

Contributor

I believe there are other files alongside your 9 GB of data, and those other files happen to add up to 18 GB.

They typically consist of component libraries, Ambari data, user data, and tmp data.

Run the command below to find which files are taking up space:

hadoop fs -du -s -h /*


Drill down by replacing * with a specific path until you find which other files add up to the remaining 18 GB.
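For example (the paths below are only illustrative; your directory layout may differ):

hadoop fs -du -h /
hadoop fs -du -h /user
hadoop fs -du -h /tmp

Comparing these sizes against the 9 GB you uploaded should show where the remaining space is going.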


REPLIES


Explorer

@Soumitra Sulav Thank you for your help. Yes, other files such as libraries, Ambari data, user data, and tmp data were taking up that much space.

There was actually some preprocessed data stored that we were unaware of. Sorry for the confusion, and thank you again.

Contributor

If it worked for you, please take a moment to log in and "Accept" the answer.