
disk usage not equally spread across the data nodes


New Contributor

Hello,

We have a cluster with 3 data nodes.

- We first exported all our HBase tables

- Then truncated each table from the HBase shell: truncate '<TABLE_NAME>'

- Then imported the data back using

hbase org.apache.hadoop.hbase.mapreduce.Import -Dhbase.client.scanner.caching=100 -Dmapreduce.map.speculative=false -Dmapreduce.reduce.speculative=false '<TABLE_NAME>' 'file:///hadoop/<TABLE_NAME>'

- Then set the replication factor to 3 on all the HDFS files

hdfs dfs -setrep -w 3 /apps

I would have expected the disk usage (in both the Ambari UI and the Hadoop UI) to be equal on all the data nodes.

This has not been the case for quite a few days.

Is this normal?
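
One way to put numbers on the imbalance described above is to pull the per-node usage out of `hdfs dfsadmin -report` instead of reading it off the UIs. The helper below is only a sketch: `dn_usage` is a hypothetical name, and it assumes the report prints a `Name:` line followed by a `DFS Used%:` line for each data node, which may differ between Hadoop versions.

```shell
# dn_usage: hypothetical helper, not part of Hadoop.
# Reads `hdfs dfsadmin -report` text on stdin and prints
# "<datanode address> <DFS Used%>" for each data node.
# Assumption: each node section has a "Name:" line followed
# later by a "DFS Used%:" line.
dn_usage() {
  awk '
    /^Name:/      { node = $2 }                    # remember the current data node
    /^DFS Used%:/ { if (node != "") {              # skip the cluster-wide summary line
                      print node, $3; node = ""
                    } }
  '
}

# Usage on a live cluster (needs HDFS admin access):
#   hdfs dfsadmin -report | dn_usage
```

Comparing the percentages printed for the three data nodes shows how far apart they really are.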


Re: disk usage not equally spread across the data nodes

Expert Contributor

Is it possible that a large portion of your data has the same or a similar key, such as a timestamp, which would cause hotspotting? Because you imported the table, all records will have a similar timestamp. Take a look at the records to see.

Re: disk usage not equally spread across the data nodes

New Contributor

Thanks for your reply @bhagan

Can you please elaborate on how this possible "hotspotting" would cause the disk usage to be unevenly distributed across the data nodes?

Thanks in advance
