We have a cluster with 3 DataNodes.
- We first exported all our HBase tables
- truncate '<TABLE_NAME>' // from the HBase shell
- Imported the data back using
hbase org.apache.hadoop.hbase.mapreduce.Import -Dhbase.client.scanner.caching=100 -Dmapreduce.map.speculative=false -Dmapreduce.reduce.speculative=false '<TABLE_NAME>' 'file:///hadoop/<TABLE_NAME>'
- Then set the replication factor to 3 on all the HDFS files
hdfs dfs -setrep -w 3 /apps
I expected the disk usage (in both the Ambari UI and the Hadoop UI) to be equal across all the DataNodes.
This was not the case for quite a few days.
Is this normal?
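For reference, the per-DataNode usage can also be compared from the command line rather than from the UIs. The following is a minimal sketch: the function name `dfs_used_per_node` is made up for illustration, and it assumes the `hdfs dfsadmin -report` output lists each node as a `Name: host:port ...` line followed by a `DFS Used%: N%` line, which may differ between Hadoop versions.

```shell
# Print "<datanode> <DFS Used%>" for each DataNode found in a dfsadmin report.
# Assumption: each node section starts with "Name: host:port ..." and
# contains a "DFS Used%: N%" line; the cluster-wide summary before the
# first "Name:" line is skipped.
dfs_used_per_node() {
  awk '/^Name:/ { node = $2 }
       /^DFS Used%:/ && node != "" { print node, $3 }'
}

# Usage (needs HDFS admin rights on the cluster):
#   hdfs dfsadmin -report | dfs_used_per_node
```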
Is it possible that a large portion of your data has the same or a similar key, such as a timestamp, which would cause hotspotting? Because you imported the table, all records will have a similar timestamp. Take a look at the records to check.
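One way to look at the records is to scan a handful of rows and inspect the row keys and cell timestamps. This is only a sketch: the helper name `peek_rows` is hypothetical, and it assumes your HBase version supports the shell's non-interactive `-n` flag.

```shell
# Print the first few rows of a table, including row keys and timestamps.
# $1 = table name, $2 = number of rows to show (default 10).
# Assumption: `hbase shell -n` (non-interactive mode) is available.
peek_rows() {
  echo "scan '$1', {LIMIT => ${2:-10}}" | hbase shell -n
}

# Usage:
#   peek_rows '<TABLE_NAME>' 20
```

Keep in mind that a scan returns rows in key order, so the first rows show what the keys look like but are not a representative sample of the whole table.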