Hi experts!
I ran into an interesting case with my cluster:
3 nodes, with 12 disks for Hadoop on each of them.
I have a big Hive table (uncompressed CSV):
900 GB, 3600 blocks, 256 MB block size.
I ran the HDFS check command:
hdfs fsck /path/to/table
then counted the number of distinct StorageID values (one per disk) in the output and got 19.
In other words, that data is stored on only 19 of the 36 devices.
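For reference, here is one way the distinct-StorageID count can be pulled out of `hdfs fsck ... -files -blocks -locations` output, where each replica location appears as a `DatanodeInfoWithStorage[ip:port,DS-...,DISK]` entry. This is a sketch against sample output with made-up StorageIDs; on a real cluster you would pipe the actual fsck output instead:

```shell
# Sample of what fsck prints with -files -blocks -locations
# (StorageIDs DS-aaa/DS-bbb/DS-ccc are illustrative, not real ones):
fsck_out='blk_1 len=268435456 repl=2 [DatanodeInfoWithStorage[10.0.0.1:50010,DS-aaa,DISK], DatanodeInfoWithStorage[10.0.0.2:50010,DS-bbb,DISK]]
blk_2 len=268435456 repl=2 [DatanodeInfoWithStorage[10.0.0.1:50010,DS-aaa,DISK], DatanodeInfoWithStorage[10.0.0.3:50010,DS-ccc,DISK]]'

# Extract the StorageID from each location entry, deduplicate, count.
# On a live cluster, replace the echo with:
#   hdfs fsck /path/to/table -files -blocks -locations
echo "$fsck_out" | grep -o 'DS-[^,]*' | sort -u | wc -l
```

With the two sample blocks above this prints 3, since three distinct StorageIDs appear.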
Does anyone know how to fix this, and how to avoid it in the future?
I'm using the latest CDH, 5.7.
By the way, do you know how to map a StorageID to the real physical device (like /dev/sda)?
Thank you in advance!
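On the StorageID-to-device question, my understanding is that each directory listed in `dfs.datanode.data.dir` on a datanode has a `current/VERSION` file containing a `storageID=DS-...` line, so grepping those files tells you which data directory owns a given StorageID, and `df` on that directory then shows the backing device. A minimal sketch, simulating two data directories under a temp path (the directory names and StorageIDs are illustrative):

```shell
# Simulate two datanode data dirs, each with its current/VERSION file
# (on a real node these would be the dirs from dfs.datanode.data.dir):
base=$(mktemp -d)
mkdir -p "$base/disk1/current" "$base/disk2/current"
echo 'storageID=DS-aaa' > "$base/disk1/current/VERSION"
echo 'storageID=DS-bbb' > "$base/disk2/current/VERSION"

# Find which data dir holds the StorageID you are looking for;
# on a real node you would then run `df <that dir>` to see the
# physical device (e.g. /dev/sdb1) behind it.
grep -l 'storageID=DS-bbb' "$base"/*/current/VERSION
```

Here the grep prints the VERSION file under `disk2`, identifying that data directory as the owner of DS-bbb.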