Uneven block distribution across cluster disks

Rising Star

Hi dear experts!

 

I've run into an interesting case with my cluster.

It has 3 nodes, each with 12 disks for Hadoop.

I have a big Hive table (uncompressed CSV):

900 GB, 3600 blocks, 256 MB block size

I ran hdfs check command:

hdfs fsck /path/to/table

and counted the distinct StorageID values (disks) in the output. I got 19.

That is, the data is stored on only 19 of the 36 devices.
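
For anyone who wants to reproduce the count described above, here is a minimal sketch in Python. It assumes fsck was run with the -files -blocks -locations options and that each replica is printed in the form DatanodeInfoWithStorage[host:port,DS-...,TYPE]; the exact output format can differ between Hadoop versions, and the sample text, addresses, and storage IDs below are made up for illustration.

```python
import re
from collections import Counter

# Assumed shape of one replica entry in the fsck output:
#   DatanodeInfoWithStorage[10.0.0.1:50010,DS-<uuid>,DISK]
REPLICA_RE = re.compile(
    r"DatanodeInfoWithStorage\[([^,\]]+),(DS-[^,\]]+),([^\]]+)\]"
)


def count_replicas_per_storage(fsck_text):
    """Count block replicas per (datanode address, storage ID) pair."""
    counts = Counter()
    for addr, storage_id, _storage_type in REPLICA_RE.findall(fsck_text):
        counts[(addr, storage_id)] += 1
    return counts


# Hypothetical fsck excerpt: two blocks, three replicas each.
SAMPLE = """\
0. BP-1:blk_1_1 len=268435456 Live_repl=3 [DatanodeInfoWithStorage[10.0.0.1:50010,DS-aaa,DISK], DatanodeInfoWithStorage[10.0.0.2:50010,DS-bbb,DISK], DatanodeInfoWithStorage[10.0.0.3:50010,DS-ccc,DISK]]
1. BP-1:blk_2_1 len=268435456 Live_repl=3 [DatanodeInfoWithStorage[10.0.0.1:50010,DS-aaa,DISK], DatanodeInfoWithStorage[10.0.0.2:50010,DS-ddd,DISK], DatanodeInfoWithStorage[10.0.0.3:50010,DS-ccc,DISK]]
"""

if __name__ == "__main__":
    counts = count_replicas_per_storage(SAMPLE)
    print("distinct storages:", len(counts))
    # List storages from most to least loaded to make skew visible.
    for (addr, storage_id), n in counts.most_common():
        print(addr, storage_id, n)
```

Keying on the (address, storage ID) pair rather than on the storage ID alone keeps two disks apart even if they happen to report the same ID on different nodes.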

Does anyone know how to fix this and how to avoid it in the future?

 

thanks!

 

3 REPLIES

Re: Uneven block distribution across cluster disks

Master Guru
What version of CDH are you using? It's likely that you are missing some
bug fixes from recent CDH releases related to StorageUUID uniqueness, which
may be what is affecting you.

Re: Uneven block distribution across cluster disks

Rising Star

I'm using the latest one, CDH 5.7.

By the way, do you know how to map a StorageID to the real physical device (like /dev/sda)?

Thank you in advance!

Re: Uneven block distribution across cluster disks

Master Guru
Did you upgrade your CDH from an earlier version, or is this a fresh install?

Also, to double-check, could you share the contents of your VERSION files that illustrate the StorageID issue?

The StorageID just has to be unique per disk; it does not need to be a specific string or identifier that maps to a physical ID. Left to itself, HDFS generates UUIDs for them.
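
One way to collect the VERSION contents mentioned above, and to see which data directory each StorageID belongs to, is sketched below in Python. It assumes each DataNode data directory (the entries of dfs.datanode.data.dir) holds a current/VERSION file with a storageID=... line; the /data/N/dfs/dn paths are hypothetical and should be replaced with the directories configured on the node.

```python
import os


def read_storage_id(version_path):
    """Return the storageID value from a DataNode VERSION file, or None."""
    with open(version_path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith("storageID="):
                return line.split("=", 1)[1]
    return None


def map_storage_ids(data_dirs):
    """Map each storageID to the list of data directories reporting it."""
    mapping = {}
    for data_dir in data_dirs:
        version_path = os.path.join(data_dir, "current", "VERSION")
        if not os.path.isfile(version_path):
            continue  # not a DataNode data directory, or not yet formatted
        storage_id = read_storage_id(version_path)
        if storage_id:
            mapping.setdefault(storage_id, []).append(data_dir)
    return mapping


if __name__ == "__main__":
    # Hypothetical layout: 12 data directories, one per disk.
    dirs = ["/data/%d/dfs/dn" % i for i in range(1, 13)]
    for storage_id, owners in sorted(map_storage_ids(dirs).items()):
        # More than one directory per ID would mean the ID is not unique.
        flag = "  <-- duplicate" if len(owners) > 1 else ""
        print(storage_id, ", ".join(owners) + flag)
```

Once a StorageID is tied to a data directory, running df on that directory shows the filesystem and device it is mounted from, which answers the /dev/sda question indirectly.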