I am building a dashboard based on streaming data.
I use Cloudera 5.10, Spark 2.1 and Kafka 0.10.
I wanted to give my app a little performance boost, and one of the optimizations was to move the checkpoint directory to SSD drives.
I added four fresh, empty nodes with volumes prefixed as SSD (256 GB each), all assigned to the same rack.
I set the storage policy on hdfs:///spark/checkpoints/myapp to ALL_SSD:
hdfs storagepolicies -setStoragePolicy -path hdfs:///spark/checkpoints/myapp -policy ALL_SSD
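As a sanity check (the path is the same one used throughout this question), the policy assignment can be confirmed with the matching get subcommand:

```shell
# Confirm the storage policy actually attached to the checkpoint directory;
# the output should name ALL_SSD. Requires a live HDFS cluster.
hdfs storagepolicies -getStoragePolicy -path hdfs:///spark/checkpoints/myapp
```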
I changed the replication factor for this directory from the default 3 to 2 with:
hdfs dfs -setrep -w 2 hdfs:///spark/checkpoints/myapp
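To double-check what replication factor the files under the directory actually carry (a diagnostic sketch; it only covers files that exist at the time it runs):

```shell
# Print the effective replication factor (%r) and name (%n) of each file
# under the checkpoint directory. Requires a live HDFS cluster.
hdfs dfs -stat '%r %n' hdfs:///spark/checkpoints/myapp/*
```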
And after I start my app,
hdfs fsck hdfs:///spark/checkpoints/myapp -files -blocks -locations
reports that the blocks have 3 replicas (2 on SSD and 1 on DISK).
So I get 2 SSD + 1 DISK instead of the expected 3 SSD (if the replication factor is 3). I suspect a problem with racks: according to the replica placement algorithm (2 copies in the same rack, the 3rd on a node in another rack), the fallback storage for replication (DISK) defined for the ALL_SSD policy could have been used here.
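For completeness, this is how the SSD volumes are tagged on the new DataNodes (paths are illustrative, not my actual ones). If this [SSD] prefix were missing from dfs.datanode.data.dir on any node, that node's volumes would be treated as the default storage type DISK, which would be another way to end up with a DISK replica:

```xml
<!-- hdfs-site.xml on each new DataNode; volume paths are illustrative. -->
<!-- Without the [SSD] tag, HDFS assumes the default storage type DISK. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>[SSD]/data/ssd1/dfs/dn,[SSD]/data/ssd2/dfs/dn</value>
</property>
```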
I will try to reorganize node placement and spread the nodes over 2 racks, but does anyone have other ideas why I got 1 replica on DISK?
And why do I have 3 replicas even though I manually ran setrep -w 2 on the parent directory?