12-11-2015 06:53 PM
Typical HDFS usage is large blocks accessed sequentially. In that scenario seek time has negligible cost, and throughput is the only significant factor determining speed. Hard drives typically have high sequential transfer rates, so this is an ideal fit. Other workloads that depend on sequential access include swap files and some temp files, when they are large and produced all at once; log files also work well here.

SSDs are an excellent choice for a relational database, because its access pattern is random reads and writes of small blocks. When reading and writing small blocks in random order, seek time is the dominant cost and throughput is relatively insignificant. Since SSDs have effectively zero seek time, they are ideal for relational databases despite (traditionally) having lower throughput. They are also good for large collections of small files, such as operating system binaries, config files, and temporary files.

Different cloud storage providers aggregate drive throughput differently and therefore offer different performance guarantees, so you will need to read the fine print to determine which metrics are actually specified. SSDs also scale up faster than spindles because each drive is small relative to a hard drive: for the same total capacity there will be more SSDs striped together, so aggregate throughput can exceed that of a single hard drive of the same size.

tl;dr: SSD is only faster for typical HDFS usage if the storage provider offers higher throughput than HDD.
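To make the seek-vs-throughput tradeoff concrete, here is a quick back-of-envelope sketch in Python. The device numbers (10 ms average seek and 150 MB/s for HDD; 0.1 ms access latency and 500 MB/s for SSD) are illustrative assumptions I picked for the example, not vendor specs or measurements:

```python
# Back-of-envelope cost of reading one 128 MB HDFS block,
# either as a single sequential stream or as random 4 KB reads.
# Device numbers below are illustrative assumptions, not measurements.

BLOCK = 128 * 1024 * 1024   # one typical HDFS block: 128 MB
SMALL = 4 * 1024            # one small random read: 4 KB

def read_time(total_bytes, io_size, seek_s, throughput_bps):
    """Time to read total_bytes in chunks of io_size, paying one seek per chunk."""
    num_ios = total_bytes / io_size
    return num_ios * seek_s + total_bytes / throughput_bps

hdd_seq  = read_time(BLOCK, BLOCK, 0.010, 150e6)    # one seek, then stream
hdd_rand = read_time(BLOCK, SMALL, 0.010, 150e6)    # one seek per 4 KB read
ssd_seq  = read_time(BLOCK, BLOCK, 0.0001, 500e6)
ssd_rand = read_time(BLOCK, SMALL, 0.0001, 500e6)

print(f"HDD sequential: {hdd_seq:8.2f} s")   # ~0.90 s, throughput-bound
print(f"HDD random 4K:  {hdd_rand:8.2f} s")  # ~329  s, seek-bound
print(f"SSD sequential: {ssd_seq:8.2f} s")   # ~0.27 s
print(f"SSD random 4K:  {ssd_rand:8.2f} s")  # ~3.5  s
```

Under these assumptions, the sequential read is throughput-bound on both media, so the HDD is competitive; the random 4 KB pattern is seek-bound on the HDD (hundreds of seconds versus a few seconds on SSD), which is exactly why the relational-database workload favors SSD while typical HDFS usage does not.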