Member since 06-24-2014
45 Posts
9 Kudos Received
1 Solution
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 965 | 06-27-2016 08:57 PM |
01-23-2020
06:23 AM
@alexmc As this thread is a couple of years old, you would be better served by creating a new thread.
05-10-2017
01:24 PM
Thanks! I would be interested to learn more when you are ready to announce it.
09-23-2016
09:17 PM
2 Kudos
We just did a refresh of the Sandbox to fix this issue. If you have run into this issue, I would suggest downloading the Sandbox again or following the steps above.
07-10-2017
05:11 PM
Is this article still valid for HDF version 3.0 which was released recently? Are there easier ways of deploying to Amazon?
07-06-2016
11:55 AM
Thanks, yes, there was a duplicate column (the key in both tables), and I did not realise that was preventing me from using CTAS.
06-27-2016
08:57 PM
OK, sorry, I solved this. I discovered that the Sqoop client was on those two machines and had not been restarted. I may have forgotten it when doing it manually.
01-07-2016
02:27 PM
Thanks! Sounds like my sandbox is out of date now 🙂
12-11-2015
06:53 PM
1 Kudo
Typical HDFS usage is large blocks that are accessed sequentially. In this scenario, seek time has negligible cost and throughput is the only significant factor that determines speed. Hard drives typically have high sequential transfer rates, so this is an ideal situation for them. Other files that depend on sequential access are swap files and some temp files, if they are large and written all at once. Log files also work well here.

SSDs make an excellent choice for a relational database because its access pattern is random reads and writes of small blocks. When reading and writing small blocks in random order, seek time is the major cost while throughput is relatively insignificant. Since SSDs have effectively no seek time, they are ideal for relational databases despite (traditionally) having lower throughput. They are also good for large collections of small files, such as operating system binaries, config files, and collections of temporary files.

Different cloud storage providers aggregate drive throughput differently and therefore provide different performance guarantees, so you will need to read the fine print to determine which metrics are actually specified. SSDs scale up throughput faster than spindles because each SSD is relatively small compared to a hard drive. For the same amount of space, more SSDs will be striped together, which means the aggregate throughput can be higher than that of a single hard drive of the same total size.

tl;dr SSD is only faster for typical HDFS usage if the storage provider offers higher throughput than HDD.
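A rough back-of-the-envelope model of the seek-versus-throughput argument, as a sketch only: it assumes read time is simply (number of seeks × seek cost) + (bytes / throughput), and that striping scales throughput linearly with drive count. The drive figures (10 ms / 150 MB/s HDD, 0.1 ms / 100 MB/s SSD, 500 GB vs 4000 GB drives) and the helper names `read_time_s` and `striped_throughput_mb_s` are hypothetical, chosen to mirror the premise of the post rather than any real hardware or provider.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    """A toy drive model. The numbers used below are illustrative, not measured."""
    name: str
    seek_s: float           # average cost of one random access, in seconds
    throughput_mb_s: float  # sustained sequential transfer rate, in MB/s


def read_time_s(drive: Drive, total_mb: float, block_mb: float) -> float:
    """Estimate seconds to read total_mb when each block of block_mb needs one seek."""
    n_seeks = math.ceil(total_mb / block_mb)
    return n_seeks * drive.seek_s + total_mb / drive.throughput_mb_s


def striped_throughput_mb_s(per_drive_mb_s: float, capacity_gb: float,
                            drive_capacity_gb: float) -> float:
    """Aggregate throughput of enough identical drives to reach capacity_gb,
    assuming ideal linear scaling from striping (real volumes may be capped)."""
    n_drives = math.ceil(capacity_gb / drive_capacity_gb)
    return n_drives * per_drive_mb_s


# Hypothetical drives: the HDD has the higher per-drive throughput, the SSD
# has a far smaller per-access cost, matching the premise of the post.
HDD = Drive("HDD", seek_s=0.010, throughput_mb_s=150.0)
SSD = Drive("SSD", seek_s=0.0001, throughput_mb_s=100.0)

if __name__ == "__main__":
    for drive in (HDD, SSD):
        seq = read_time_s(drive, total_mb=1024, block_mb=128)   # large HDFS-style blocks
        rnd = read_time_s(drive, total_mb=64, block_mb=0.0625)  # 64 KB random reads
        print(f"{drive.name}: sequential {seq:.2f} s, random {rnd:.2f} s")
    # The same 4000 GB of capacity built from one large HDD or eight 500 GB SSDs.
    print("striped HDD:", striped_throughput_mb_s(150.0, 4000, 4000), "MB/s")
    print("striped SSD:", striped_throughput_mb_s(100.0, 4000, 500), "MB/s")
```

With these made-up numbers the HDD wins the large sequential read (seeks are only 8 of them, so throughput dominates), the SSD wins the 64 KB random reads by a wide margin, and the striped SSD volume ends up with more aggregate throughput than the single HDD. Plug in the figures your provider actually guarantees to see which side of the tl;dr you land on.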