
1. WASB --> https://blogs.msdn.microsoft.com/cindygross/2015/02/04/understanding-wasb-and-hadoop-storage-in-azur...

WASB (Windows Azure Storage Blob) is a storage model that stores data as blobs inside containers, which in turn live inside storage accounts in the Azure cloud.
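To make the account/container layout concrete, here is a minimal Python sketch of how a Hadoop client typically addresses a blob through WASB. The helper names (`wasb_uri`, `account_key_property`) and the sample container and account names are made up for illustration; the URI shape and the `fs.azure.account.key.<account>.blob.core.windows.net` property follow the usual hadoop-azure convention, so check them against the Hadoop version you run.

```python
def wasb_uri(container: str, account: str, path: str = "/", secure: bool = False) -> str:
    """Build a URI of the form wasb[s]://<container>@<account>.blob.core.windows.net/<path>."""
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def account_key_property(account: str) -> str:
    """Name of the Hadoop property that conventionally holds the storage account access key."""
    return f"fs.azure.account.key.{account}.blob.core.windows.net"


# Hypothetical container and account names, used only for illustration.
print(wasb_uri("mycontainer", "mystorageaccount", "/data/events.csv"))
print(account_key_property("mystorageaccount"))
```

The container appears before the `@` and the storage account after it, which mirrors the blob-in-container-in-account hierarchy described above.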

2. DASH --> http://sequenceiq.com/cloudbreak-deployer/latest/azure_pre_prov/ --> This link describes several scale-out limits of WASB and proposes DASH as the solution. Note that DASH is not supported as a storage option, and it is limited in the number of storage accounts it can scale out to.

To quote,

"When WASB is used as a Hadoop filesystem the files are full-value blobs in a storage account. It means better performance compared to the data disks and the WASB filesystem can be configured very easily but Azure storage accounts have their own limitations as well. There is a space limitation for TB per storage account (500 TB) as well but the real bottleneck is the total request rate that is only 20000 IOPS where Azure will start to throw errors when trying to do an I/O operation. To bypass those limits Microsoft created a small service called DASH. DASH itself is a service that imitates the API of the Azure Blob Storage API and it can be deployed as a Microsoft Azure Cloud Service. Because its API is the same as the standard blob storage API it can be used almost in the same way as the default WASB filesystem from a Hadoop deployment. DASH works by sharding the storage access across multiple storage accounts. It can be configured to distribute storage account load to at most 15 scaleout storage accounts. It needs one more namespace storage account where it keeps track of where the data is stored. When configuring a WASB filesystem with Hadoop, the only required config entries are the ones where the access details are described. To access a storage account Azure generates an access key that is displayed on the Azure portal or can be queried through the API while the account name is the name of the storage account itself. A DASH service has a similar account name and key, those can be configured in the configuration file while deploying the cloud service."
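The sharding idea in the quote can be sketched in a few lines of Python. This is only an illustration of the general technique (hash a blob name to pick a data account, and record the choice in a namespace map); it is not DASH's actual placement algorithm. The class name, the hashing scheme, and the in-memory dictionary standing in for the namespace storage account are all assumptions.

```python
import hashlib


class ShardedBlobRouter:
    """Toy router: picks a data account per blob and remembers the choice in a namespace map."""

    def __init__(self, data_accounts, max_accounts=15):
        # The quoted text gives 15 as the upper bound on scale-out accounts.
        if not 1 <= len(data_accounts) <= max_accounts:
            raise ValueError(f"expected between 1 and {max_accounts} data accounts")
        self.data_accounts = list(data_accounts)
        # Stands in for the extra namespace storage account that tracks placement.
        self.namespace = {}

    def account_for(self, blob_name: str) -> str:
        """Return the data account holding blob_name, assigning one on first use."""
        if blob_name not in self.namespace:
            digest = hashlib.sha256(blob_name.encode("utf-8")).digest()
            index = int.from_bytes(digest[:8], "big") % len(self.data_accounts)
            self.namespace[blob_name] = self.data_accounts[index]
        return self.namespace[blob_name]


# Hypothetical account names, for illustration only.
router = ShardedBlobRouter([f"scaleout{i:02d}" for i in range(15)])
print(router.account_for("logs/2016/01/part-00000"))
```

Because placement is looked up in the namespace map rather than recomputed, a blob stays where it was first written even if the list of data accounts later changes.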

3. Cloudbreak's allocation model using multiple storage accounts and local HDFS (high-performance / scale-out option).

When allocating HDFS on Azure, Cloudbreak can spread data across multiple storage accounts. Sharding the data this way helps overcome the per-account limits on IOPS. This option can scale up to 200 storage accounts, whereas DASH is limited to 15 scale-out storage accounts. Disk selection supports both premium and standard storage, depending on the VM type. DS13 or DS14 VMs are economical for most general-purpose use cases and can support 16 standard 1 TB data disks.
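As a rough back-of-the-envelope comparison, the figures quoted in this article (20,000 IOPS and 500 TB per storage account, 15 accounts for DASH, 200 for Cloudbreak, 16 x 1 TB disks per VM) can be multiplied out as below. These are theoretical ceilings under the assumption that load spreads perfectly evenly across accounts; real throughput will be lower, and the function names are illustrative only.

```python
# Figures taken from the article text; treat them as point-in-time limits.
PER_ACCOUNT_IOPS = 20_000
PER_ACCOUNT_TB = 500
DASH_MAX_ACCOUNTS = 15
CLOUDBREAK_MAX_ACCOUNTS = 200
DISKS_PER_VM = 16   # DS13/DS14 with standard disks
DISK_SIZE_TB = 1


def aggregate_limits(accounts: int) -> dict:
    """Upper bounds if requests and data were spread evenly over `accounts` storage accounts."""
    return {
        "accounts": accounts,
        "iops": accounts * PER_ACCOUNT_IOPS,
        "capacity_tb": accounts * PER_ACCOUNT_TB,
    }


def raw_disk_tb(vm_count: int) -> int:
    """Raw attached disk capacity, before HDFS replication is taken into account."""
    return vm_count * DISKS_PER_VM * DISK_SIZE_TB


print("DASH:      ", aggregate_limits(DASH_MAX_ACCOUNTS))
print("Cloudbreak:", aggregate_limits(CLOUDBREAK_MAX_ACCOUNTS))
print("10 worker VMs, raw TB:", raw_disk_tb(10))
```

With these inputs the aggregate request ceiling is 300,000 IOPS for 15 accounts versus 4,000,000 IOPS for 200 accounts, which is the gap the multi-account allocation model is meant to exploit.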

[Figure: storageaccounts.png]
