Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Should I use RDMA in Microsoft Azure?

Guru

According to https://azure.microsoft.com/en-gb/documentation/articles/virtual-machines-a8-a9-a10-a11-specs/ the A8 and A9 instances support a 32 Gbps RDMA backplane for node-to-node communication on SLES.

Is the SLES image the preferred (or only) image that supports this networking layer, or are there Red Hat flavour alternatives?

Would access to the 32 Gbps backplane through a multi-homed topology make a significant difference to intra-cluster communication, given the relatively small CPU scale of the A8 and A9?

Simon

1 ACCEPTED SOLUTION

Guru

There may be some marginal gain in network backplane throughput, but it's not really necessary, and it has to be balanced against cost, availability and flexibility. The A8-A11 instances are intended more for traditional HPC workloads that require non-commodity networking. They are relatively rare compared with the commodity-backed instances in Azure, so they can be hard to provision in large volume in some regions. The other key consideration is that they are not portable to other instance classes, so some of the elasticity benefits are lost.

In short, you could in theory need RDMA networking for very shuffle-heavy ML (maybe deep learning, or some of the newer neural net and graph algorithms in Spark), but the cost doesn't usually justify it, and you're usually going to be better off with D-class instances for YARN and HDFS.
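To make the "marginal gain" point a bit more concrete, here is a rough back-of-envelope sketch in Python. It is only an illustration: the 50 GB of shuffle per node, the 600 s of compute per stage and the 10 Gbps commodity link are all made-up assumptions, not measurements, and the model ignores protocol overhead and any overlap between compute and transfer. The 32 Gbps figure is the RDMA backplane speed listed for the A8 and A9.

```python
def transfer_seconds(data_gb: float, link_gbps: float) -> float:
    """Ideal time to move data_gb gigabytes over a link of link_gbps gigabits/s."""
    return data_gb * 8 / link_gbps


def network_share(shuffle_gb: float, link_gbps: float, compute_seconds: float) -> float:
    """Fraction of stage time spent on the wire, assuming no compute/transfer overlap."""
    net = transfer_seconds(shuffle_gb, link_gbps)
    return net / (net + compute_seconds)


# Hypothetical workload: each node shuffles 50 GB and computes for 600 s per stage.
SHUFFLE_GB = 50.0
COMPUTE_S = 600.0

links = [
    ("commodity link (assumed 10 Gbps)", 10.0),
    ("RDMA backplane (32 Gbps)", 32.0),
]

for label, gbps in links:
    wire = transfer_seconds(SHUFFLE_GB, gbps)
    share = network_share(SHUFFLE_GB, gbps, COMPUTE_S)
    print(f"{label}: {wire:.1f} s on the wire, {share:.1%} of stage time")
```

If the network share is already small on the commodity link, a faster backplane has little left to save. Plug in your own shuffle volumes and stage times before drawing any conclusion.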


3 REPLIES


@Simon Elliston Ball did you ever get an answer to this?

Guru

Sort of... I ended up answering my own question (see the accepted solution).
