Cloudera GPU Support

New Contributor

Hi All,

I'm curious - does anyone here have any experience with GPU-accelerated Cloudera, or know whether it's even supported? I've been reading around and noticed a number of teams improving CPU-bound jobs by utilising NVIDIA's CUDA and offloading the calculations to the GPU when using Hadoop. I'm looking for people to just share their $0.02 on "yes, this worked for us and here is how to go about it" or "no, this is a no go".

Any thoughts or ideas?

Thanks,

Chris.

Contributor

 

Hey Chris,

Officially, CDH does not support GPU offloading; however, some JIRAs have been created to explore and brainstorm these possibilities. I have included them below:

https://issues.apache.org/jira/browse/SPARK-3785

https://issues.apache.org/jira/browse/SPARK-12620

I would also keep an eye on our Engineering Blog to see if there are any updates on new use cases regarding this.

http://blog.cloudera.com/

Thanks,

Jordan

Explorer

Hi Jordan,

You mentioned that officially CDH does not support GPU offloading, but does CDH support installation on GPU nodes?


Regards,

kcyea

New Contributor

This is an old post, and the replies are also old and inconclusive, so I hope I may resurrect this thread. Since the original posting, it appears that CDSW supports GPUs when the environment is configured per this guidance:
https://www.cloudera.com/documentation/data-science-workbench/latest/topics/cdsw_gpu.html

We are selecting a configuration for at least one server that will hold GPU card(s) and will be a Worker Node within our CDSW deployment. We typically use Dell servers and are looking at an R940-series server since it can hold 4 GPUs locally plus lots of RAM and CPU.

Regarding specific GPU compatibility constraints, looking at the NVIDIA page here:
http://us.download.nvidia.com/XFree86/Linux-x86_64/390.25/README/supportedchips.html

It looks like Tesla cards are compatible and are also selectable on a Dell R940 server configuration. However, many other NVIDIA cards are also compatible with the CDSW-compatible driver but are not selectable on a Dell R940 configuration. Non-Tesla cards are a fraction of the cost of the Tesla versions. It seems the main difference between "cheap" and "expensive" NVIDIA cards is the "GPU passthrough" needed to support virtualized environments, grid computing, etc. I assume this is why Tesla cards are the ones supported by default in large commodity enterprise servers.

CDSW executes processes within Docker containers, a form of virtualization whose abstraction details I am not fully versed in. The resulting question is: does a Docker container qualify as a virtualized environment requiring GPU passthrough? Or, from CDSW's perspective, since an NVIDIA "Titan V" is compatible with the driver specified in the Cloudera document (link above), is "GPU passthrough" not required, so that CDSW would support a cheaper "Titan V" card as long as it works on the selected server?

If so, that would potentially drive another discussion about whether Dell can support a non-Tesla NVIDIA card on an R940 server (which is obviously a question for Dell).
To the community: does this seem like a train of thought that anyone out here has pursued before, and can you share any insight? All input will be appreciated.
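
One way to approach the Docker question empirically is to check which GPUs a process can actually see from inside a container. Docker containers share the host's kernel, so the GPU is reached through the host's NVIDIA driver, not through a hypervisor. The following is only a minimal Python sketch under stated assumptions: it assumes the NVIDIA driver and the `nvidia-smi` utility are reachable from inside the container, the helper names `parse_gpu_list` and `visible_gpus` are made up for illustration, and none of this comes from the Cloudera documentation. It does not say whether a given card is supported by CDSW or by Dell.

```python
import shutil
import subprocess


def parse_gpu_list(csv_text):
    """Parse 'name, driver_version' CSV lines into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma in the card name cannot
        # shift the driver version into the wrong field.
        name, _, driver = line.rpartition(",")
        gpus.append({"name": name.strip(), "driver": driver.strip()})
    return gpus


def visible_gpus():
    """Return the GPUs nvidia-smi reports, or [] if none are visible."""
    # Assumption: nvidia-smi is on PATH wherever the driver is exposed.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,driver_version",
         "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_list(result.stdout)


if __name__ == "__main__":
    gpus = visible_gpus()
    if not gpus:
        print("No NVIDIA GPU visible from this environment.")
    for gpu in gpus:
        print(f"{gpu['name']} (driver {gpu['driver']})")
```

Running the same script on the host and then inside a session container would show whether the card and driver version are exposed to the container.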

New Contributor

I think I came across https://rapids.ai, a framework that works with Spark, so maybe we have something that can run on GPUs. I believe GPUs are well suited to big data. I had no idea earlier, but the performance in some cases is 5x faster, so it is worth running big data jobs on GPUs, though it needs some setup.