Running TensorFlow in distributed mode with GPUs on a Hadoop cluster

Contributor

What options does one have to run TensorFlow in distributed mode with GPUs under YARN on a Hadoop cluster?

To exploit GPUs, one needs the CUDA and cuDNN libraries installed. As for distributed mode, I understand that TF has its own way of doing this, but I don't believe there is a way to do it directly under YARN today. Is this correct?
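
For context on TF's own distributed mode: each process is given the full cluster layout plus its own role, commonly through the `TF_CONFIG` environment variable. Below is a minimal sketch of building that value for a parameter-server/worker layout. The host names, ports, and the helper `build_tf_config` are hypothetical, and something outside TF (a script, Spark, or a YARN application) would still have to launch one process per entry.

```python
import json


def build_tf_config(ps_hosts, worker_hosts, task_type, task_index):
    """Return the TF_CONFIG JSON string for one task in a ps/worker cluster."""
    cluster = {"ps": list(ps_hosts), "worker": list(worker_hosts)}
    if task_type not in cluster:
        raise ValueError("unknown task type: %r" % task_type)
    if not 0 <= task_index < len(cluster[task_type]):
        raise ValueError("task index out of range for %r" % task_type)
    return json.dumps(
        {"cluster": cluster, "task": {"type": task_type, "index": task_index}},
        sort_keys=True,
    )


# Hypothetical host names and ports, for illustration only.
PS_HOSTS = ["node1.example.com:2222"]
WORKER_HOSTS = ["node2.example.com:2222", "node3.example.com:2222"]

if __name__ == "__main__":
    # Each worker process would export its own value as TF_CONFIG
    # before starting TensorFlow.
    for index in range(len(WORKER_HOSTS)):
        print(build_tf_config(PS_HOSTS, WORKER_HOSTS, "worker", index))
```

The sketch only covers the cluster description. Placing the processes on GPU nodes and restarting failed ones is the part a resource manager such as YARN would need to handle.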

One option is to run TF on Spark under YARN, but there seem to be multiple ways to achieve this integration:

- TensorFrames: Experimental on Spark 2.1.

- TensorFlowOnSpark by Yahoo

- DeepLearning4J by Skymind

...

What's the proven approach?

5 REPLIES

Re: Running TensorFlow in distributed mode with GPUs on a Hadoop cluster

Hi @zhoussen, please see Hortonworks's recent blog on TF assemblies running on YARN.

As discussed in the blog, this relies on some YARN JIRAs that are targeted for HDP 3.0.

  1. YARN-3611, Support Docker Containers In LinuxContainerExecutor: Better support of Docker container execution in YARN
  2. YARN-4793, Simplified API layer for services and beyond: The new simple-services API layer backed by REST interfaces. This API can be used to create and manage the lifecycle of YARN services in a simple manner. Services here can range from simple single-component apps to complex multi-component assemblies needing orchestration.

Re: Running TensorFlow in distributed mode with GPUs on a Hadoop cluster

Contributor

Hi @slachterman,

Thanks. Yes, I'm aware of the Data Lake 3.0 roadmap, but that solution is based on Docker container support, which implies a radically different way of managing a bare-metal Hadoop cluster. I'm looking for ways to achieve this on a Hadoop 2.x cluster today.

Re: Running TensorFlow in distributed mode with GPUs on a Hadoop cluster

Rising Star

Hi @zhoussen

I am also trying to find an approach for TensorFlow. Did you figure it out?

Re: Running TensorFlow in distributed mode with GPUs on a Hadoop cluster

Contributor

Hi @nbalaji-elangovan,

I've listed above the three approaches I found discussed most often, but I haven't gotten a definitive answer.

Re: Running TensorFlow in distributed mode with GPUs on a Hadoop cluster

Super Guru

TensorFlowOnSpark is maintained by Yahoo and works well. There is no official support for it yet.

The DL4J guys are great and can run your Keras models. It is not TensorFlow, but you can get proven professional support from their team. Having talked with them many times, I can say they are amazingly talented in deep learning and AI.
