Running TensorFlow in distributed mode with GPUs on a Hadoop cluster


What options does one have to run TensorFlow in distributed mode with GPUs under YARN on a Hadoop cluster?

To exploit GPUs, one needs the CUDA and cuDNN libraries installed. For distributed mode, I understand that TF itself provides a way to do this, but I don't believe there is a way to do it directly under YARN today. Is this correct?
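
For reference, here is a minimal sketch of what "distributed mode" involves on the TF side: every process in the job is told the full list of worker and parameter-server addresses, plus its own role and index. One common way to pass this is a JSON document in the `TF_CONFIG` environment variable. The host names, ports, and the helper name `build_tf_config` below are hypothetical and are not taken from this thread.

```python
import json


def build_tf_config(worker_hosts, ps_hosts, task_type, task_index):
    """Return a TF_CONFIG-style JSON string for one process in the cluster.

    worker_hosts / ps_hosts are lists of "host:port" strings (hypothetical
    values here). task_type and task_index identify the calling process.
    """
    cluster = {"worker": list(worker_hosts)}
    if ps_hosts:
        cluster["ps"] = list(ps_hosts)

    # Reject a role or index that does not exist in the cluster definition.
    if task_type not in cluster:
        raise ValueError("unknown task type: %r" % task_type)
    if not 0 <= task_index < len(cluster[task_type]):
        raise ValueError("task index %d out of range" % task_index)

    config = {
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
    }
    return json.dumps(config, sort_keys=True)


# Hypothetical three-node layout: two GPU workers and one parameter server.
workers = ["gpu-node1.example.com:2222", "gpu-node2.example.com:2222"]
ps = ["node3.example.com:2222"]

# Configuration for the second worker (index 1).
config = build_tf_config(workers, ps, "worker", 1)
print(config)
```

Each process would export its own string as `TF_CONFIG` before starting TensorFlow. Something still has to launch those processes on the right nodes and hand out the addresses, and that scheduling step is the part YARN does not cover out of the box.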

One option for running TF in distributed mode is to run it on Spark under YARN, but there seem to be multiple ways to achieve this integration:

- TensorFrames: Experimental on Spark 2.1.

- TensorFlowOnSpark by Yahoo

- DeepLearning4J by Skymind


What's the proven approach?


Hi @zhoussen, please see Hortonworks' recent blog post on TF assemblies running on YARN.

As discussed in the blog post, this relies on two YARN JIRAs that are targeted for HDP 3.0:

  1. YARN-3611, Support Docker Containers In LinuxContainerExecutor: Better support of Docker container execution in YARN
  2. YARN-4793, Simplified API layer for services and beyond: The new simple-services API layer backed by REST interfaces. This API can be used to create and manage the lifecycle of YARN services in a simple manner. Services here can range from simple single-component apps to complex multi-component assemblies needing orchestration.


Hi @slachterman,

Thanks. Yes, I'm aware of the Data Lake 3.0 roadmap, but that solution is based on Docker container support, which implies a radically different way of managing a bare-metal Hadoop cluster. I'm looking for ways to achieve this on a Hadoop 2.x cluster today.

Hi @zhoussen

I am also trying to find an approach for TensorFlow. Did you figure it out?


Hi @nbalaji-elangovan,

I listed above the three approaches I've seen discussed most often, but I haven't received a definitive answer.

TensorFlowOnSpark is maintained by Yahoo and works well. There is no official support for it yet.

The DL4J team is great, and DL4J can run your Keras models. It is not TensorFlow, but you can get professional, proven support from their team. Having talked with them many times, I can say they are amazingly talented in deep learning and AI.
