Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Using Yarn as resource manager for standalone Python and R code

SOLVED Go to solution

Using Yarn as resource manager for standalone Python and R code

New Contributor

Hello!

 

Is it possible to run "standalone" Python and R code using Yarn as a resource manager? "Standalone" means native Python or R code, not Spark jobs following by Python->PySpark->Spark or R->SparklyR->Spark execution. We want to use Yarn as a resource allocation service to run Python and R code in allocated by Yarn containers inside cluster node. Obviously, it's not distributed execution, still standalone, but worker node supposed to be allocated by Yarn. CDH 5.10.

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Using Yarn as resource manager for standalone Python and R code

Master Guru
Yes. Use of YARN APIs will allow you to distribute and run any arbitrary command. Spark and MR2 are apps that leverage this to run Java commands with wrapper classes that drive their logic and flow, but there's nothing preventing you from writing your own.

Take a look at the Distributed Shell application implementation to understand the raw YARN APIs used to run arbitrary commands via YARN allocated resource containers: https://github.com/cloudera/hadoop-common/blob/cdh5.12.0-release/hadoop-yarn-project/hadoop-yarn/had...

If you're asking of an inbuilt way of running programs over YARN without any code, then aside of the DistributedShell there's no other included implementation. Even with the DistributedShell you may not really get the tight integration (such as result extraction, status viewing, etc.) you require.

There's likely a few more higher level frameworks that can make things easier when developing custom YARN apps, such as Spring (https://spring.io/guides/gs/yarn-basic/), Kitten (https://github.com/cloudera/kitten), Cask's CDAP (https://docs.cask.co/cdap/current/en/developers-manual/getting-started/index.html).
1 REPLY 1

Re: Using Yarn as resource manager for standalone Python and R code

Master Guru
Yes. Use of YARN APIs will allow you to distribute and run any arbitrary command. Spark and MR2 are apps that leverage this to run Java commands with wrapper classes that drive their logic and flow, but there's nothing preventing you from writing your own.

Take a look at the Distributed Shell application implementation to understand the raw YARN APIs used to run arbitrary commands via YARN allocated resource containers: https://github.com/cloudera/hadoop-common/blob/cdh5.12.0-release/hadoop-yarn-project/hadoop-yarn/had...

If you're asking of an inbuilt way of running programs over YARN without any code, then aside of the DistributedShell there's no other included implementation. Even with the DistributedShell you may not really get the tight integration (such as result extraction, status viewing, etc.) you require.

There's likely a few more higher level frameworks that can make things easier when developing custom YARN apps, such as Spring (https://spring.io/guides/gs/yarn-basic/), Kitten (https://github.com/cloudera/kitten), Cask's CDAP (https://docs.cask.co/cdap/current/en/developers-manual/getting-started/index.html).