New Contributor
Posts: 1
Registered: ‎09-19-2016
Accepted Solution

Using Yarn as resource manager for standalone Python and R code

Hello!

Is it possible to run "standalone" Python and R code using Yarn as a resource manager? By "standalone" I mean native Python or R code, not Spark jobs that go through the Python->PySpark->Spark or R->sparklyr->Spark execution path. We want to use Yarn as a resource allocation service and run the Python and R code inside Yarn-allocated containers on the cluster nodes. Obviously this is not distributed execution; the code is still standalone, but the worker node should be allocated by Yarn. We are on CDH 5.10.

Thanks!

Posts: 1,525
Kudos: 266
Solutions: 232
Registered: ‎07-31-2013

Re: Using Yarn as resource manager for standalone Python and R code

Yes. Use of YARN APIs will allow you to distribute and run any arbitrary command. Spark and MR2 are apps that leverage this to run Java commands with wrapper classes that drive their logic and flow, but there's nothing preventing you from writing your own.

Take a look at the Distributed Shell application implementation to understand the raw YARN APIs used to run arbitrary commands via YARN allocated resource containers: https://github.com/cloudera/hadoop-common/blob/cdh5.12.0-release/hadoop-yarn-project/hadoop-yarn/had...

If you're asking about a built-in way of running programs over YARN without writing any code, then aside from the DistributedShell there's no other included implementation. Even with the DistributedShell, you may not get the tight integration (result extraction, status viewing, etc.) that you require.
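As a rough illustration of the DistributedShell route, here is a minimal Python sketch that only assembles the client command line; it does not submit anything. The option names (-jar, -shell_command, -shell_args, -num_containers, -container_memory, -container_vcores) are the ones the DistributedShell client accepts, but the jar location, the helper name build_dshell_command, and the script path in the example are assumptions for illustration, so check them against your own cluster before relying on this.

```python
import shlex

# ASSUMPTION: typical jar location on a parcel-based CDH install; adjust as needed.
DEFAULT_JAR = (
    "/opt/cloudera/parcels/CDH/lib/hadoop-yarn/"
    "hadoop-yarn-applications-distributedshell.jar"
)
CLIENT_CLASS = "org.apache.hadoop.yarn.applications.distributedshell.Client"


def build_dshell_command(shell_command, shell_args="", jar=DEFAULT_JAR,
                         memory_mb=1024, vcores=1, num_containers=1):
    """Return the argv list for a DistributedShell submission (hypothetical helper).

    shell_command is the program each YARN container runs (e.g. "python" or
    "Rscript"); shell_args is passed to it as a single argument string.
    """
    cmd = [
        "yarn", "jar", jar, CLIENT_CLASS,
        "-jar", jar,                        # jar that contains the application master
        "-shell_command", shell_command,
        "-num_containers", str(num_containers),
        "-container_memory", str(memory_mb),  # MB requested per container
        "-container_vcores", str(vcores),
    ]
    if shell_args:
        cmd += ["-shell_args", shell_args]
    return cmd


if __name__ == "__main__":
    # ASSUMPTION: /shared/jobs/score.py is a hypothetical script that already
    # exists at the same path on every worker node.
    argv = build_dshell_command("python", shell_args="/shared/jobs/score.py",
                                memory_mb=2048)
    # Print a copy-pasteable command instead of executing it.
    print(" ".join(shlex.quote(a) for a in argv))
```

Note that the interpreter (python or Rscript) and the script itself have to be available on whichever node YARN picks, and the script's output ends up in the container logs rather than being returned to the caller.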

There are likely a few higher-level frameworks that can make developing custom YARN apps easier, such as Spring (https://spring.io/guides/gs/yarn-basic/), Kitten (https://github.com/cloudera/kitten), and Cask's CDAP (https://docs.cask.co/cdap/current/en/developers-manual/getting-started/index.html).
Backline Customer Operations Engineer