- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Using Yarn as resource manager for standalone Python and R code
- Labels:
-
Apache YARN
Created on ‎08-22-2017 06:07 AM - edited ‎09-16-2022 05:07 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello!
Is it possible to run "standalone" Python and R code using Yarn as a resource manager? "Standalone" means native Python or R code, not Spark jobs following by Python->PySpark->Spark or R->SparklyR->Spark execution. We want to use Yarn as a resource allocation service to run Python and R code in allocated by Yarn containers inside cluster node. Obviously, it's not distributed execution, still standalone, but worker node supposed to be allocated by Yarn. CDH 5.10.
Thanks!
Created ‎09-05-2017 11:59 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Take a look at the Distributed Shell application implementation to understand the raw YARN APIs used to run arbitrary commands via YARN allocated resource containers: https://github.com/cloudera/hadoop-common/blob/cdh5.12.0-release/hadoop-yarn-project/hadoop-yarn/had...
If you're asking of an inbuilt way of running programs over YARN without any code, then aside of the DistributedShell there's no other included implementation. Even with the DistributedShell you may not really get the tight integration (such as result extraction, status viewing, etc.) you require.
There's likely a few more higher level frameworks that can make things easier when developing custom YARN apps, such as Spring (https://spring.io/guides/gs/yarn-basic/), Kitten (https://github.com/cloudera/kitten), Cask's CDAP (https://docs.cask.co/cdap/current/en/developers-manual/getting-started/index.html).
Created ‎09-05-2017 11:59 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Take a look at the Distributed Shell application implementation to understand the raw YARN APIs used to run arbitrary commands via YARN allocated resource containers: https://github.com/cloudera/hadoop-common/blob/cdh5.12.0-release/hadoop-yarn-project/hadoop-yarn/had...
If you're asking of an inbuilt way of running programs over YARN without any code, then aside of the DistributedShell there's no other included implementation. Even with the DistributedShell you may not really get the tight integration (such as result extraction, status viewing, etc.) you require.
There's likely a few more higher level frameworks that can make things easier when developing custom YARN apps, such as Spring (https://spring.io/guides/gs/yarn-basic/), Kitten (https://github.com/cloudera/kitten), Cask's CDAP (https://docs.cask.co/cdap/current/en/developers-manual/getting-started/index.html).
