<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: GPU scheduling with YARN services in HDP 3.0 in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/GPU-scheduling-with-YARN-services-in-HDP-3-0/m-p/190833#M83607</link>
    <description>&lt;A rel="user" href="https://community.cloudera.com/users/92239/amila-manoj-silvawathudura-gardi-hewa.html" nodeid="92239"&gt;@Amila Silva&lt;/A&gt;&lt;P&gt;HDP 3.0 supports GPU isolation in Docker through nvidia-docker-plugin (&lt;A href="https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker-plugin" target="_blank"&gt;https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker-plugin&lt;/A&gt;), which is part of nvidia-docker v1. Currently only v1 is supported; the newer nvidia-docker v2 is not.&lt;/P&gt;</description>
    <pubDate>Tue, 18 Sep 2018 13:44:29 GMT</pubDate>
    <dc:creator>TarunParimi</dc:creator>
    <dc:date>2018-09-18T13:44:29Z</dc:date>
    <item>
      <title>GPU scheduling with YARN services in HDP 3.0</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/GPU-scheduling-with-YARN-services-in-HDP-3-0/m-p/190832#M83606</link>
      <description>&lt;P&gt;
	I'm trying to get GPU scheduling working for a YARN service app that runs in Docker. I have nvidia-docker v2 installed.&lt;/P&gt;&lt;P&gt;
	In the service description, I've configured resources as follows:&lt;/P&gt;&lt;PRE&gt;"resource": {
  "cpus": 2,
  "memory": "4096",
  "additional": {
    "yarn.io/gpu": {
      "value": 1
    }
  }
}&lt;/PRE&gt;&lt;P&gt;
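	For context, a minimal sketch of where this block sits in a complete service spec. The service name, component name, image, and launch command are placeholders assumed for illustration, not the values from the actual app:&lt;/P&gt;&lt;PRE&gt;{
  "name": "gpu-example",
  "version": "1.0.0",
  "components": [
    {
      "name": "worker",
      "number_of_containers": 1,
      "artifact": {
        "id": "nvidia/cuda:9.0-base",
        "type": "DOCKER"
      },
      "launch_command": "sleep 3600",
      "resource": {
        "cpus": 2,
        "memory": "4096",
        "additional": {
          "yarn.io/gpu": {
            "value": 1
          }
        }
      }
    }
  ]
}&lt;/PRE&gt;&lt;P&gt;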
	The app fails with the following exception in the NodeManager log, which suggests that the nvidia-docker v1 REST API is required:&lt;/P&gt;&lt;PRE&gt;2018-09-17 14:19:43,490 WARN  gpu.NvidiaDockerV1CommandPlugin (NvidiaDockerV1CommandPlugin.java:init(145)) - IOException of NvidiaDockerV1CommandPlugin init:
java.net.ConnectException: Connection refused (Connection refused)&lt;/PRE&gt;&lt;P&gt;
	What is the recommended way of getting GPU scheduling working in HDP 3.0?&lt;/P&gt;&lt;P&gt;
	Do I have to downgrade to the deprecated nvidia-docker v1, or is there another workaround?&lt;/P&gt;</description>
      <pubDate>Tue, 18 Sep 2018 10:00:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/GPU-scheduling-with-YARN-services-in-HDP-3-0/m-p/190832#M83606</guid>
      <dc:creator>Amila-Manoj-Sil</dc:creator>
      <dc:date>2018-09-18T10:00:20Z</dc:date>
    </item>
    <item>
      <title>Re: GPU scheduling with YARN services in HDP 3.0</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/GPU-scheduling-with-YARN-services-in-HDP-3-0/m-p/190833#M83607</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/92239/amila-manoj-silvawathudura-gardi-hewa.html" nodeid="92239"&gt;@Amila Silva&lt;/A&gt;&lt;P&gt;HDP 3.0 supports GPU isolation in Docker through nvidia-docker-plugin (&lt;A href="https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker-plugin" target="_blank"&gt;https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker-plugin&lt;/A&gt;), which is part of nvidia-docker v1. Currently only v1 is supported; the newer nvidia-docker v2 is not.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Sep 2018 13:44:29 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/GPU-scheduling-with-YARN-services-in-HDP-3-0/m-p/190833#M83607</guid>
      <dc:creator>TarunParimi</dc:creator>
      <dc:date>2018-09-18T13:44:29Z</dc:date>
    </item>
    <item>
      <title>Re: GPU scheduling with YARN services in HDP 3.0</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/GPU-scheduling-with-YARN-services-in-HDP-3-0/m-p/190834#M83608</link>
      <description>&lt;P&gt;We only support nvidia-docker v1. We are looking into supporting v2, but there are no firm plans yet. v1 works well in our current tests.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Sep 2018 22:42:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/GPU-scheduling-with-YARN-services-in-HDP-3-0/m-p/190834#M83608</guid>
      <dc:creator>wtan</dc:creator>
      <dc:date>2018-09-18T22:42:18Z</dc:date>
    </item>
    <item>
      <title>Re: GPU scheduling with YARN services in HDP 3.0</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/GPU-scheduling-with-YARN-services-in-HDP-3-0/m-p/190836#M83610</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/534/wtan.html" nodeid="534"&gt;@wtan&lt;/A&gt;, &lt;A rel="user" href="https://community.cloudera.com/users/62182/tparimi.html" nodeid="62182"&gt;@Tarun Parimi&lt;/A&gt; I've downgraded to nvidia-docker v1, and the REST API is working. When I run curl localhost:3476/v1.0/docker/cli, I get:&lt;/P&gt;&lt;PRE&gt;--volume-driver=nvidia-docker --volume=nvidia_driver_396.44:/usr/local/nvidia:ro --device=/dev/nvidiactl --device=/dev/nvidia-uvm --device=/dev/nvidia-uvm-tools --device=/dev/nvidia0&lt;/PRE&gt;&lt;P&gt;Now when I try to run the YARN app, it fails with the following exception:&lt;/P&gt;&lt;PRE&gt;java.io.IOException: Unable to prepare container:
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.prepareContainer(LinuxContainerExecutor.java:472)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.prepareContainer(ContainerLaunch.java:368)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:289)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:103)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException: ExitCodeException exitCode=1: /usr/bin/nvidia-docker | 2018/09/25 15:45:56 Error: failed to run docker command

        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.runDockerVolumeCommand(DockerLinuxContainerRuntime.java:404)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.prepareContainer(DockerLinuxContainerRuntime.java:426)
&lt;/PRE&gt;&lt;P&gt;I think this happens because the Docker volume it is trying to create already exists. When I run docker volume ls, I get:&lt;/P&gt;&lt;PRE&gt;DRIVER              VOLUME NAME
nvidia-docker       nvidia_driver_396.44
&lt;/PRE&gt;&lt;P&gt;Why is YARN creating this volume? Isn't it supposed to be handled by nvidia-docker?&lt;/P&gt;&lt;P&gt;Should I manually delete this existing volume? If so, will the volume automatically be deleted when the app is completed?&lt;/P&gt;</description>
      <pubDate>Wed, 26 Sep 2018 00:22:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/GPU-scheduling-with-YARN-services-in-HDP-3-0/m-p/190836#M83610</guid>
      <dc:creator>Amila-Manoj-Sil</dc:creator>
      <dc:date>2018-09-26T00:22:55Z</dc:date>
    </item>
    <item>
      <title>Re: GPU scheduling with YARN services in HDP 3.0</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/GPU-scheduling-with-YARN-services-in-HDP-3-0/m-p/190837#M83611</link>
      <description>&lt;P&gt;In case this helps someone: the issue was that I had configured nvidia-docker as the Docker binary. The setting needs to point to the original docker binary instead.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Oct 2018 22:19:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/GPU-scheduling-with-YARN-services-in-HDP-3-0/m-p/190837#M83611</guid>
      <dc:creator>Amila-Manoj-Sil</dc:creator>
      <dc:date>2018-10-01T22:19:26Z</dc:date>
    </item>
  </channel>
</rss>

