Install R in CDH 5.8 within Docker

We have CDH 5.8.2 (in a 3-node configuration) installed inside a Docker container running on Ubuntu. I would like to install RHadoop to work on the datasets stored in that Hadoop environment (inside the container).


The problem is that although I can install R on the Ubuntu host, it behaves like plain R rather than RHadoop: it cannot access Hadoop, because CDH is installed inside the Docker container.


  1. Can you suggest a way around this, i.e. how to make R (installed on the Ubuntu host, outside Docker) interact with CDH 5.8.2 (installed inside Docker)?
  2. If I need to install R and its Hadoop packages inside Docker, what are the steps to do so?




If you mean the cluster is unreachable from the R process, then yes, that's the problem: R needs network connectivity to the cluster. You can run R on an edge node of the cluster, or on any machine that can reach it, though you typically want R close to the cluster because a lot of data will move between them.
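One quick way to confirm whether the machine running R can reach the cluster at all is a plain TCP check against a cluster service. This is only a minimal sketch in Python: the function name `can_reach` is made up for illustration, and the host and port in the comment are placeholders (8020 is the usual NameNode RPC port, but your containers may publish something different).

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and opens the socket;
        # the context manager closes it again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


# Example call (hypothetical host name, substitute your own NameNode address):
#   can_reach("namenode.example.internal", 8020)
```

If this returns False from the Ubuntu host, the container ports are probably not published or the host names do not resolve outside Docker, and R will not be able to reach the cluster either.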

Docker and R are outside CDH's scope, but there's nothing special to know. Install your tools on a machine that can see the cluster, and configure tools like RHadoop so that they know where the cluster config is; then it should all just work.
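As a rough illustration of "knowing where the cluster config is": the RHadoop packages read the `HADOOP_CMD` and `HADOOP_STREAMING` environment variables, and the `hadoop` launcher reads `HADOOP_CONF_DIR`. The sketch below (in Python, with hypothetical helper names `rhadoop_env` and `run_r_script`) builds such an environment and hands it to `Rscript`. Every path shown is an assumption and must be replaced with the locations on your own machine.

```python
import os
import subprocess


def rhadoop_env(hadoop_cmd, streaming_jar, conf_dir, base=None):
    """Return a copy of the environment with the variables RHadoop looks for."""
    env = dict(os.environ if base is None else base)
    env["HADOOP_CMD"] = hadoop_cmd            # path to the `hadoop` executable
    env["HADOOP_STREAMING"] = streaming_jar   # path to the Hadoop streaming jar
    env["HADOOP_CONF_DIR"] = conf_dir         # directory holding core-site.xml etc.
    return env


def run_r_script(script_path, env):
    """Run an R script with the given environment and return its exit code."""
    return subprocess.run(["Rscript", script_path], env=env, check=False).returncode


# Example (all paths are assumptions, adjust to your installation):
#   env = rhadoop_env(
#       "/usr/bin/hadoop",
#       "/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar",
#       "/etc/hadoop/conf",
#   )
#   run_r_script("job.R", env)
```

The same three variables can equally be set with `Sys.setenv()` at the top of the R script; the point is only that they must refer to a Hadoop client and configuration that can actually reach the cluster.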