02-16-2017 07:27 AM
We have CDH 5.8.2 (a 3-node configuration) installed inside a Docker container running on Ubuntu. I would like to install RHadoop so I can work with the datasets stored in the Hadoop environment inside the container.
The problem is that, although I can install R on the Ubuntu host, it behaves like plain R (without RHadoop) because it cannot reach Hadoop, which is installed inside the Docker container.
02-16-2017 07:29 AM
If you mean the cluster is unreachable from the R process, then yes, that's the problem. You need network connectivity to the cluster. You can run R on an edge node of the cluster, or on any machine that can reach it, though you typically want that machine close to the cluster because it will be transferring a substantial amount of data.
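Before installing anything on the host, it can help to confirm that the machine running R can actually open connections to the cluster. The sketch below is a minimal TCP reachability check, assuming the NameNode listens on the CDH 5 default ports (8020 for RPC, 50070 for the web UI/WebHDFS). The hostname `namenode.example.internal` is hypothetical; with the cluster inside Docker, you would use whatever address and ports the container actually exposes to the host.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and connects; closing it right away
        # is enough to prove the port is reachable from this machine.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Hypothetical hostname: replace with the address the container exposes.
    namenode_host = "namenode.example.internal"
    # Assumed CDH 5 defaults: 8020 = NameNode RPC, 50070 = NameNode web UI/WebHDFS.
    for port in (8020, 50070):
        status = "reachable" if port_reachable(namenode_host, port) else "NOT reachable"
        print(f"{namenode_host}:{port} is {status}")
```

If these ports are not reachable from the host, no RHadoop configuration will help; the R process has to run somewhere that can see the cluster.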
Docker and R are outside the scope of CDH, but there's nothing special to know. Install your tools on a machine that can see the cluster, configure tools like RHadoop so they know where the cluster configuration is, and it all just works.
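"Knowing where the cluster config is" usually comes down to a few environment variables. RHadoop's `rmr2` and `rhdfs` packages read `HADOOP_CMD` (path to the `hadoop` binary), `rmr2` also reads `HADOOP_STREAMING` (path to the Hadoop streaming jar), and the Hadoop client itself reads `HADOOP_CONF_DIR` (commonly `/etc/hadoop/conf` on CDH). The sketch below is a hypothetical pre-flight check, not part of RHadoop: it reports which of those variables are unset or point at nothing on disk, and reads `fs.defaultFS` from `core-site.xml` so you can see which HDFS the client would talk to.

```python
import os
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List, Mapping, Optional

# Variables assumed to matter for an RHadoop client (see lead-in).
REQUIRED_VARS = ("HADOOP_CMD", "HADOOP_STREAMING", "HADOOP_CONF_DIR")


def missing_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or point at a missing path."""
    return [name for name in REQUIRED_VARS
            if not env.get(name) or not Path(env[name]).exists()]


def default_fs(conf_dir: str) -> Optional[str]:
    """Read fs.defaultFS from core-site.xml in a Hadoop client config directory."""
    core_site = Path(conf_dir) / "core-site.xml"
    if not core_site.is_file():
        return None
    root = ET.parse(core_site).getroot()
    # Hadoop config files are <configuration><property><name/><value/></property>...
    for prop in root.findall("property"):
        if prop.findtext("name") == "fs.defaultFS":
            value = prop.findtext("value")
            return value.strip() if value else None
    return None


if __name__ == "__main__":
    problems = missing_env()
    if problems:
        print("Unset or invalid:", ", ".join(problems))
    conf_dir = os.environ.get("HADOOP_CONF_DIR", "/etc/hadoop/conf")
    print("fs.defaultFS =", default_fs(conf_dir))
```

If `fs.defaultFS` names a host or port that the reachability check cannot connect to, that mismatch is the thing to fix, typically by copying the cluster's client configuration to the machine running R.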