Created on 07-23-2014 08:33 AM - edited 09-16-2022 02:03 AM
Hi, one of our developers needs to run Pig scripts, access HDFS, and run MapReduce jobs on the CDH5 cluster from a remote machine. I'm a little confused about how to accomplish this.
Do I need to add the remote machine to the cluster using the "Add Host" feature? I also read that I should make the remote machine a "Gateway", then download (deploy?) the client config files to that host. Am I on the right track? Thank you. -Mike
Created on 07-24-2014 12:06 PM - edited 07-24-2014 12:06 PM
Hi Mike,
Yes, I believe you are on the right track. You would need to add the remote host to the cluster, and then make it part of the gateway role group for HDFS and the gateway role group for MapReduce for that cluster.
You could create a new role group if the remote host needs a different configuration from the other gateway nodes in your cluster.
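Once the host has the gateway roles and the client configuration has been deployed to it, a quick check from the remote machine can confirm that things are wired up. The following is only a rough sketch, not an official procedure: it assumes the client configs land in /etc/hadoop/conf (the usual location, but verify on your host), that the hdfs and pig commands are on the PATH, and the helper names and the script name passed in are hypothetical.

```python
import os
import subprocess

# Assumption: directory where the deployed client configs are expected.
CONF_DIR = "/etc/hadoop/conf"
# Assumption: files a gateway for HDFS and MapReduce would typically have.
EXPECTED_FILES = ("core-site.xml", "hdfs-site.xml", "mapred-site.xml")


def missing_client_configs(conf_dir=CONF_DIR, expected=EXPECTED_FILES):
    """Return the expected config files that are absent from conf_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(conf_dir, name))]


def smoke_test_commands(pig_script):
    """Build the commands a developer might run to confirm cluster access."""
    return [
        ["hdfs", "dfs", "-ls", "/"],             # can the host reach HDFS?
        ["pig", "-x", "mapreduce", pig_script],  # does Pig submit to the cluster?
    ]


def run_checks(pig_script):
    """Stop early if configs are missing, otherwise run each command in turn."""
    missing = missing_client_configs()
    if missing:
        raise RuntimeError("client configs missing: " + ", ".join(missing))
    for cmd in smoke_test_commands(pig_script):
        print("running:", " ".join(cmd))
        subprocess.check_call(cmd)  # raises CalledProcessError on failure
```

Calling run_checks("test.pig") on the gateway host (with test.pig standing in for any small Pig script) should list the HDFS root and then submit the script. If the first step reports missing files, the client configuration probably has not been deployed to that host yet.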