Run Oryx on a machine that is not part of the cluster

Explorer

Hi,

I am trying to run Oryx on a machine that is not part of the cluster...

My Hadoop/HDFS settings in oryx.conf are shown below. Are they correct?

Is there anything else I need to set in oryx.conf?

model=${als-model}
model.instance-dir=hdfs://name_node:8020/oryx_data
model.local-computation=false
model.local-data=false
 
Thanks.
 
1 ACCEPTED SOLUTION

Master Collaborator

That's fine. The machine needs to be able to communicate with the cluster, of course. Usually you would also make the cluster's Hadoop configuration available on that machine and point to it with HADOOP_CONF_DIR. I think that will be required to get MapReduce to work.
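To illustrate the HADOOP_CONF_DIR point, here is a minimal Python sketch that checks whether the variable is set and whether the directory it names contains the usual Hadoop client configuration files. It is not part of Oryx. The function names are hypothetical, and the list of expected files (core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml) is an assumption about what a typical client setup needs, so adjust it for your cluster.

```python
import os
from pathlib import Path

# Standard Hadoop client config file names. Which of these a given
# setup actually needs is an assumption; edit the tuple as required.
EXPECTED_FILES = ("core-site.xml", "hdfs-site.xml", "yarn-site.xml", "mapred-site.xml")


def missing_hadoop_conf_files(conf_dir, expected=EXPECTED_FILES):
    """Return the expected config file names that are absent from conf_dir."""
    base = Path(conf_dir)
    return [name for name in expected if not (base / name).is_file()]


def check_hadoop_conf_dir(environ=None):
    """Return (ok, detail) describing the state of HADOOP_CONF_DIR."""
    env = os.environ if environ is None else environ
    conf_dir = env.get("HADOOP_CONF_DIR")
    if not conf_dir:
        return False, "HADOOP_CONF_DIR is not set"
    if not Path(conf_dir).is_dir():
        return False, f"{conf_dir} is not a directory"
    missing = missing_hadoop_conf_files(conf_dir)
    if missing:
        return False, "missing files: " + ", ".join(missing)
    return True, conf_dir


if __name__ == "__main__":
    ok, detail = check_hadoop_conf_dir()
    print(("OK: " if ok else "Problem: ") + detail)
```

Running this on the machine that will host Oryx, in the same shell environment that launches it, shows whether the copied configuration is actually visible. It says nothing about whether the settings inside those files are correct for the cluster.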

Master Collaborator

Right, I forgot to mention that part: you need the cluster's binaries too, such as ZooKeeper, HDFS, YARN, and Spark. Oryx uses the cluster's distribution.

As you can see, it's definitely intended to be run on a cluster edge node, so I'd strongly suggest running it that way.

Explorer

Hi, JasonChen.

You have to copy /opt/cloudera/CDH/jars and /etc/hadoop from a cluster node to the machine running Oryx 2.

I tried a few ways to run it outside the cluster, but all of them failed.

In the end, the node running Oryx 2 had to be inside the cluster.

My conclusion is that CDH may require the same parcel version and the Cloudera agent on the node in order to use the cluster's resources.

Master Collaborator

There shouldn't be any other dependencies. If the error is like the one you showed before, it's just a firewall or port configuration problem.
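One way to narrow down a firewall or port problem is to test plain TCP reachability from the Oryx machine to the cluster services. The Python sketch below does only that. The name_node:8020 endpoint is taken from the oryx.conf in the question; the function name and the ENDPOINTS list are hypothetical, and you would add your own ZooKeeper, YARN, and other service addresses to the list.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and host name lookup failures.
        return False


# Endpoints to probe. Only the NameNode address comes from the thread;
# extend this list with the other services your cluster exposes.
ENDPOINTS = [("name_node", 8020)]

if __name__ == "__main__":
    for host, port in ENDPOINTS:
        status = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} is {status}")
```

A successful connection only shows that the port is open from this machine. Hadoop clients also contact DataNodes and other daemons directly, so a single open port does not rule out every network issue.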

Explorer
Actually, I don't know the exact reason. I was stuck on this problem for a few days, even though the firewalls on all machines had been disabled from the very start.
I used to deploy Hadoop, Spark, and so on by extracting source tarballs. Fortunately, an edge node seems to be a good way to access cluster resources.