Run Oryx on a machine that is not part of the cluster
Labels: Apache Hadoop, HDFS
Created on ‎12-22-2014 06:00 PM - edited ‎09-16-2022 08:39 AM
Hi,
I am trying to run Oryx on a machine that is not part of the cluster...
My setting for the oryx.conf is as below (regarding the Hadoop/HDFS settings)... Is that the right setting?
Is there anything else I need to set in oryx.conf?
Created ‎12-23-2014 12:47 AM
That's fine. The machine needs to be able to communicate with the cluster, of course. Usually you would also make the Hadoop configuration visible on that machine and point to it with HADOOP_CONF_DIR; I think that will be required for MapReduce to work.
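A minimal sketch of the setup described above, assuming the client configuration lives at `/etc/hadoop/conf` and the cluster node is reachable as `clusternode` (both are illustrative assumptions, not details from this thread):

```shell
# Copy the Hadoop client configuration from a cluster node to this machine,
# e.g. (hostname and path are assumptions):
#   scp -r clusternode:/etc/hadoop/conf /etc/hadoop/conf

# Point Hadoop clients (and thus Oryx) at that configuration before launching:
export HADOOP_CONF_DIR=/etc/hadoop/conf
```

With HADOOP_CONF_DIR exported, the process picks up the cluster's NameNode and ResourceManager addresses from the copied `*-site.xml` files instead of defaulting to local mode.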
Created ‎08-23-2015 12:19 AM
Right, I forgot to mention that part: you need the cluster's binaries too (ZooKeeper, HDFS, YARN, Spark, etc.), since Oryx uses the cluster's distribution.
As you can see, it's definitely intended to be run on a cluster edge node, so I'd strongly suggest running it that way.
Created ‎08-23-2015 07:29 PM
You have to copy /opt/cloudera/CDH/jars and /etc/hadoop from a cluster node to the machine running Oryx 2.
I tried a few ways to run it outside the cluster, but all failed.
The node running Oryx 2 had to be run inside the cluster.
My conclusion is that CDH may require the same parcel version and a Cloudera agent on the node in order to use the cluster's resources.
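The copy step described above might look like the following sketch; the hostname `clusternode` and destination paths are assumptions for illustration, and the source paths are the ones named in this reply:

```shell
# Sketch only: pull the cluster's jars and Hadoop config onto the external
# machine (run as a user with read access on the cluster node).
rsync -a clusternode:/opt/cloudera/CDH/jars/ /opt/cloudera/CDH/jars/
rsync -a clusternode:/etc/hadoop/ /etc/hadoop/
```

Note that, per this reply, even with the files in place a matching parcel version and Cloudera agent may still be required, so this alone may not be sufficient.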
Created ‎08-23-2015 09:50 PM
There shouldn't be any other dependencies. If the error is like the one you showed before, it's just a firewall/port configuration problem.
Created ‎08-24-2015 12:00 AM
I used to deploy Hadoop, Spark, and so on by extracting source tarballs. Fortunately, an edge node seems to be a good way to access cluster resources.
