Created on 12-22-2014 06:00 PM - edited 09-16-2022 08:39 AM
Hi,
I am trying to run Oryx on a machine that is not part of the cluster...
My settings in oryx.conf are as below (the Hadoop/HDFS settings)... Is that the right setting?
Is there anything else I need to set in oryx.conf?
Created 12-23-2014 12:47 AM
That's fine. The machine needs to be able to communicate with the cluster of course. Usually you would make the Hadoop configuration visible as well and point to it with HADOOP_CONF_DIR. I think that will be required to get MapReduce to work.
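A minimal sketch of that setup (the path below is the conventional client-config location and is an assumption; it may differ per install):

```shell
# Sketch: copy the cluster's Hadoop client configs to the machine
# running Oryx, then point HADOOP_CONF_DIR at that directory so
# MapReduce/HDFS clients can find core-site.xml, yarn-site.xml, etc.
# /etc/hadoop/conf is the conventional path, not a requirement.
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
echo "Using Hadoop config from: $HADOOP_CONF_DIR"
```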
Created 06-29-2015 12:41 AM
It's pretty likely. It would not be in the logs but in the error shown on the attempt's (dead) container's info screen in the history server. At least, I saw the same thing exactly and this resolved it, and I can sort of see why this is now a problem in Java 7.
Created 07-01-2015 08:34 AM
Sean,
I applied your changes to our code base and am still seeing a similar error (below).
I checked the job using the job tracking URL (e.g., http://server105:8088/proxy/application_1432750221048_0525/),
and there is actually no failed attempt.
/// Logs ////
Thu May 28 07:27:57 PDT 2015 INFO Running job "Oryx-/user/xyz/int/def-1-122-Y-RowStep: Avro(hdfs://server105:8020/u... ID=1 (1/1)"
Thu May 28 07:27:57 PDT 2015 INFO Job status available at: http://server105:8088/proxy/application_1432750221048_0525/
Thu May 28 07:29:14 PDT 2015 INFO Retrying connect to server: server104/10.190.36.114:40915. Already tried 0 time(s); maxRetries=3
Thu May 28 07:29:14 PDT 2015 INFO Retrying connect to server: server104/10.190.36.114:40915. Already tried 1 time(s); maxRetries=3
Thu May 28 07:29:14 PDT 2015 INFO Retrying connect to server: server104/10.190.36.114:40915. Already tried 2 time(s); maxRetries=3
...
Thu May 28 07:34:15 PDT 2015 INFO Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
Thu May 28 07:34:16 PDT 2015 INFO Finished Oryx-/user/xyz/int/def-1-122-Y-RowStep
Thu May 28 07:34:16 PDT 2015 INFO Completed RowStep in 379s
Created 07-01-2015 09:34 AM
Just to check, you have this commit right?
https://github.com/cloudera/oryx/commit/4b5e557a36f3d666bab0befc21b79efdf1fcd52d
The symptom here is that the App Master for the MR job dies straight away, and can't be contacted. The important thing is to know why. For example when I looked at the AM app screen (i.e. http://[host]:8088/cluster/app/application_1435553713675_0018) I saw something like ...
Created 07-01-2015 05:37 PM
Yes, I applied your commit...
I went to an example
http://[host]:8088/cluster/app/application_1435263631757_19721
But I am still not seeing the error.
As I mentioned, the job/task does not actually get killed or stopped. It just logs some retry messages (below) and then continues:
Thu May 28 07:29:14 PDT 2015 INFO Retrying connect to server: server104/10.190.36.114:40915. Already tried 0 time(s); maxRetries=3
Thu May 28 07:29:14 PDT 2015 INFO Retrying connect to server: server104/10.190.36.114:40915. Already tried 1 time(s); maxRetries=3
Created 07-02-2015 12:14 AM
Yes, but the question is why. This is just a message from the driver program saying the master can't be found. The question is what happened to the Application Master. If you find it in YARN, can you see what happened to that container? It almost surely failed to start, but why?
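One way to dig further, as a sketch (the application ID here is the one from the job tracking URL earlier in the thread; logs are only aggregated after the application finishes):

```shell
# Sketch: once an application finishes, its container logs (including
# the dead AM's) can be pulled with the YARN CLI. This just builds the
# command to run; the application ID is from the tracking URL above.
app_id=application_1432750221048_0525
cmd="yarn logs -applicationId $app_id"
echo "$cmd"
```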
Created 07-05-2015 10:19 PM
Sean,
I am not sure why.
But it seems related to a firewall.
Our Oryx server runs in a virtual LAN and talks to another virtual LAN through a firewall.
It looks like the dynamic port is an ephemeral port, and there is a related bug:
https://issues.apache.org/jira/browse/MAPREDUCE-6338
Still digging into this issue.
Created 07-06-2015 12:51 AM
Yes, that could also be a cause. Is it possible to run the process inside the firewall? Certainly the MapReduce jobs are intended to be managed by the Computation Layer from within the cluster.
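If the firewall does turn out to be the cause, one commonly used mitigation (a sketch, not confirmed as the fix here) is to pin the MR Application Master's client RPC port to a fixed range in mapred-site.xml, so firewall rules can allow it instead of an arbitrary ephemeral port; the range below is an arbitrary example:

```xml
<!-- mapred-site.xml: restrict the AM's client RPC port to a fixed
     range so a firewall can allow it (the range is an example). -->
<property>
  <name>yarn.app.mapreduce.am.job.client.port-range</name>
  <value>50100-50200</value>
</property>
```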
Created 08-20-2015 07:54 PM
You have talked about many issues above, but they seem more related to Oryx 1 and MR2.
I wonder whether it is possible to run Oryx 2 outside a CDH cluster?
I deployed a Hadoop 2.6.0-CDH-5.4.4 cluster with ZooKeeper, Kafka, Spark on YARN, and HDFS.
When I tried to run Oryx 2 on my laptop outside that cluster (the same CDH version is deployed on the laptop, but not running),
the batch layer did not print out what I expected:
2015-08-20 23:45:39,278 INFO BatchLayer:82 Creating message stream from topic
2015-08-20 23:45:39,531 INFO AbstractSparkLayer:224 Initial offsets: {[OryxInput,0]=21642186}
2015-08-20 23:45:39,610 INFO BatchLayer:117 Starting Spark Streaming
2015-08-20 23:45:39,677 INFO BatchLayer:124 Spark Streaming is running
and it eventually printed this exception:
Exception in thread "main" java.net.ConnectException: Call From m4040/192.168.88.46 to 0.0.0.0:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
On the batch and speed layer web pages, it showed like this:
I guess my laptop could not communicate with Kafka on the cluster, and this Oryx job was rejected by YARN?
Created 08-21-2015 01:58 AM
You can run the binaries on any machine that can see the Hadoop configuration on the classpath, and which can access all of the services it needs to in the cluster. There are a number of services to talk to: HDFS, YARN, Kafka, Spark and the app's executors. So in general you'd have to have a lot of ports open, and at that point your machine is effectively a gateway node in the cluster. Certainly it's meant to be run within the cluster.
The serving layer only needs access to Kafka, and that's by design, so it might more easily run outside the cluster.
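As one concrete illustration of the 0.0.0.0:8032 error above: that default address usually means the client never found yarn-site.xml on its classpath, so it fell back to the default ResourceManager address. A minimal sketch of the entry the client config needs ("rmhost" is a placeholder hostname, not from this thread):

```shell
# Sketch: a client that cannot load yarn-site.xml defaults to
# ResourceManager address 0.0.0.0:8032, producing exactly the
# "Call From ... to 0.0.0.0:8032" ConnectException. This writes a
# minimal yarn-site.xml to a temp dir to show the required entry;
# "rmhost" is a placeholder for the real ResourceManager host.
conf=$(mktemp -d)
cat > "$conf/yarn-site.xml" <<'EOF'
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>rmhost:8032</value>
  </property>
</configuration>
EOF
grep -q 'rmhost:8032' "$conf/yarn-site.xml" && echo "RM address configured"
```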
Created 08-22-2015 07:19 PM
Sean,
I tried to run Oryx on a node that is in the same LAN as the Hadoop cluster.
We tested Oryx 1 without problems (we used to have a firewall issue; after moving the node to the same LAN as the Hadoop cluster,
it runs fine)...
We have just started to test Oryx 2 on the same network (that is, no firewall issues).
I do have /etc/hadoop/conf on the node where I am running Oryx 2.
However, I got the following errors when starting the Oryx 2 batch layer.
It looks like it's looking for Cloudera CDH jar files... Any thoughts? Do I need to copy the jar files over?
errors:
ls: cannot access /opt/cloudera/parcels/CDH/jars/zookeeper-*.jar: No such file or directory
ls: cannot access /opt/cloudera/parcels/CDH/jars/spark-assembly-*.jar: No such file or directory
Thanks.
Jason
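The errors above suggest the launch script builds its classpath from jars under the CDH parcel directory; a minimal sketch to check for that layout (the path is taken from the error messages, and the remedies are assumptions):

```shell
# Sketch: Oryx 2's launch scripts look up dependency jars (ZooKeeper,
# the Spark assembly) under the CDH parcel directory. On a node where
# the parcel is not installed, that lookup fails with the
# "No such file or directory" errors shown above.
PARCEL_JARS=/opt/cloudera/parcels/CDH/jars
if [ -d "$PARCEL_JARS" ]; then
  msg="parcel jars present under $PARCEL_JARS"
else
  msg="parcel jars missing; install the CDH parcel on this node or point the script at local copies of the jars"
fi
echo "$msg"
```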