01-06-2016 07:17 AM
01-07-2016 08:13 AM
Hi, thanks for the quick response.
To answer some of your questions above: I do have a speed layer running. I can see activity on the input topic immediately after I ingest through the serving layer, and I also see activity on the update topic roughly 5 mins later.
I am inclined to believe that the batch layer is successfully building the model, as I don't see any error in the output logs and the files are available on HDFS when processing completes.
Unfortunately, I don't see any activity on the driver UI on port 4041. No jobs seem to be added.
I did discover the following errors today; maybe you could shed some light on them.
When I start the serving layer, I see the following message in the Kafka logs:
[2016-01-07 16:00:38,391] ERROR Closing socket for /127.0.0.1 because of error (kafka.network.Processor)
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
I also see the following in the job list on my Hadoop cluster. The SpeedLayer job seems to hang in the ACCEPTED state.
| application_1452182244662_0002 | dylana | OryxSpeedLayer-ALSExample | SPARK | default | Thu, 07 Jan 2016 15:59:13 GMT | N/A | ACCEPTED | UNDEFINED | UNASSIGNED |
| application_1452182244662_0001 | dylana | OryxBatchLayer-ALSExample | SPARK | default | Thu, 07 Jan 2016 15:58:13 GMT | N/A | RUNNING | UNDEFINED | ApplicationMaster |
01-07-2016 09:11 AM
Also, I did see an error in the SpeedLayer YARN job logs similar to the one below:
INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
ERROR yarn.ApplicationMaster: Failed to connect to driver at xxxx
01-11-2016 02:01 AM
Yeah, I think that was part of the issue. I've moved to a new machine with more resources, and I can now see results being served by the serving layer.
However, I still don't see any activity on the speed layer. When I ingest some data, it always takes 5 minutes before anything happens (and that is on the batch layer). I also still don't see any jobs on port 4041, and the job is still marked as "ACCEPTED" on the cluster. There is definitely about 2GB of RAM still available on the machine, so I think it must be something else.
01-11-2016 02:07 AM
02-25-2016 08:59 AM
I recently came back to this. I believe the issue is a lack of available resources. I can run the batch layer and get results through the serving layer. If I then shut down the batch layer and start the speed layer, I can feed in more data and get further results through the serving layer. Ideally I will be moving to a machine with more resources soon, so I am happy that I can get Oryx running for now. Thanks for the help.
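For anyone hitting the same thing, here is a rough sketch of the arithmetic behind "not enough resources". It assumes the layers run in yarn-client mode (the "Waiting for Spark driver to be reachable" message suggests this) and that the Spark 1.x default container overhead of max(384 MB, 10% of the heap) applies. The class name, the executor counts, the heap sizes and the 6144 MB YARN capacity are all hypothetical placeholders, not values taken from this setup.

```java
// Back-of-the-envelope estimate of YARN memory requested by Spark applications.
// Every size in main() is hypothetical; substitute the values from your own setup.
final class YarnMemoryEstimate {

    // Assumed Spark 1.x default for container overhead: max(384 MB, 10% of the heap).
    static int containerMb(int heapMb) {
        return heapMb + Math.max(384, heapMb / 10);
    }

    // In yarn-client mode the driver runs outside YARN, so an application asks for
    // one ApplicationMaster container plus its executor containers.
    static int appMb(int amHeapMb, int executorHeapMb, int numExecutors) {
        return containerMb(amHeapMb) + numExecutors * containerMb(executorHeapMb);
    }

    // True when all application requests together fit into the memory YARN can hand out.
    static boolean fits(int availableMb, int... appRequestsMb) {
        int total = 0;
        for (int mb : appRequestsMb) {
            total += mb;
        }
        return total <= availableMb;
    }

    public static void main(String[] args) {
        int batch = appMb(512, 1024, 2); // hypothetical batch layer sizing
        int speed = appMb(512, 1024, 2); // hypothetical speed layer sizing
        int yarnMb = 6144;               // hypothetical memory offered to YARN
        System.out.println("batch=" + batch + " MB, speed=" + speed + " MB");
        System.out.println("one layer fits: " + fits(yarnMb, batch));
        System.out.println("both layers fit: " + fits(yarnMb, batch, speed));
    }
}
```

Note that free RAM on the machine is not the same as memory YARN is allowed to allocate; the latter is capped by `yarn.nodemanager.resource.memory-mb`, and the scheduler may round each container request up further, so the real numbers can be tighter than this estimate.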
I have another question regarding a custom application. I am looking at the example app on GitHub (https://github.com/OryxProject/oryx/tree/master/ap
The type ExampleSpeedModelManager must implement the inherited abstract method SpeedModelManager<String,String,String>.close()
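That compiler message generally means the class does not yet define a `close()` method that the interface declares. A minimal sketch of the shape of the fix follows. The interface here is a hypothetical stand-in (named `SpeedModelManagerSketch`) so that the snippet compiles without Oryx on the classpath; the real `SpeedModelManager` declares other methods too, and the `closed` flag is only there for illustration.

```java
import java.io.Closeable;

// Hypothetical stand-in for Oryx's SpeedModelManager<K,M,U>. The real interface
// declares more methods; only close() is modeled here so the sketch compiles alone.
interface SpeedModelManagerSketch<K, M, U> extends Closeable {
    @Override
    void close(); // narrowed: no checked IOException
}

// Hypothetical counterpart of ExampleSpeedModelManager with the missing method added.
final class ExampleSpeedModelManagerSketch
        implements SpeedModelManagerSketch<String, String, String> {

    private boolean closed; // illustration only, not part of Oryx

    @Override
    public void close() {
        // Nothing to release in this example; a real manager would free
        // any resources it holds (connections, threads, caches) here.
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }

    public static void main(String[] args) {
        ExampleSpeedModelManagerSketch manager = new ExampleSpeedModelManagerSketch();
        // try-with-resources calls close() automatically at the end of the block.
        try (ExampleSpeedModelManagerSketch m = manager) {
            System.out.println("closed inside block: " + m.isClosed());
        }
        System.out.println("closed after block: " + manager.isClosed());
    }
}
```

In the actual example class, adding an `@Override public void close()` method (empty if there is nothing to clean up) should be enough to satisfy the compiler, assuming the other abstract methods are already implemented.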
03-02-2016 02:46 AM
srowen, I'm looking into using and customising the ALS recommendation example. Is there a link that describes the system (inputs/outputs, data flow, etc.), or could you give a brief explanation of the system here? Thanks.