Member since 08-11-2014
481 Posts
92 Kudos Received
72 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3454 | 01-26-2018 04:02 AM |
| | 7090 | 12-22-2017 09:18 AM |
| | 3538 | 12-05-2017 06:13 AM |
| | 3857 | 10-16-2017 07:55 AM |
| | 11231 | 10-04-2017 08:08 PM |
09-02-2015 05:03 AM
That generally means it's still waiting for YARN to allocate an executor, which in turn usually means you don't have enough free resources in YARN to satisfy the request. Check the number and size of your executors against the available resources and the maximum size of any one container that your YARN config allows.
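As a rough way to do that check, here is a minimal sketch in Python. It assumes each executor container needs the executor memory plus an overhead of 10% with a 384 MB floor, which is modeled on Spark's default but differs between versions, so check yours. The function names and the numbers you pass in are hypothetical, and it ignores YARN rounding containers up to its minimum allocation increment.

```python
def container_mb(executor_memory_mb, overhead_fraction=0.10, min_overhead_mb=384):
    """Memory YARN must grant for one executor: heap plus off-heap overhead.

    The overhead fraction and floor are assumptions; check your Spark version.
    """
    overhead = max(min_overhead_mb, int(executor_memory_mb * overhead_fraction))
    return executor_memory_mb + overhead


def check_request(num_executors, executor_memory_mb, max_container_mb, free_cluster_mb):
    """Return a list of reasons the request cannot be satisfied (empty if it fits)."""
    per_container = container_mb(executor_memory_mb)
    problems = []
    # A single container larger than YARN's maximum allocation is never granted.
    if per_container > max_container_mb:
        problems.append(
            "one executor needs %d MB but the max container size is %d MB"
            % (per_container, max_container_mb)
        )
    # All executors together must fit in what the cluster currently has free.
    total = num_executors * per_container
    if total > free_cluster_mb:
        problems.append(
            "%d executors need %d MB but only %d MB is free"
            % (num_executors, total, free_cluster_mb)
        )
    return problems
```

An empty list from `check_request` only means the memory arithmetic works out. Cores and queue limits can still hold the job back.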
08-26-2015 01:07 PM
This may be too vague to be helpful, but I recall several JIRAs fixed for Spark 1.4 that concern the .inprogress files and the history server. I expect that whatever this is could be related. If so, the fix would be coming in 5.5 at the latest.
08-26-2015 01:02 AM
Oops, thanks for catching that. Yes, the serving layer needs to see HDFS to read big models. If needed, you can change a few Kafka and Oryx configs to allow larger Kafka messages, and thus send bigger models directly, though ideally the serving layer can just see HDFS. I had also envisioned that the serving layer is often run in or next to the cluster and isn't publicly visible. It's a service to other front-end systems, or at least sits behind a load balancer. So exposing a machine with cluster access isn't so crazy, as it need not be open to the world.
08-25-2015 02:09 AM
1 Kudo
Oh, this section configures the serving layer REST API -- what port it runs on, SSL cert, password, path, etc.
08-24-2015 11:28 PM
Oryx uses Spark Streaming, and Spark runs its executors on YARN. So YARN manages the resources used by the batch and speed layers. You can also use YARN to run the serving layer binaries via the oryx-run.sh script.
08-23-2015 09:50 PM
There shouldn't be any other dependencies. If the error is like what you showed before, it's just firewall/port config problems.
08-23-2015 12:26 PM
Yes, it uses Spark Streaming for the batch and speed layers. Really big models are just 'passed' to the topic as an HDFS location. The maximum message size is configurable but is about 16MB. This tends to matter only for decision forests, or for ALS models with large numbers of users and items. The data in Kafka topics is replicated according to the topic config. Yes, it can potentially be replicated across the machines that serve as brokers.
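To illustrate the "pass by reference" idea, here is a minimal sketch in Python, not the actual Oryx code. The key names, the function name and the exact byte limit are assumptions made for the example. The only behavior taken from the description above is that a model over the size limit is sent as an HDFS location rather than inline.

```python
# Assumed limit for this sketch: about 16MB, as described above.
MAX_MESSAGE_BYTES = 16 * 1024 * 1024


def build_update(model_bytes, hdfs_path, max_bytes=MAX_MESSAGE_BYTES):
    """Return a (key, payload) pair to publish to the update topic.

    Small models travel inline in the message. Models over the limit are
    replaced by the HDFS location where the model was written, so the
    consumer must be able to read HDFS. Key names here are hypothetical.
    """
    if len(model_bytes) <= max_bytes:
        return ("MODEL", model_bytes)
    return ("MODEL-REF", hdfs_path.encode("utf-8"))
```

The consumer side would branch on the key in the same way, reading the payload directly in the first case and opening the HDFS path in the second.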
08-23-2015 12:19 AM
1 Kudo
Right, I forgot to mention that part: you need the cluster's binaries too, like ZK, HDFS, YARN, Spark, etc. It is using the cluster's distribution. As you can see, it's definitely intended to be run on a cluster edge node, so I'd strongly suggest running it that way.
08-21-2015 01:58 AM
You can run the binaries on any machine that can see the Hadoop configuration on the classpath and that can access all of the services it needs in the cluster. There are a number of services to talk to: HDFS, YARN, Kafka, Spark and the app's executors. So in general you'd have to have a lot of ports open, and at that point your machine is effectively a gateway node in the cluster. Certainly it's meant to be run within the cluster. The serving layer only needs access to Kafka, and that's by design, so it might more easily run outside the cluster.
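If you want to see which services a machine can actually reach, a quick TCP check along these lines may help. This is a minimal sketch in Python. The host names are placeholders, and the ports are common defaults that may not match your cluster, so take the real values from your configuration. It doesn't cover the Spark executors, whose ports are typically assigned dynamically.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def unreachable(services, timeout=2.0):
    """Return the names of the services whose host:port could not be reached."""
    return [
        name
        for name, (host, port) in services.items()
        if not can_connect(host, port, timeout)
    ]


# Hypothetical hosts and ports; replace them with the ones from your cluster.
EXAMPLE_SERVICES = {
    "HDFS NameNode": ("namenode.example.com", 8020),
    "YARN ResourceManager": ("resourcemanager.example.com", 8032),
    "Kafka broker": ("kafka.example.com", 9092),
    "ZooKeeper": ("zookeeper.example.com", 2181),
}
```

Calling `unreachable(EXAMPLE_SERVICES)` with your real hosts lists the services that are blocked from that machine. An empty list means all of the listed ports accepted a connection.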