Member since: 08-11-2014
Posts: 481
Kudos Received: 92
Solutions: 72
My Accepted Solutions
Title | Views | Posted
---|---|---
  | 3445 | 01-26-2018 04:02 AM
  | 7090 | 12-22-2017 09:18 AM
  | 3538 | 12-05-2017 06:13 AM
  | 3856 | 10-16-2017 07:55 AM
  | 11226 | 10-04-2017 08:08 PM
05-22-2015
06:13 AM
Backports are driven by demand from support customers and by importance. However, Spark SQL is not supported right now, so I don't know whether this would be backported. You can, however, run whatever Spark you like on CDH; it's just that only the CDH build is supported.
05-22-2015
12:23 AM
1 Kudo
- ALS: yes, fold-in just as before
- k-means: assign the point to a cluster and update that cluster's centroid (but don't reassign any other points)
- RDF: assign the point to a leaf and update the leaf's prediction (but don't change the rest of the tree)
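For the k-means case, here is a minimal sketch of that kind of update, assuming a plain in-memory centroid holder with a running count; the class and method names are hypothetical illustrations, not this project's API.

```java
import java.util.List;

/** Illustrative-only sketch of the incremental k-means update described above. */
public final class IncrementalKMeans {

  /** A cluster centroid plus the number of points assigned to it so far. */
  static final class Centroid {
    final double[] mean;
    long count;
    Centroid(double[] mean, long count) { this.mean = mean; this.count = count; }
  }

  /** Assigns the new point to its nearest centroid and nudges only that centroid. */
  static void update(List<Centroid> centroids, double[] point) {
    Centroid nearest = null;
    double best = Double.POSITIVE_INFINITY;
    for (Centroid c : centroids) {
      double d = squaredDistance(c.mean, point);
      if (d < best) {
        best = d;
        nearest = c;
      }
    }
    // Running-mean update: existing assignments are never revisited.
    nearest.count++;
    for (int i = 0; i < nearest.mean.length; i++) {
      nearest.mean[i] += (point[i] - nearest.mean[i]) / nearest.count;
    }
  }

  static double squaredDistance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }
}
```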
05-21-2015
02:25 AM
The base version of the Spark release in CDH will always be a ".0" release. The CDH release tracks upstream maintenance releases, although not exactly; some upstream fixes may end up in a CDH maintenance release earlier, or later. So CDH 5.4.1 already has some of the 1.3.1 fixes.
05-13-2015
02:09 PM
Yes, you add this role to a server just as with any other service/role in CM. Look at the Spark service. The Spark Gateway is a "role" but not a server process, FWIW; it just means spark-submit et al. can be run on that machine.
05-13-2015
01:53 PM
The History Server is part of the "Spark" service and is one of the roles you deploy through it. You don't have to configure it specially, but you can, including which port it listens on. Normally you would not run a Spark master or worker at all, but just use YARN; I'd advise that. There are no other Spark services besides these three.
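For reference, the port and event-log settings are ordinary Spark properties, which CM surfaces as History Server and gateway configuration. A minimal sketch in spark-defaults.conf form, where the host name and log directory are placeholder values:

```
# Illustrative values only; adjust the host and HDFS directory for your cluster.
spark.eventLog.enabled            true
spark.eventLog.dir                hdfs:///user/spark/applicationHistory
spark.yarn.historyServer.address  historyserver.example.com:18080
spark.history.ui.port             18080
```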
05-11-2015
07:00 AM
Yes, well, I'd say that the batch layer can do "mini batch" if you simply use a short interval. It's not really a special case. I don't think this project is going to add its own data prep pipeline, no, but the idea is that you can use any Java or Spark-based libraries you like as part of your app. There's no need for a separate, special set of support in this project.
05-11-2015
04:07 AM
The speed layer does indeed produce incremental updates. Like the serving layer, it loads the most recent model into memory and then computes how the model might change (approximately and rapidly) in response to new data, then internalizes and publishes those updates. The serving layers then receive not only the models but also the updates on the queue and update themselves accordingly.
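To make that flow concrete, here is a minimal, self-contained sketch of the pattern in plain Java, with a BlockingQueue standing in for the real update topic and a counter standing in for the model; the types and names are illustrative only, not this project's API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative-only sketch of the speed/serving interaction described above. */
public final class SpeedServingSketch {

  /** A trivial stand-in for a model update published by the speed layer. */
  record Update(long delta) {}

  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<Update> updateTopic = new LinkedBlockingQueue<>();
    AtomicLong servingModel = new AtomicLong(100);  // most recent full model, already loaded

    // Speed layer: compute an approximate, fast delta from new data and publish it.
    long newDataValue = 7;
    updateTopic.put(new Update(newDataValue));      // publish "how the model might change"

    // Serving layer: consume the update and apply it on top of the loaded model.
    Update received = updateTopic.take();
    servingModel.addAndGet(received.delta());
    System.out.println("Serving model after update: " + servingModel.get());
  }
}
```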
05-07-2015
10:20 PM
You can run what you like on your cluster, including Spark 1.3.1. However, it would not be supported, and you should not modify any of the CDH installation. You would also need to take care to create and maintain your own configuration. CDH already includes the critical fixes after 1.3.0 that went into 1.3.1, so I don't see much value in this.
05-01-2015
12:33 AM
Yes, it reads them at startup, so you would need to restart the processes.
04-29-2015
02:46 PM
1 Kudo
That doc lists patches that are *also* applied on top of 1.3.0. From your original message, it did not even look like you were affected by this issue, since your process did not run out of memory.