Member since: 08-11-2014
Posts: 481
Kudos Received: 92
Solutions: 72
My Accepted Solutions
Title | Views | Posted
---|---|---
  | 3445 | 01-26-2018 04:02 AM
  | 7090 | 12-22-2017 09:18 AM
  | 3538 | 12-05-2017 06:13 AM
  | 3856 | 10-16-2017 07:55 AM
  | 11226 | 10-04-2017 08:08 PM
05-22-2015
06:13 AM
Backports are driven by demand from support customers and by importance. However, Spark SQL is not supported right now, so I don't know whether this would be backported. You can, however, run whatever Spark you like on CDH; it's just that only the CDH build is supported.
05-22-2015
12:23 AM
1 Kudo
- ALS: yes, fold-in just as before
- k-means: assign the point to a cluster and update that cluster's centroid (but don't reassign any other points)
- RDF: assign the point to a leaf and update the leaf's prediction (but don't change the rest of the tree)
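For the k-means case, here is a minimal sketch of that kind of update, assuming a plain in-memory centroid holder with a running count; the class and method names are hypothetical illustrations, not this project's API.

```java
import java.util.List;

/** Illustrative-only sketch of the incremental k-means update described above. */
public final class IncrementalKMeans {

  /** A cluster centroid plus the number of points assigned to it so far. */
  static final class Centroid {
    final double[] mean;
    long count;
    Centroid(double[] mean, long count) { this.mean = mean; this.count = count; }
  }

  /** Assigns the new point to its nearest centroid and nudges only that centroid. */
  static void update(List<Centroid> centroids, double[] point) {
    Centroid nearest = null;
    double best = Double.POSITIVE_INFINITY;
    for (Centroid c : centroids) {
      double d = squaredDistance(c.mean, point);
      if (d < best) {
        best = d;
        nearest = c;
      }
    }
    // Running-mean update: existing assignments are never revisited.
    nearest.count++;
    for (int i = 0; i < nearest.mean.length; i++) {
      nearest.mean[i] += (point[i] - nearest.mean[i]) / nearest.count;
    }
  }

  static double squaredDistance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }
}
```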
05-21-2015
02:25 AM
The base version of the Spark release in CDH will always be a ".0" release. The CDH release tracks upstream maintenance releases, although not exactly; some upstream fixes may end up in a CDH maintenance release earlier, or later. So CDH 5.4.1 already has some of the 1.3.1 fixes.
05-13-2015
02:09 PM
Yes, you add this role to a server just as with any other service/role in CM. Look at the Spark service. The Spark Gateway is a "role" but not a server process, FWIW; it just means spark-submit et al. can be run on that machine.
05-13-2015
01:53 PM
The History Server is part of the "Spark" service and is one of the roles you deploy through it. You don't have to configure it specially, but you can, including which port it listens on. Normally you would not run a Spark master or worker at all, but just use YARN; I'd advise that. There are no other Spark services besides these three.
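For reference, the port and event-log settings are ordinary Spark properties, which CM surfaces as History Server and gateway configuration. A minimal sketch in spark-defaults.conf form, where the host name and log directory are placeholder values:

```
# Illustrative values only; adjust the host and HDFS directory for your cluster.
spark.eventLog.enabled            true
spark.eventLog.dir                hdfs:///user/spark/applicationHistory
spark.yarn.historyServer.address  historyserver.example.com:18080
spark.history.ui.port             18080
```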
05-11-2015
07:00 AM
Yes, well, I'd say that the batch layer can do "mini batch" if you simply use a short interval. It's not really a special case. I don't think this project is going to add its own data prep pipeline, no, but the idea is that you can use any Java or Spark-based libraries you like as part of your app. There's no need for a separate, special set of support in this project.
05-11-2015
04:07 AM
The speed layer does indeed produce incremental updates. Like the serving layer, it loads the most recent model into memory and then computes how the model might change (approximately and rapidly) in response to new data, then internalizes and publishes those updates. The serving layers then receive not only the models but also the updates on the queue and update themselves accordingly.
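To make that flow concrete, here is a minimal, self-contained sketch of the pattern in plain Java, with a BlockingQueue standing in for the real update topic and a counter standing in for the model; the types and names are illustrative only, not this project's API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative-only sketch of the speed/serving interaction described above. */
public final class SpeedServingSketch {

  /** A trivial stand-in for a model update published by the speed layer. */
  record Update(long delta) {}

  public static void main(String[] args) throws InterruptedException {
    BlockingQueue<Update> updateTopic = new LinkedBlockingQueue<>();
    AtomicLong servingModel = new AtomicLong(100);  // most recent full model, already loaded

    // Speed layer: compute an approximate, fast delta from new data and publish it.
    long newDataValue = 7;
    updateTopic.put(new Update(newDataValue));      // publish "how the model might change"

    // Serving layer: consume the update and apply it on top of the loaded model.
    Update received = updateTopic.take();
    servingModel.addAndGet(received.delta());
    System.out.println("Serving model after update: " + servingModel.get());
  }
}
```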
05-07-2015
10:20 PM
You can run what you like on your cluster, including Spark 1.3.1. However, it would not be supported, and you should not modify any of the CDH installation. You would also need to take care to create and maintain your own configuration. CDH already includes the critical fixes after 1.3.0 that went into 1.3.1, so I don't see much value in this.
05-01-2015
12:33 AM
Yes, it reads them at startup, so you would need to restart the processes.
04-29-2015
02:46 PM
1 Kudo
That doc lists patches that are *also* applied on top of 1.3.0. From your original message, it did not even look like you were affected by this issue, since your process did not run out of memory.