About srowen

srowen · ‎05-21-2017

2.4.x should work; does it not? Or do you need 0.10.1.1+ for compatibility with the 0.10.1.1 broker?

srowen · ‎05-16-2017

This is some error caused by your app, rather than a Spark issue. You need to find the executor logs from the app and see what happened.

srowen · ‎05-13-2017

Yes, it's been available for a while, but it's a separate parallel install so as not to replace Spark 1.x: https://www.cloudera.com/downloads/spark2/2-1.html

srowen · ‎04-24-2017

(BTW I went ahead and made a 2.4.0 release to have something official and probably-working out there. It worked on my CDH 5.11 + Spark 2.1 + Kafka 0.10.0 cluster.) Yes, that's a minor problem in the log message. It should reference newYTYSolver. I'll fix that, but it shouldn't otherwise affect anything.

srowen · ‎04-21-2017

Oh, now I see the same 'disconnected' problem you did. It turns out that Kafka 0.10.0 and 0.10.1 are not protocol-compatible, which is quite disappointing. So I think I'm going to have to back up and revert master/2.4 to Kafka 0.10.0, because that's the flavor that CDH is on, and I would like to avoid having two builds to support 0.10.0 vs 0.10.1. I hope it isn't a big deal to switch back in your prototype?

srowen · ‎04-20-2017

Yes, good catch. I'll track that at https://github.com/OryxProject/oryx/issues/329 and fix it in a few minutes.

srowen · ‎04-20-2017

Although I haven't tested anything like that, it's just using really standard APIs in straightforward ways, so I wouldn't be surprised if S3 just works, because HDFS can read/write S3 OK. I know there are some gotchas with actually using S3 as intermediate storage in Spark jobs, but I think your EMR jobs are using local HDFS for that.

srowen · ‎04-20-2017

Yes, the layers are only coupled by Kafka, so you could run them quite separately, except that they need to share the brokers. It probably won't fit EMR's model, as both should run concurrently and continuously. I'm not sure if it can help you with a shared Kafka either. Obviously it's also an option to run CDH on AWS if you want to try that.

The serving layer does not _generally_ use HDFS unless the model is so big that Kafka can't represent parts of it. In that case the model is written to HDFS and read back from there. This really isn't great, but it's the best I could do for now for really large models. This is something that could be improved at some point, I hope. If you tune Kafka to allow very large models, you can get away without HDFS access.
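To make the size trade-off above concrete, here is a minimal sketch of the decision being described: a model that fits within the maximum Kafka message size travels inline, and anything larger falls back to HDFS. The function name, the return labels, and the 16 MiB limit are illustrative assumptions, not Oryx's actual code or defaults.

```python
# Illustrative only: names and the 16 MiB limit are assumptions for this sketch.
ASSUMED_MAX_MESSAGE_BYTES = 16 * 1024 * 1024  # hypothetical tuned Kafka limit


def publish_plan(model_bytes: int,
                 max_message_bytes: int = ASSUMED_MAX_MESSAGE_BYTES) -> str:
    """Return how a model of the given serialized size would be delivered.

    "inline"         -> the model fits in one Kafka message, no HDFS needed
    "hdfs-reference" -> too large for Kafka, so it goes through HDFS instead
    """
    if model_bytes < 0:
        raise ValueError("model_bytes must be non-negative")
    if model_bytes <= max_message_bytes:
        return "inline"
    return "hdfs-reference"
```

Raising the limit passed as `max_message_bytes` is the code-level analogue of tuning Kafka to allow larger messages: more models land in the "inline" branch and the serving layer never has to touch HDFS.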

srowen · ‎04-20-2017

OK, is it largely working then? If it looks like the app is running, then I'll move to test 2.4 on my cluster too and if it looks good, go ahead and cut a release.

srowen · ‎04-19-2017

Hm, I don't recall seeing the 'disconnected' message. Is there more detail? On its face it seems like the serving layer can't see the broker. Do some ports need to be opened?
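One quick way to rule out a blocked port is to try a plain TCP connection from the serving layer's host to the broker. The sketch below uses only the Python standard library; the helper name is made up for illustration, and the host and port in the usage note are placeholders (9092 is Kafka's usual default, but your broker may listen elsewhere).

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only checks network reachability; it says nothing about whether
    the client and broker speak compatible Kafka protocol versions.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `can_reach("broker-host.example.com", 9092)` returning False from the serving layer's machine would point to a firewall or security-group issue rather than anything in the application.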

Last Visited	‎02-13-2018 12:34 PM

Member Since	‎08-11-2014 09:17 AM
Posts	481
Kudos received	87

Cloudera Community

Re: Own code editor in CDSW?

Re: error using Pandas within PySpark transformati...

Re: Does CDSW need to be part of the cluster?

Re: Local Data combined with HDFS

Re: Where can I find Oryx 1.x releases (or GitHub)

Re: Oryx2 Kafka Broker Issue

Re: Spark2.1 in 5.11 can't start a yarn cluster jo...

Re: Does Spark 2 supported CDH 5.11.0 ?
