About srowen

srowen · ‎11-12-2017

Nothing about a cluster would prevent it from making external connections, but your firewall rules might. The variables you export here are not related to Spark. It's an error from the library you're using.
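One way to separate a firewall problem from a library problem is to try a plain TCP connection from the same machine. The sketch below is only an illustration: the helper name `can_connect` and the host in the usage comment are hypothetical, and a successful connection shows only that the port is reachable, not that the library is configured correctly.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and opens the socket;
        # the `with` block closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Hypothetical usage: substitute the endpoint your library actually calls.
# can_connect("example.com", 443)
```

If this returns False on a cluster node but True on your own machine, the firewall rules are the likely cause.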

srowen · ‎10-25-2017

Oops, I meant to write S3 paths. Really, it's Hadoop and its APIs that do or don't support S3. That support should be built into Hadoop distributions, however. I believe you might need the s3a:// scheme instead of s3://.
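If existing paths are written with s3://, the scheme can be swapped before the path is handed to Spark or Hadoop. This is a minimal sketch: `to_s3a` is a hypothetical helper, the bucket name is made up, and whether s3a:// is actually required depends on the Hadoop distribution in use.

```python
from urllib.parse import urlsplit, urlunsplit


def to_s3a(uri):
    """Rewrite an s3:// URI to s3a://; return any other URI unchanged."""
    parts = urlsplit(uri)
    if parts.scheme == "s3":
        # Only the scheme changes; bucket and key are kept as-is.
        return urlunsplit(parts._replace(scheme="s3a"))
    return uri


# Hypothetical bucket and key, for illustration only.
print(to_s3a("s3://my-bucket/data/events.csv"))
```

The returned string can then be passed wherever a path is expected.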

srowen · ‎10-25-2017

Even I have forgotten exactly how it works off the top of my head, but yes, you are correct that you should be able to use HDFS paths. Yes it runs on Java 7 -- or 8, I believe, though I don't recall if that was tested. It doesn't require Java 8.

srowen · ‎10-16-2017

Really, this is just saying you can upload data from your local computer, at project creation time or later, to the local file system that the Python/R/Scala sessions see. Those jobs then see the uploads as simple files, and can do what they like with them. But within the same program you can also access whatever data you want, anywhere you want; you just need to write code that does so, via Spark or whatever library you like. There is no either/or here.
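As a rough sketch of mixing the two in one program: the first function reads an uploaded file with ordinary file APIs, and the second reads from the cluster through Spark. Both function names and the paths in the usage comment are hypothetical, and `spark` is assumed to be a `pyspark.sql.SparkSession` created elsewhere in the session.

```python
import csv


def load_local_rows(path):
    """Read an uploaded CSV from the session's local file system as a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def load_cluster_df(spark, uri):
    """Read a CSV from HDFS (or any URI Spark can reach) as a Spark DataFrame.

    Assumes `spark` is an existing pyspark.sql.SparkSession.
    """
    return spark.read.csv(uri, header=True)


# Hypothetical usage inside a session:
# local_rows = load_local_rows("data/lookup.csv")
# cluster_df = load_cluster_df(spark, "hdfs:///user/me/events.csv")
```

Nothing stops a single script from calling both and combining the results.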

srowen · ‎10-06-2017

Are you looking for the .jar files that were produced as part of the release? Those are still in the repo and will stay there indefinitely as far as I know, because they could be part of people's builds: https://repository.cloudera.com/artifactory/cloudera-repos/com/cloudera/oryx/

srowen · ‎10-04-2017

Oh, I forgot, we have made many obsolete repos in github.com/cloudera private. I can still see it but of course you can't. Here's a tarball of the final release: https://drive.google.com/open?id=0B_hfrkaWlLi4MVlxQWVJaVd0ZGs If there's any significant demand, I could revive the repo in my personal account.

srowen · ‎10-04-2017

That implementation is obsolete at this point, I'd say, but sure you're welcome to go dig it out. It worked well. The releases and source are still on the 1.x project site: https://github.com/cloudera/oryx/releases

srowen · ‎07-23-2017

CDH already supports Spark 2.2, right?

srowen · ‎07-22-2017

Unsupported != doesn't work. Spark Streaming is shipped as-is and you can use Structured Streaming. The distro wouldn't include breaking changes to public APIs, even where they are not supported.

srowen · ‎07-21-2017

Yes, you have two services for the history servers. Yes, you need to build your app against Spark 1 or Spark 2 and then run it with the matching version.

Member Since	‎08-11-2014 09:17 AM
Last Visited	‎02-13-2018 12:34 PM
Posts	481
Kudos received	87

Cloudera Community

Re: Own code editor in CDSW?

Re: error using Pandas within PySpark transformati...

Re: Does CDSW need to be part of the cluster?

Re: Local Data combined with HDFS

Re: Where can I find Oryx 1.x releases (or GitHub)

Re: spark connecting salesforce error

Re: Having Spark 1.6.0 and 2.1 in the same CDH