Member since 08-11-2014 · 481 Posts · 92 Kudos Received · 72 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2690 | 01-26-2018 04:02 AM
 | 5662 | 12-22-2017 09:18 AM
 | 2660 | 12-05-2017 06:13 AM
 | 2961 | 10-16-2017 07:55 AM
 | 8192 | 10-04-2017 08:08 PM
06-07-2016
12:50 AM
Hi, thank you for the quick reply. I am quite new to the recommendation domain. What exactly are latent features?
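For what it's worth, "latent features" are the low-dimensional vectors a factorization model (such as ALS) learns for each user and each item; a predicted rating is roughly the dot product of the two. A toy sketch in plain Scala, with made-up factor values rather than anything actually learned:

```scala
object LatentDemo {
  // Hypothetical latent vectors, as if learned by ALS with rank = 3.
  // Each dimension is an unnamed "taste" axis the model discovered.
  val userFactors = Map(
    "alice" -> Array(0.9, 0.1, 0.4),
    "bob"   -> Array(0.2, 0.8, 0.5)
  )
  val itemFactors = Map(
    "movieA" -> Array(1.0, 0.0, 0.3),
    "movieB" -> Array(0.1, 0.9, 0.6)
  )

  // Predicted preference = dot product of the user and item latent vectors.
  def predict(user: String, item: String): Double =
    userFactors(user).zip(itemFactors(item)).map { case (u, i) => u * i }.sum
}
```

A user and an item whose latent vectors point in the same direction get a high predicted score, even if that user has never rated that item; that is what makes the factors useful for recommendation.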
05-26-2016
12:58 PM
1 Kudo
Thanks @hubbarja. I spent the afternoon trying this out on the CDH 5.7.0 QuickStart VM, with a Kerberos-enabled cluster and Cloudera Kafka 2.0.0. Perhaps I didn't phrase my question clearly: what I was trying to ask was whether the spark-streaming-kafka client can consume from a Kafka cluster that requires client SSL authentication.

For anyone else who tries this, the summary is that it won't work, due to upstream Spark issue [SPARK-12177], which tracks support for the new Kafka 0.9 consumer / producer API. SSL, SASL_PLAINTEXT, and SASL_SSL connections to Kafka all require the new API. In fact, this issue is referenced in the known issues released with CDH 5.7.0; I just didn't spot it in time.

There's a pull request on GitHub which appears to support SSL (but no form of Kerberos client authentication), if anyone feels brave. Judging by the comments on the Spark ticket, this feature won't be merged until at least after the Spark 2.0.0 release, and probably not until 2.1.0. Back to the drawing board for me!
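For reference, these are the client-side SSL settings the new (0.9+) consumer API expects — the settings the old spark-streaming-kafka client has no way to pass. A sketch in plain Scala with placeholder broker addresses, paths, and passwords; the keystore entries are the part required when the brokers enforce client SSL authentication:

```scala
import java.util.Properties

object KafkaSslConfigSketch {
  // New-consumer SSL configuration (Kafka 0.9+). All values below are
  // placeholders, not working credentials.
  def sslProps(): Properties = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9093")
    props.put("security.protocol", "SSL")
    props.put("ssl.truststore.location", "/path/to/truststore.jks")
    props.put("ssl.truststore.password", "changeit")
    // The keystore settings are only needed when the cluster requires
    // client SSL authentication (ssl.client.auth=required on the brokers):
    props.put("ssl.keystore.location", "/path/to/keystore.jks")
    props.put("ssl.keystore.password", "changeit")
    props.put("ssl.key.password", "changeit")
    props
  }
}
```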
05-21-2016
10:42 AM
Yes, you will certainly need to provide access keys for S3 access to work. I don't think (?) that would be a solution to a VerifyError, though, which is a much lower-level error indicating a corrupted build. And yes, it's expected that the AWS SDK dependencies were updated along with the new Spark version in CDH 5.7. I believe the current version depends on jets3t 0.9, which is the one you want.
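For completeness, these are the Hadoop configuration keys where the S3 access keys go for the jets3t-backed s3n:// filesystem — a minimal sketch with placeholder values; in a Spark job they would typically be set on sc.hadoopConfiguration rather than built as a plain Map:

```scala
object S3CredsSketch {
  // Hadoop configuration keys for s3n:// (jets3t-backed) access.
  // The values passed in are placeholders, never hard-code real credentials.
  def s3Conf(accessKey: String, secretKey: String): Map[String, String] = Map(
    "fs.s3n.awsAccessKeyId"     -> accessKey,
    "fs.s3n.awsSecretAccessKey" -> secretKey
  )
}
```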
04-29-2016
06:32 AM
Thanks! Yes, percent_rank() together with a window function did the trick. A different way is to sort the column and pick the value in the middle. The results are close.
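The "sort the column and pick the one in the middle" approach can be sketched in plain Scala (local collections rather than Spark, just to illustrate the idea — for even-length input this variant averages the two middle values):

```scala
object MedianSketch {
  // Median by sorting and picking the middle element(s).
  // In Spark SQL, percent_rank() over a window gives an equivalent answer.
  def median(xs: Seq[Double]): Double = {
    require(xs.nonEmpty, "median of an empty sequence is undefined")
    val sorted = xs.sorted
    val n = sorted.length
    if (n % 2 == 1) sorted(n / 2)
    else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
  }
}
```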
04-27-2016
06:22 PM
That's it. Thanks.
04-25-2016
05:45 AM
I have found the solution: var addedRDD: org.apache.spark.rdd.RDD[(String, Int)] = sc.emptyRDD
04-20-2016
12:37 PM
Thank you, now it makes a bit more sense.
01-13-2016
10:11 PM
But if I install CDH 5.5 using the tarball, will it include Spark and the other Hadoop components too? I installed CDH 5.5 from the tarball without Cloudera Manager, but I cannot see any Spark jars or those of any other component. Please suggest how I can make use of the built-in Hadoop components in CDH.
11-21-2015
05:38 AM
Hi, I am trying to schedule a Spark job using cron. I have made a shell script, and it runs fine from the terminal. However, when cron executes the script, it fails with an "insufficient memory to start JVM thread" error. The error only occurs when the script is started by cron; starting it from the terminal never causes an issue. Any suggestions would be appreciated.
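Not from this thread, but a common cause of exactly this symptom: cron starts jobs with a near-empty environment, so the JAVA_HOME, PATH, and ulimit settings of your interactive shell are missing. A configuration sketch of a crontab entry plus a wrapper script, with placeholder paths and times:

```shell
# Crontab entry (sketch; all paths and the schedule are placeholders).
# Redirect output to a log so cron failures are visible afterwards.
0 2 * * * /home/user/run_spark_job.sh >> /home/user/spark_job.log 2>&1

# /home/user/run_spark_job.sh
#!/bin/bash
source /etc/profile                 # pick up JAVA_HOME, PATH, etc.
export JAVA_HOME=/usr/java/default  # or wherever your JDK lives
ulimit -v unlimited                 # lift any inherited virtual-memory limit
/usr/bin/spark-submit --master yarn --driver-memory 2g /home/user/job.py
```

Comparing `env` from an interactive shell with the output of `* * * * * env > /tmp/cron_env.txt` is a quick way to see which variables the cron environment is missing.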