Member since
08-11-2014
481
Posts
92
Kudos Received
72
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2756 | 01-26-2018 04:02 AM |
| | 5786 | 12-22-2017 09:18 AM |
| | 2734 | 12-05-2017 06:13 AM |
| | 3039 | 10-16-2017 07:55 AM |
| | 8405 | 10-04-2017 08:08 PM |
11-17-2015
01:38 AM
Hi sowen, we actually have the same problem of failing to bind to the internal ip-port 10.1.0.11:50321. The issue is that Spark Streaming sometimes connects to the port and sometimes doesn't, and when it doesn't connect, Spark Streaming sometimes keeps trying for 5-7 minutes. Do you know what the possible cause is? We installed Spark 1.5.1 on Cloudera CDH 5.3.7 (with YARN). Best regards
10-26-2015
12:27 PM
So, finally, in what version of CDH will Spark & Hive work perfectly together?
10-20-2015
07:58 AM
Sowen, I am new to R and its configuration. I am working on a project where I need to import some data from HDFS (a Cloudera CDH cluster) into my R/RStudio environment running on Windows. I have R/RStudio installed on Windows, and while trying to install the rhdfs package, it requires setting the HADOOP_CMD environment variable to point to the Hadoop binaries. My Hadoop cluster is running on Linux. Any suggestions on how I can set this HADOOP_CMD variable to point to the Hadoop binaries from my R environment on Windows? Thank you!
10-14-2015
03:57 AM
It's tricky because, in general, the ALS implementation we are talking about is a special case compared to normal models, but it's a big special case. I think the general architecture is correct at the level it's presented; I don't want to complicate it too much. Your feedback is a valuable contribution. Problems and bug fixes are important, but so are ways the architecture could be improved or opened up.
09-23-2015
08:52 AM
I had tried this but seemed to have some trouble using things like pyspark, etc. Is there a gist or something somewhere with exact steps for CDH? I will try again and post what I did.
09-22-2015
04:05 AM
To be completely in control, I often recommend using a shading tool for libraries like this. Using Maven Shade or Gradle Shadow to make sure your code references your version is a sure-fire way to get this working. When you build your project, you "shade" the references in your code, which means it always uses your version. Wilfred
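As a minimal sketch of what that shading looks like with the Maven Shade plugin: the relocation below rewrites references to a conflicting library's packages into a private prefix at build time. The library (`com.google.common`), the relocated prefix (`myapp.shaded`), and the plugin version here are placeholders; substitute the dependency that actually conflicts in your build.

```xml
<!-- pom.xml fragment (hypothetical example): relocate a conflicting
     dependency's packages so your job always uses your bundled version -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.5.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- placeholder library: rewrite com.google.common.* -->
            <pattern>com.google.common</pattern>
            <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After `mvn package`, both your bytecode and the bundled classes use the relocated package name, so the cluster's own copy of the library can no longer be picked up by mistake.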
09-03-2015
03:26 PM
Timestamp is for ordering, and for determining decay of the strength factor. The ordering of events is not guaranteed by HDFS / Kafka, and it does matter to some extent, especially if there are 'delete' events. It also matters when figuring out how old a data point is and how much its value has decayed, if decay is enabled. You could use seconds or milliseconds, I suppose, as long as you used them consistently. However, the serving layer uses a standard millisecond timestamp, so that's probably best to emulate.
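A minimal sketch of the age-based decay idea, assuming (hypothetically) an exponential half-life decay and millisecond timestamps throughout; the function name and half-life parameter are illustrative, not the project's actual API:

```python
def decayed_value(value, event_ts_ms, now_ms, half_life_ms):
    """Halve a data point's strength every half_life_ms milliseconds.

    All timestamps are standard millisecond epoch timestamps, used
    consistently, as the serving layer expects.
    """
    age_ms = now_ms - event_ts_ms
    return value * 0.5 ** (age_ms / half_life_ms)

# A point exactly one half-life old retains half its strength:
# decayed_value(1.0, 0, 1000, half_life_ms=1000) -> 0.5
```

Mixing seconds in one place and milliseconds in another would silently make `age_ms` wrong by a factor of 1000, which is why consistency matters more than the unit itself.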
09-02-2015
11:31 PM
Thanks a lot! The jar "mahout-1.0-collections.jar" was present in the classpath. I've removed this jar and the job now works without an error message. Best regards, butkiz!
09-01-2015
12:19 AM
I made a choice: Spark-JobServer. This project is almost exactly a response to my needs: it allows sharing RDDs between applications because it shares a context. It supports Spark SQL/Hive contexts. And it works fully without the need to install a new component on all cluster nodes 🙂
08-26-2015
01:07 PM
This may be too unspecific to be helpful, but I recall several JIRAs fixed for Spark 1.4 that concern the .inprogress files and the history server. I expect whatever this is could be related. If so, the fix would be coming in CDH 5.5 at the latest.