Member since
08-11-2014
481
Posts
92
Kudos Received
72
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2756 | 01-26-2018 04:02 AM |
| | 5786 | 12-22-2017 09:18 AM |
| | 2734 | 12-05-2017 06:13 AM |
| | 3039 | 10-16-2017 07:55 AM |
| | 8405 | 10-04-2017 08:08 PM |
11-17-2015
01:38 AM
Hi sowen, we actually have the same problem of failing to bind to the internal ip-port 10.1.0.11:50321. The issue is that Spark Streaming sometimes connects to the port and sometimes doesn't, and when it doesn't connect, Spark Streaming sometimes keeps trying for 5-7 minutes. Do you know what the possible cause is? We installed Spark 1.5.1 on Cloudera CDH 5.3.7 (with YARN). Best regards
10-26-2015
12:27 PM
So, finally, in what version of CDH will Spark & Hive work perfectly together?
10-20-2015
07:58 AM
Sowen, I am new to R and its configuration. I am working on a project where I need to import some data from HDFS (a Cloudera CDH cluster) into my R/RStudio environment running on Windows. I have R/RStudio installed on Windows, and while trying to install the rhdfs package, it requires setting the HADOOP_CMD environment variable to point to the Hadoop binaries. My Hadoop cluster is running on Linux. Any suggestions on how I can set this HADOOP_CMD variable to point to the Hadoop binaries from my R environment on Windows? Thank you!
10-14-2015
03:57 AM
It's tricky because, in general, the ALS implementation we are talking about is a special case compared to normal models, but it's a big special case. I think the general architecture is correct at the level it's presented; I don't want to complicate it too much. Your feedback is a valuable contribution. Problems and bug fixes are important, but so are ways the architecture could be improved or opened up.
09-23-2015
08:52 AM
I had tried this but seemed to have some trouble using things like pyspark, etc. Is there a gist or something somewhere with exact steps for CDH? I will try again and post what I did.
09-22-2015
04:05 AM
To be completely in control, I often recommend using a shading tool for libraries like this. Using Maven Shade or Gradle Shadow to make sure your code references your version is a sure-fire way to get this working. When you build your project, you "shade" the references in your code, which means it always uses your version. Wilfred
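As a minimal sketch of what that shading looks like with the Maven Shade plugin: the relocation below rewrites references to a conflicting library's packages into a private prefix at build time. The library (`com.google.common`), the relocated prefix (`myapp.shaded`), and the plugin version here are placeholders; substitute the dependency that actually conflicts in your build.

```xml
<!-- pom.xml fragment (hypothetical example): relocate a conflicting
     dependency's packages so your job always uses your bundled version -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.5.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- placeholder library: rewrite com.google.common.* -->
            <pattern>com.google.common</pattern>
            <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After `mvn package`, both your bytecode and the bundled classes use the relocated package name, so the cluster's own copy of the library can no longer be picked up by mistake.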
09-03-2015
03:26 PM
Timestamp is for ordering, and for determining decay of the strength factor. The ordering of events is not guaranteed by HDFS / Kafka, and it does matter to some extent, especially if there are 'delete' events. It also matters when figuring out how old a data point is and how much its value has decayed, if decay is enabled. You could use seconds or milliseconds, I suppose, as long as you used them consistently. However, the serving layer uses a standard millisecond timestamp, so that's probably best to emulate.
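A minimal sketch of the age-based decay idea, assuming (hypothetically) an exponential half-life decay and millisecond timestamps throughout; the function name and half-life parameter are illustrative, not the project's actual API:

```python
def decayed_value(value, event_ts_ms, now_ms, half_life_ms):
    """Halve a data point's strength every half_life_ms milliseconds.

    All timestamps are standard millisecond epoch timestamps, used
    consistently, as the serving layer expects.
    """
    age_ms = now_ms - event_ts_ms
    return value * 0.5 ** (age_ms / half_life_ms)

# A point exactly one half-life old retains half its strength:
# decayed_value(1.0, 0, 1000, half_life_ms=1000) -> 0.5
```

Mixing seconds in one place and milliseconds in another would silently make `age_ms` wrong by a factor of 1000, which is why consistency matters more than the unit itself.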
09-02-2015
11:31 PM
Thanks a lot! The jar "mahout-1.0-collections.jar" was present in the classpath. I've removed this jar and the job now works without an error message. Best regards, butkiz!
09-01-2015
12:19 AM
I made a choice: Spark-JobServer. This project is almost exactly a response to my needs: it allows sharing RDDs between applications because it shares a context. It supports Spark SQL/Hive contexts. And it works fully without the need to install a new component on all cluster nodes 🙂
08-26-2015
01:07 PM
This may be too unspecific to be helpful, but I recall several JIRAs fixed for Spark 1.4 that concern the .inprogress files and the history server. I expect whatever this is could be related. If so, the fix would be coming in CDH 5.5 at the latest.