Welcome to the Cloudera Community

bkontonis · 03-26-2016

Hello,

I am new to Spark and I have to admit that I'm having a hard time, since my background is not in IT or software development. I am writing my thesis for an MSc in Statistics on the topic "Clustering Data Streams", and apart from the theoretical part I would like to include some applications. More specifically, I would like to apply some clustering algorithms written in R to data streams via Spark. From what I have read, Spark's MLlib supports only a few specific clustering algorithms, and I would like to apply some algorithms that are not on that list. Would it be possible to write other clustering algorithms in R and apply them to data streams via Spark?
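For context on what MLlib already offers for streams: its `StreamingKMeans` model updates k-means centers once per micro-batch, using a decay (forgetting) factor so that older data counts less. Below is a minimal plain-Python sketch of that per-batch update rule. It is written without Spark so it runs anywhere; `nearest`, `update_batch` and the toy stream are hypothetical names and data for illustration, not part of Spark's API. A per-batch function like `update_batch` is the kind of step a different clustering algorithm would take the place of if it were run on each micro-batch.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(centers: Sequence[Point], p: Point) -> int:
    """Index of the center closest to p (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(centers[i], p)),
    )


def update_batch(
    centers: Sequence[Point],
    weights: Sequence[float],
    batch: Sequence[Point],
    decay: float = 1.0,
) -> Tuple[List[Point], List[float]]:
    """One decayed mini-batch k-means step (a sketch, not Spark's code):
    c_new = (c * n * a + x_mean * m) / (n * a + m),   n_new = n * a + m
    where a = decay, n = old weight, m = batch points assigned to c.
    """
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    # Assign every point of the current micro-batch to its nearest center.
    for p in batch:
        j = nearest(centers, p)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]

    new_centers: List[Point] = []
    new_weights: List[float] = []
    for j in range(k):
        n_old = weights[j] * decay  # older data is down-weighted by `decay`
        m = counts[j]
        if m == 0:
            # No new points: keep the center, only its weight decays.
            new_centers.append(tuple(centers[j]))
            new_weights.append(n_old)
            continue
        total = n_old + m
        # x_mean * m is simply the coordinate sum of the assigned points.
        new_centers.append(
            tuple((centers[j][d] * n_old + sums[j][d]) / total for d in range(dim))
        )
        new_weights.append(total)
    return new_centers, new_weights


# Hypothetical toy "stream" of two micro-batches with two obvious groups.
stream = [
    [(0.0, 0.1), (0.2, 0.0), (9.8, 10.0)],
    [(10.1, 9.9), (0.1, -0.1), (10.0, 10.2)],
]
centers: List[Point] = [(1.0, 1.0), (8.0, 8.0)]
weights: List[float] = [0.0, 0.0]
for batch in stream:
    centers, weights = update_batch(centers, weights, batch, decay=0.5)

print(centers, weights)
```

Each pass of the loop plays the role of one micro-batch. With `decay=1.0` all history counts equally, while smaller values let the centers follow a drifting stream. This sketch does nothing special for clusters that stop receiving points beyond letting their weight decay.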

I would be really grateful if you could help me with this, give me some advice, and hopefully point me to any recommended material on "Spark" + "R" + "clustering data streams".

Thank you very much in advance.

Clustering data streams using R and Spark