
Clustering data streams using R and Spark






I am new to Spark, and I have to admit that I'm having a hard time, since my background is not in IT or software development. I am writing my thesis for an MSc in Statistics on the topic "Clustering Data Streams", and I would like to include some applications apart from the theoretical part. More specifically, I would like to apply some clustering algorithms written in R to data streams via Spark. As far as I understand from what I have read, Spark's MLlib supports only a few specific clustering algorithms. I would like to apply other algorithms that are not in that list. Would it be possible to write R code for other clustering algorithms and apply them to data streams via Spark?


I would be really grateful if you could help me with this and give me some advice, and hopefully some recommended material on "Spark" + "R" + "clustering data streams".


Thank you very much in advance.