Support Questions

Choose a database for Zeppelin visualization tasks

New Contributor

Hello Team!

I'm working with millions of separate CSV files. Every 5 minutes I receive 4 CSV files, and I developed a Spark job that transforms these 4 files into MongoDB documents (the job runs every 5 minutes). I'm using Zeppelin for data discovery and exploration tasks, based on the Spark interpreter and the MongoDB Spark connector. It works well, but with 10 days of data in the MongoDB collection (I purge the oldest day and add the current one) and 48 GB RAM / 12 CPUs, it's slow. I actually want 30 days of history, which would be impossible with this setup, so I'm considering replacing MongoDB and storing the result of transforming the CSV files in HDFS in JSON format. I don't know whether this solution would give me better performance (speed and memory)?
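In case it helps, here is a minimal sketch of the HDFS layout I have in mind (the paths, partition naming, and helper function are hypothetical, not something I have running), with the daily purge expressed as dropping whole date partitions instead of deleting documents:

```python
from datetime import date, timedelta

# The Spark job would write each 5-minute batch date-partitioned, e.g.:
#   df.write.mode("append").partitionBy("dt").json("hdfs:///data/events")
# so purging the oldest day becomes deleting one dt=YYYY-MM-DD directory.

def partitions_to_drop(existing, today, keep_days=30):
    """Return the partition directories (named 'dt=YYYY-MM-DD') that fall
    outside the retention window and can be deleted."""
    cutoff = today - timedelta(days=keep_days - 1)
    return [name for name in existing
            if date.fromisoformat(name.split("=", 1)[1]) < cutoff]
```

For example, with a 30-day window ending 2024-01-31, `partitions_to_drop(["dt=2024-01-01", "dt=2024-01-15"], date(2024, 1, 31))` would flag only `dt=2024-01-01` for deletion.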
Any suggestions, please?

Thank you!
