I'm using a Spark cluster (6 nodes, 128 GB RAM, 1 TB disk each) in standalone mode to predict customer behaviour for an organisation. Model creation for one organisation takes about 20 minutes and is repeated every 30 days. Any ideas on how to scale this process to 50,000 organisations? Should I stick with Spark or use something else, given that Spark does not seem to support creating models for multiple organisations simultaneously?
Thanks in advance.
First of all, are you using all the resources of your cluster? That is, is your Spark application actually consuming everything you allocated to it? If not, you can scale horizontally by launching model creation for multiple organisations at the same time. If the resources really are exhausted, you can always grow the cluster.
Model creation for an organisation runs as a single application on the Spark cluster. How can I run multiple applications simultaneously in the same Spark context? I have allotted all the resources to the Spark context, but I don't have control over how the executors are used by the application, since I am running in standalone mode. Some executors are idle at times. Can they be used, or do I have to change the configuration?
You can run multiple Spark applications simultaneously if your cluster has enough free resources. If a single application is not using everything you have allocated to it, reduce its allocation; the cores and memory that frees up let several applications run side by side.
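In standalone mode, the usual way to do this is to cap each application when you submit it, so no single job grabs the whole cluster. A sketch of such a submission follows; the master URL, the core/memory figures, and the script name are placeholders you would adapt to your 6-node layout:

```shell
# spark.cores.max caps the total cores one application may take in
# standalone mode; with it set, several applications can run at once.
# Master URL, resource sizes and script name are placeholders.
spark-submit \
  --master spark://master-host:7077 \
  --conf spark.cores.max=16 \
  --executor-memory 24G \
  train_model.py --org-id 42
```

With a cap like this, launching several such submissions in parallel (one per organisation) lets the standalone scheduler pack them onto the cluster instead of queueing them one after another.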
So only one application runs at a time in the Spark context, and the next one starts after the previous one completes. How do I run multiple Spark applications in one Spark context in standalone mode? By reducing the resources allocated to the Spark context, can I create a new Spark context on the master through the same driver? If so, how, and what configuration changes do I have to make to the Spark context?
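Note that you don't need a second Spark context for this: a single context can run multiple jobs concurrently if they are submitted from separate driver threads (and setting spark.scheduler.mode=FAIR lets those jobs share executors instead of queueing FIFO). Below is a minimal sketch of that threading pattern in plain Python; `build_model` here is a placeholder for your ~20-minute training job, which in real code would call Spark actions on the shared context:

```python
from concurrent.futures import ThreadPoolExecutor

def build_model(org_id):
    # Placeholder for one organisation's training job. In a real
    # driver this would run Spark actions (e.g. fit an MLlib
    # pipeline) on the shared SparkContext, which is thread-safe.
    return (org_id, "model-for-%d" % org_id)

def build_all(org_ids, parallelism=4):
    # One thread per in-flight organisation; each thread submits
    # its own Spark job through the same context, so up to
    # `parallelism` models are built concurrently.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return dict(pool.map(build_model, org_ids))

models = build_all(range(10))
```

With FAIR scheduling enabled, the idle executors you observed would be picked up by whichever concurrent job has pending tasks, rather than sitting unused while a single job runs.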