01-24-2018 11:45 PM
I have a data set of 100 million records.
metric value -1
I need to predict which hostname is using the most CPU.
Which prediction model, tool, or technique can I use? Can anyone suggest one? Thanks.
01-25-2018 01:21 PM
The model you choose will depend on the number of labels (hostnames) you are trying to predict. Assuming your feature set will be timestamp, CPU/memory metrics, country, and so on, you can start with something simple like KNN trained using Spark.
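A minimal single-machine sketch of the KNN idea, in plain Python rather than Spark, assuming each record has already been turned into a numeric feature vector (here, hypothetically, hour of day, CPU %, and memory %) labeled with its hostname. The sample rows and the names `train` and `knn_predict` are illustrative only. This brute-force version compares a query against every training row, so on 100 million records the work would need to be distributed.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# A labeled row: (numeric feature vector, hostname).
Row = Tuple[Sequence[float], str]

# Hypothetical sample data: [hour_of_day, cpu_pct, mem_pct] -> hostname.
train: List[Row] = [
    ([9.0, 85.0, 60.0], "web-01"),
    ([10.0, 90.0, 65.0], "web-01"),
    ([2.0, 20.0, 30.0], "db-01"),
    ([3.0, 25.0, 35.0], "db-01"),
    ([14.0, 70.0, 80.0], "batch-01"),
]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(rows: List[Row], query: Sequence[float], k: int = 3) -> str:
    """Return the majority hostname among the k training rows nearest to query."""
    if k < 1 or k > len(rows):
        raise ValueError("k must be between 1 and len(rows)")
    # Sort every row by distance to the query and keep the k closest.
    nearest = sorted(rows, key=lambda row: euclidean(row[0], query))[:k]
    votes = Counter(host for _, host in nearest)
    # On a tied vote, most_common(1) returns the hostname seen first,
    # i.e. the one belonging to the closer neighbor.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_predict(train, [9.5, 88.0, 62.0]))
```

Because Euclidean distance is dominated by whichever feature has the largest range, features such as a raw Unix timestamp should be scaled to comparable ranges before running KNN.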
01-26-2018 03:31 PM
I would use Spark over Mahout. Mahout relies on the MapReduce framework. Spark MLlib does not include a KNN algorithm, but there are other suitable algorithms for your use case within that framework.