Explorer
Posts: 40
Registered: ‎08-29-2017

Which predictive model can we use for terabytes of data stored in Hive?

Hi

I have a data set of 100 million records with the following fields:

timestamp,hostname,country,cpu/memory,metric value

2017-12-01 06:35:57.0wkliunhjjlcpu

metric value -1

I need to predict which hostname will use the most CPU.

Which prediction model, tool, or technique can I use? Can anyone suggest one? Thanks.

 
Contributor
Posts: 29
Registered: ‎03-07-2017

Re: Which predictive model can we use for terabytes of data stored in Hive?

The model you choose will depend on the number of labels (hostnames) you are trying to predict. Assuming your feature set is timestamp, cpu/memory metric, country, and so on, you can start with something simple like KNN trained using Spark.
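To make the idea concrete, here is a minimal sketch of KNN on a handful of in-memory records. It is plain Python rather than Spark, so it only illustrates the technique and will not scale to the full data set. The hostnames, countries, and metric values are made up, the metric is assumed to be a percentage between 0 and 100, and the helper names (`to_features`, `distance`, `knn_predict`) are hypothetical.

```python
from collections import Counter
from datetime import datetime
import math

# Hypothetical sample rows: (timestamp, hostname, country, metric_value).
# All hostnames, countries and values are invented for illustration.
ROWS = [
    ("2017-12-01 06:35:57", "host-a", "US", 91.0),
    ("2017-12-01 06:40:12", "host-a", "US", 88.5),
    ("2017-12-01 07:05:00", "host-a", "US", 93.2),
    ("2017-12-01 06:36:10", "host-b", "DE", 12.0),
    ("2017-12-01 06:50:44", "host-b", "DE", 15.5),
    ("2017-12-01 18:20:03", "host-c", "IN", 55.0),
    ("2017-12-01 18:45:30", "host-c", "IN", 60.1),
]

def to_features(timestamp, country, metric_value):
    """Turn one record into (scaled hour of day, scaled metric, country)."""
    hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").hour
    # Assumption: the metric is a percentage, so dividing by 100 scales it to 0..1.
    return (hour / 23.0, metric_value / 100.0, country)

def distance(a, b):
    """Euclidean distance on the numeric parts plus a 0/1 penalty for a country mismatch."""
    country_penalty = 0.0 if a[2] == b[2] else 1.0
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + country_penalty)

def knn_predict(train, query, k=3):
    """Return the most common hostname among the k nearest training records."""
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Training set: (feature tuple, hostname label) pairs.
TRAIN = [(to_features(ts, country, value), host) for ts, host, country, value in ROWS]

predicted = knn_predict(TRAIN, to_features("2017-12-01 06:55:00", "US", 90.0), k=3)
print(predicted)
```

With 100 million records, sorting every training row per query as done here is far too slow, so treat this only as a way to check that the features separate the hostnames before picking a distributed algorithm.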

Explorer
Posts: 40
Registered: ‎08-29-2017

Re: Which predictive model can we use for terabytes of data stored in Hive?

Hi,

Thanks for the valuable suggestion.

Should I use Spark with one of its machine learning libraries, or Mahout?

Contributor
Posts: 29
Registered: ‎03-07-2017

Re: Which predictive model can we use for terabytes of data stored in Hive?

I would use Spark over Mahout. Mahout has traditionally relied on the MapReduce framework. Spark MLlib does not include a KNN algorithm, but it has other algorithms that are suitable for your use case.
