I'm doing a bachelor's degree in Informatics Engineering, and for my Big Data class I need to create a project using Spark. The schema for my dataset comes from a set of cinemas and movies:
- Customer_ID (identifier of a customer);
- Ticket_ID (identifier of a ticket to a movie);
- Freq_Movies (number of movies the customer has seen);
- Value_Movies (total money the customer has spent on tickets);
- Cinema_ID (the cinema where the customer saw the movie);
- Movie_ID (the movie the customer saw);
- Ticket_Value (purchase value of an individual movie ticket).
My teacher told us to come up with some approach to predict "something". I was thinking of doing time-series forecasting with Spark, but I haven't found a good way to approach it...
Can you give me some help?
Depending on your comfort level with statistics/machine learning and the amount of data you have, you can try several approaches. In brief, you would need to decide what you are forecasting. Based on the information you gave us, you have two options:
- Total number of customers
- Total number of sales
Then, you would need to decide what your features are. If you are predicting the total number of sales, you obviously can't use data about the total number of customers to train your forecasting model, and vice versa. Next, you'd need to aggregate your features as described in the StackOverflow page Timothy has provided -- you can choose to predict per day, week, month, or whatever granularity you think is appropriate.
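To make the aggregation step concrete, here is a minimal sketch in plain Python (the field names and sample values are hypothetical, and the same grouping would be a `groupBy` + `sum` on a Spark DataFrame): it rolls individual ticket purchases up into a daily total-sales series you could then feed to a forecasting model.

```python
from collections import defaultdict
from datetime import date

# Hypothetical ticket records: (purchase_date, ticket_value).
tickets = [
    (date(2017, 5, 1), 7.50),
    (date(2017, 5, 1), 8.00),
    (date(2017, 5, 2), 7.50),
    (date(2017, 5, 3), 6.00),
    (date(2017, 5, 3), 7.50),
]

def daily_sales(records):
    """Aggregate individual ticket values into total sales per day."""
    totals = defaultdict(float)
    for day, value in records:
        totals[day] += value
    # Sorted by date so the result can be used directly as a time series.
    return sorted(totals.items())

series = daily_sales(tickets)
# series is a list of (date, total_sales) pairs, one per day
```

Swap the date key for a week or month key to change the forecasting granularity.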
If you want to keep it simple, just use ARIMA, which is available out of the box in the Spark-Timeseries package. If you have a lot of data, you could try an RNN with LSTM cells, but I believe that would be overkill here.
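To give a feel for the autoregressive idea underlying ARIMA, here is a toy sketch in plain Python: it fits an AR(1) model y_t = a*y_{t-1} + b by least squares and rolls it forward. This is only an illustration of the principle, not the spark-ts API (which handles differencing, moving-average terms, and order selection for you); the sample data is made up.

```python
def fit_ar1(series):
    """Least-squares fit of the AR(1) model y_t = a*y_{t-1} + b."""
    x = series[:-1]          # predictors: each value
    y = series[1:]           # targets: the following value
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    a = cov / var
    b = my - a * mx
    return a, b

def forecast(series, steps, a, b):
    """Roll the fitted model forward `steps` periods."""
    out = []
    last = series[-1]
    for _ in range(steps):
        last = a * last + b
        out.append(last)
    return out

# Hypothetical daily sales totals.
sales = [100.0, 110.0, 105.0, 115.0, 112.0, 120.0]
a, b = fit_ar1(sales)
next3 = forecast(sales, 3, a, b)
```

A real ARIMA model adds differencing (the "I") and moving-average terms (the "MA") on top of this autoregressive core, which is why a library implementation is preferable to rolling your own.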
A word of warning regarding Spark-TS:
- As stated by the author, the spark-ts library is no longer under active development by him; in fact, the last activity was in March 2017.
This raises doubts about the long-term viability of this option.
Hello, for my bachelor's degree I need to use Spark to forecast a building's hourly consumption for the next day, given the consumption it had over, say, the previous 30 days. Is this possible to do in Spark? Can anyone help me with this?