Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Forecasting Time-Series Analysis - Spark

Explorer

Hi experts,

I'm doing a bachelor's degree in Informatics Engineering, and for my Big Data course I need to create a project using Spark. My dataset comes from a set of cinemas and movies and has the following schema:

- Customer_ID (identifier of a customer);
- Ticket_ID (identifier of a ticket for a movie);
- Freq_Movies (number of movies the customer has seen);
- Value_Movies (total amount the customer has spent on tickets);
- Cinema_ID (the cinema where the customer saw the movie);
- Movie_ID (the movie the customer saw);
- Ticket_Value (purchase price of the individual ticket);
- Year;
- Quarter;
- Month;
- Day;
- Day_Week (day of the week).

My teacher asked us to come up with an approach to predict "something". I was thinking of building a time-series forecast with Spark, but I haven't found a good way to go about it.

Can anyone help?

Many thanks!


Re: Forecasting Time-Series Analysis - Spark

Super Guru

Re: Forecasting Time-Series Analysis - Spark

Explorer

@João Souza, to add more details to @Timothy Spann's answer.

Depending on your level of comfort with stats/machine learning and the amount of data you have, you can try several approaches. In brief, you first need to decide what you are forecasting. Based on the information you gave us, you have two options:

- Total number of customers

- Total number of sales

Then you need to decide what your features are. If you are forecasting total sales, you obviously can't use the total number of customers to train your forecasting model (it would not be known in advance for the period you are forecasting), and vice versa. Next, you'd aggregate the features as described in the StackOverflow page Timothy provided; you can choose to predict per day, week, month, or whatever period you think is appropriate.
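As a rough illustration of that aggregation step, here is a minimal sketch in plain Python (no Spark required, so it runs anywhere) that rolls individual ticket rows up into one record per day. It assumes each row is a dict keyed by the column names from the question (`Year`, `Month`, `Day`, `Customer_ID`, `Ticket_Value`); the function name `aggregate_daily` is hypothetical.

```python
from collections import defaultdict
from datetime import date


def aggregate_daily(tickets):
    """Roll ticket rows up into one record per calendar day.

    Each row is assumed to be a dict with the Year, Month, Day, Customer_ID
    and Ticket_Value columns from the question (one row per ticket).
    Returns (day, ticket_count, total_sales, distinct_customers) tuples,
    sorted by date, i.e. a daily time series ready for forecasting.
    """
    counts = defaultdict(int)
    sales = defaultdict(float)
    customers = defaultdict(set)
    for row in tickets:
        day = date(row["Year"], row["Month"], row["Day"])
        counts[day] += 1                       # tickets sold that day
        sales[day] += row["Ticket_Value"]      # revenue that day
        customers[day].add(row["Customer_ID"]) # unique customers that day
    return [
        (day, counts[day], sales[day], len(customers[day]))
        for day in sorted(counts)
    ]
```

In Spark the same roll-up would be a `groupBy("Year", "Month", "Day")` followed by `agg()` with `count`, `sum`, and `countDistinct` from `pyspark.sql.functions`; either daily total (sales or customers) then becomes the series you forecast.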

If you want to keep it simple, just use ARIMA, which is available out of the box in the spark-ts (Spark-Timeseries) package. If you have a lot of data, you could try an RNN with LSTM, but I believe that would be overkill.
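To show what the ARIMA idea amounts to on such a daily series, here is a minimal ARIMA(1,1,0) sketch in plain Python: difference the series once, fit an AR(1) model to the differences by ordinary least squares, then forecast the differences recursively and add them back onto the last observed value. This is a deliberately simplified estimator with hypothetical function names, not the spark-ts API; real ARIMA implementations support higher orders and more careful parameter estimation.

```python
def fit_ar1(values):
    """Fit v[t] = c + phi * v[t-1] + noise by ordinary least squares."""
    if len(values) < 3:
        raise ValueError("need at least 3 values to fit AR(1)")
    prev, nxt = values[:-1], values[1:]
    n = len(prev)
    mean_prev, mean_next = sum(prev) / n, sum(nxt) / n
    var = sum((a - mean_prev) ** 2 for a in prev)
    if var == 0:
        raise ValueError("lagged values are constant; AR(1) slope is undefined")
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(prev, nxt))
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


def forecast_arima_110(series, steps):
    """Forecast `steps` future values with an ARIMA(1,1,0) model."""
    diffs = [b - a for a, b in zip(series, series[1:])]  # first difference (d = 1)
    c, phi = fit_ar1(diffs)  # AR(1) on the differences (p = 1, q = 0)
    value, diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        diff = c + phi * diff  # predicted next change
        value += diff          # integrate: add the change back onto the level
        forecasts.append(value)
    return forecasts
```

For example, `forecast_arima_110(daily_sales, 7)` would give a one-week-ahead forecast for a hypothetical list `daily_sales` of daily totals.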


Re: Forecasting Time-Series Analysis - Spark

Rising Star

A word of warning regarding Spark-TS:

- As stated by its author, the spark-ts library is no longer under active development; the last activity was in March 2017.

This raises doubts about the long-term viability of this option.


Re: Forecasting Time-Series Analysis - Spark

New Contributor

Since Spark-TS is no longer under development, does anyone know what the alternatives in Spark are for running ARIMA, Holt-Winters, and similar models for time-series forecasting?

Thanks in advance

Andrey
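One commonly used Holt-Winters variant is additive triple exponential smoothing. Below is a minimal sketch in plain Python; the function name is hypothetical, the smoothing parameters are supplied by hand rather than fitted, and the initial level, trend, and seasonal offsets come from a simple average over the first two seasons. In Spark 3.0 and later, one possible way to run a single-machine model like this on many series in parallel is `GroupedData.applyInPandas`, with one group per series; that is one deployment pattern, not a drop-in replacement for spark-ts.

```python
def holt_winters_additive(series, m, alpha, beta, gamma, steps):
    """Additive Holt-Winters (triple exponential smoothing) forecast.

    series: observed values, oldest first; m: season length (e.g. 7 for daily
    data with a weekly cycle); alpha, beta, gamma: smoothing parameters in [0, 1].
    Returns a list with `steps` forecasts following the last observation.
    """
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    first, second = series[:m], series[m:2 * m]
    # Simple initialisation from the first two seasons.
    level = sum(first) / m
    trend = (sum(second) - sum(first)) / (m * m)
    season = [x - level for x in first]  # slot i holds the offset for position i % m

    for t in range(m, len(series)):
        x = series[t]
        prev_level = level
        old_season = season[t % m]  # seasonal offset from one season ago
        level = alpha * (x - old_season) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (x - level) + (1 - gamma) * old_season

    n = len(series)
    # h-step forecast: level + h * trend + latest offset for that seasonal position.
    return [level + h * trend + season[(n - 1 + h) % m] for h in range(1, steps + 1)]
```

On a series that simply repeats the same seasonal pattern, the forecast reproduces that pattern, which is a quick sanity check before tuning `alpha`, `beta`, and `gamma` on real data.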


Re: Forecasting Time-Series Analysis - Spark

Super Guru

Re: Forecasting Time-Series Analysis - Spark

New Contributor

Hello, for my bachelor's degree I need to use Spark to forecast the hourly consumption for the next day, given a building's consumption over, say, the previous 30 days. Is this possible to do in Spark? Can anyone help with this?

Thanks!
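For the hourly question, a sensible first step is a baseline that any Spark model should beat: forecast each hour of the next day as the average of the same hour over the most recent days. Below is a minimal sketch in plain Python, assuming the input is a flat list of hourly readings that starts at midnight and covers whole days (e.g. 30 × 24 = 720 values); the function name and the 7-day window are hypothetical choices. In Spark, the same idea is a filter to the last seven days, then a `groupBy` on the hour of day (for example via `pyspark.sql.functions.hour`) followed by `avg`.

```python
def hourly_profile_forecast(hourly_values, days=7, hours_per_day=24):
    """Forecast the next day hour by hour as the mean of the same hour
    over the last `days` complete days (a seasonal-average baseline)."""
    if not hourly_values or len(hourly_values) % hours_per_day != 0:
        raise ValueError("input must cover whole days, starting at hour 0")
    available_days = len(hourly_values) // hours_per_day
    window = min(days, available_days)  # use fewer days if history is short
    recent = hourly_values[-window * hours_per_day:]
    forecast = []
    for hour in range(hours_per_day):
        same_hour = [recent[d * hours_per_day + hour] for d in range(window)]
        forecast.append(sum(same_hour) / window)
    return forecast
```

Comparing a model's error against this baseline on a held-out day shows whether the extra complexity is actually paying off.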
