Member since
06-09-2016
34
Posts
2
Kudos Received
0
Solutions
10-02-2017
11:23 PM
What types of costs should an enterprise expect in order to license Hadoop or Spark? Thanks
Labels:
- Apache Hadoop
- Apache Spark
12-16-2016
02:27 AM
@Alex Woolford thanks :). Would you recommend combining collaborative filtering with association rules? Suppose I predict that customer A will buy product 1, and I discover that customers who buy product 1 also buy product 2; I could then recommend both products 1 and 2 to customer A. Do you think this is a good approach?
12-16-2016
02:17 AM
@Dan Zaratsian many thanks for your response :) By "predict purchase", do you mean something like "person 1 will buy products A and B"?
12-15-2016
10:54 PM
1 Kudo
Hi experts, I have a dataset with information about customers and the products they buy from a set of supermarkets:
- Customer_ID
- Transaction_Number
- Store_ID
- Product_Name
- Transaction_Date
- Product_Value
Which supervised algorithm would you recommend for building a use case in Spark 1.6? I'm very new to this topic 🙂 Many thanks!!
Labels:
- Apache Spark
10-29-2016
01:47 PM
Hi experts, I've created a script using Apache Pig to run some jobs on my data (which comes from a text file). After running the script I get a long list of files ("part-m-001", "part-m-002", ...). My questions are:
Is it possible to use Impala to concatenate all the data into one table? The data follows a structured schema, so are Parquet files a good option?
Thanks!
Labels:
- Apache Impala
- Apache Pig
09-27-2016
09:33 AM
Hi experts, I want to rank my dataset, but I also need to group my data before or after ranking. My dataset is:

EMPLOYEE  STOCK  FURNISHER  DATE        VALUE
A         2      AA         27-01-2016  3
A         1      AB         28-01-2016  3
B         4      AA         27-01-2016  5
C         5      AC         27-01-2016  1
C         2      AC         27-01-2016  4

Now I want to rank my data by Employee and Date and group the rows to obtain the sum of Value. I know I can do this without ranking, but generating the rank by Employee and Date is a requirement. Basically, I want to produce the following output:

ID  EMPLOYEE  STOCK  FURNISHER  DATE        VALUE
1   A         2      AA         27-01-2016  3
2   A         1      AB         28-01-2016  3
3   B         4      AA         27-01-2016  5
4   C         5      AC         27-01-2016  5
4   C         2      AC         27-01-2016  5

To obtain this with Apache Pig I'm using this script:

-- Load the semicolon-delimited input file
RAW_DATA = LOAD 'FILE_PATH' USING PigStorage(';') AS
(Employee:chararray, Stock:int, Furnisher:chararray, Date:chararray, Value:double);
-- Rank the rows by Employee and Date
RANKING = RANK RAW_DATA BY Employee, Date;
-- Group the ranked rows by Furnisher, then flatten them again
GRP = GROUP RANKING BY Furnisher;
DATA = FOREACH GRP GENERATE FLATTEN(RANKING);
STORE DATA INTO 'DESTINATION_PATH' USING PigStorage(',');

But I'm not getting the desired output 😞 Does anyone know how I can do this? Many thanks!
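
To pin down the result being asked for, here is a minimal sketch in plain Python (not Pig). It assumes the intended logic is a dense rank over the distinct (Employee, Date) pairs, with Value replaced by the total for each pair. The function name `rank_and_sum` is made up for illustration. In Pig itself, the DENSE option of the RANK operator may be a starting point, though that has not been verified against this dataset.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (employee, stock, furnisher, date, value)
rows = [
    ("A", 2, "AA", "27-01-2016", 3.0),
    ("A", 1, "AB", "28-01-2016", 3.0),
    ("B", 4, "AA", "27-01-2016", 5.0),
    ("C", 5, "AC", "27-01-2016", 1.0),
    ("C", 2, "AC", "27-01-2016", 4.0),
]


def rank_and_sum(rows):
    """Dense-rank rows by (employee, date) and replace value with the group total."""
    # Sum Value for every distinct (employee, date) pair
    totals = defaultdict(float)
    for employee, _stock, _furnisher, date, value in rows:
        totals[(employee, date)] += value

    # Dense rank: consecutive ids over the sorted distinct keys.
    # Dates are parsed so that DD-MM-YYYY strings sort chronologically.
    ordered = sorted(
        totals,
        key=lambda k: (k[0], datetime.strptime(k[1], "%d-%m-%Y")),
    )
    rank_of = {key: i for i, key in enumerate(ordered, start=1)}

    # Keep every original row, attaching its rank and its group total
    return [
        (rank_of[(e, d)], e, s, f, d, totals[(e, d)])
        for e, s, f, d, _v in rows
    ]


result = rank_and_sum(rows)
for row in result:
    print(row)
```

Both rows for employee C on 27-01-2016 share the same rank and carry the summed value, which matches the output table in the question.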
Labels:
- Apache Pig
09-19-2016
02:07 PM
Do you recommend any implementation of the Apriori algorithm using Spark MLlib?
Is there a tutorial or use case that shows how the algorithm can be implemented using Spark MLlib?
Many thanks!
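
For reference, Spark MLlib's frequent pattern mining package provides FP-Growth rather than Apriori, so an Apriori implementation would have to be written by hand. The sketch below shows the core of Apriori in plain Python on a single machine (not Spark), only to illustrate the join and prune steps. The function name `apriori`, the sample baskets, and the `min_support` value are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct single item
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: union pairs of frequent k-itemsets into (k+1)-itemsets
        k += 1
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets are frequent
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical baskets, used only to exercise the function
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]
frequent = apriori(baskets, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The support threshold here is an absolute transaction count. A distributed version would need to replace the in-memory counting loop with a per-partition count and a merge.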
Labels:
- Apache Spark
08-25-2016
09:29 PM
Hi people,
I have a dataset with the following schema:
ID_Employee - INT
Employee_Birth - DATE
Employee_Salary - DOUBLE
Quantity_Products - INT
I want to find out whether my fields contain outliers. I have read that a good practice is to use the standard deviation method, but in your opinion what is the best option for identifying outliers or missing values?
Thanks!
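
As a rough illustration of the standard deviation method mentioned above, the sketch below flags values whose z-score exceeds a threshold and separately lists missing entries. It is plain Python rather than Hive, the sample salaries are invented, and the helper names `find_outliers` and `find_missing` are hypothetical. Note that the mean and standard deviation are themselves pulled by extreme values, so a median or interquartile-range rule is a common alternative.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds threshold; None is skipped."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return []
    mu, sigma = mean(present), stdev(present)
    if sigma == 0:
        return []  # all values identical, nothing can be an outlier
    return [
        (i, v) for i, v in enumerate(values)
        if v is not None and abs(v - mu) / sigma > threshold
    ]


def find_missing(values):
    """Return the indexes of missing (None) entries."""
    return [i for i, v in enumerate(values) if v is None]


# Invented Employee_Salary column with one missing value and one extreme value
salaries = [2100.0, 2300.0, None, 2250.0, 2400.0,
            2150.0, 2200.0, 2350.0, 25000.0, 2280.0]

outliers = find_outliers(salaries, threshold=2.0)
missing = find_missing(salaries)
print("outliers:", outliers)
print("missing:", missing)
```

A threshold of 2.0 is used because in very small samples a single extreme value cannot reach a z-score of 3. On larger datasets a threshold of 3 is a frequent choice.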
Labels:
- Apache Hive
08-23-2016
07:54 PM
1 Kudo
Hi,
Is there a tutorial or white paper that explains how to implement the predictive model behind the "Diapers and Beer" story from the retail industry in Spark MLlib? I need to predict some values in the retail industry using Spark and I don't have any reference or tutorial for this. The "Diapers and Beer" project is very similar to mine, and I would like to see how it was implemented (the code and the data model).
Many thanks 🙂
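
The "Diapers and Beer" story is usually told as an example of market basket analysis, where a rule such as "diapers -> beer" is scored by support, confidence and lift. Assuming that is the kind of model meant here, the plain Python sketch below (not Spark) shows how those three metrics are computed. The baskets are invented and the function name `rule_metrics` is hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Transactions containing the antecedent, the consequent, and both
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_both / n                          # P(A and C)
    confidence = n_both / n_a if n_a else 0.0     # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0  # P(C | A) / P(C)
    return support, confidence, lift


# Invented baskets, used only to exercise the function
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

support, confidence, lift = rule_metrics(baskets, {"diapers"}, {"beer"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two products are bought together more often than they would be if purchases were independent.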
Labels:
- Apache Spark