About m2014227

m2014227 · ‎10-02-2017

What costs should an enterprise expect in order to license Hadoop or Spark? Thanks

m2014227 · ‎12-16-2016

@Alex Woolford thanks :). Do you recommend applying collaborative filtering together with association rules? Imagine that I predict that customer A will buy product 1, and I discover that people who buy product 1 also buy product 2, so I can recommend products 1 and 2 to customer A. Do you think this is a good approach?
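To make the idea concrete, here is a minimal sketch in plain Python (not Spark) of extending a predicted purchase with association rules. The baskets, the 0.6 confidence threshold, and the function names are illustrative assumptions, not data or APIs from this thread.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical baskets: each set is one transaction (made-up data).
baskets = [
    {"product 1", "product 2"},
    {"product 1", "product 2"},
    {"product 1", "product 3"},
    {"product 2"},
]

def rule_confidences(baskets):
    """Return confidence(A -> B) = count(A and B) / count(A) for each ordered pair."""
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for basket in baskets:
        for item in basket:
            item_count[item] += 1
        for a, b in permutations(basket, 2):
            pair_count[(a, b)] += 1
    return {(a, b): n / item_count[a] for (a, b), n in pair_count.items()}

def extend_with_rules(predicted, confidences, min_confidence=0.6):
    """Add items implied by sufficiently confident rules whose antecedent was predicted."""
    extra = {
        b
        for (a, b), conf in confidences.items()
        if a in predicted and b not in predicted and conf >= min_confidence
    }
    return sorted(predicted) + sorted(extra)

confidences = rule_confidences(baskets)
# Assume some upstream model predicted "product 1" for customer A.
recommendations = extend_with_rules({"product 1"}, confidences)
print(recommendations)
```

Here the rule "product 1 -> product 2" holds in 2 of the 3 baskets containing product 1, so product 2 is added, while product 3 falls below the threshold.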

m2014227 · ‎12-16-2016

@Dan Zaratsian many thanks for your response :) When you say "predict purchase", do you mean something like "person 1 will buy products A and B"?

m2014227 · ‎12-15-2016

Hi experts, I have a dataset with information about customers and the products they buy from a set of supermarkets:
- Customer_ID
- Transaction_Number
- Store_ID
- Product_Name
- Transaction_Date
- Product_Value
Which supervised algorithm would you recommend for building a use case in Spark 1.6? I'm very new to this topic 🙂 Many thanks!!

m2014227 · ‎10-29-2016

Hi experts, I've created a script using Apache Pig to run some jobs on my data (which comes from a text file). After the script runs I get a long list of files ("part-m-001", "part-m-002", ...). My question is: is it possible to use Impala to concatenate all the data into one table? The data follows a structured schema, so would Parquet files be a good option? Thanks!

m2014227 · ‎09-27-2016

Hi experts, I want to rank my dataset, but before or after that I need to group my data. My dataset is:

EMPLOYEE  STOCK  FURNISHER  DATE        VALUE
A         2      AA         27-01-2016  3
A         1      AB         28-01-2016  3
B         4      AA         27-01-2016  5
C         5      AC         27-01-2016  1
C         2      AC         27-01-2016  4

Now I want to rank my data by Employee and Date, and group the rows to obtain the sum of Value. I know I can do this without ranking, but generating the rank by Employee and Date is a requirement. Basically, I want to produce the following output:

ID  EMPLOYEE  STOCK  FURNISHER  DATE        VALUE
1   A         2      AA         27-01-2016  3
2   A         1      AB         28-01-2016  3
3   B         4      AA         27-01-2016  5
4   C         5      AC         27-01-2016  5
4   C         2      AC         27-01-2016  5

To obtain this with Apache Pig I'm using this script:

INPUT = LOAD 'FILE_PATH' USING PigStorage(';') AS (Employee:Chararray, STOCK:Int, FURNISHER:Chararray, Date:Chararray, Value:Double);
RANKING = RANK INPUT BY Employee, Date;
GRP = GROUP RANKING BY FURNISHER;
DATA = FOREACH GRP GENERATE FLATTEN(RANKING);
STORE DATA INTO 'DESTINATION_PATH' USING PigStorage(',');

But it does not return the desired output 😞 Does anyone know how I can do this? Many thanks!
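For clarity, the desired output amounts to a dense rank over (Employee, Date) plus the sum of Value within each (Employee, Date) group, attached back to every row. The following is a minimal plain-Python sketch of that logic, not a Pig solution; the function name `rank_and_sum` is an assumption made for illustration, and the rows are the sample data from the post.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the post: (employee, stock, furnisher, date, value).
rows = [
    ("A", 2, "AA", "27-01-2016", 3),
    ("A", 1, "AB", "28-01-2016", 3),
    ("B", 4, "AA", "27-01-2016", 5),
    ("C", 5, "AC", "27-01-2016", 1),
    ("C", 2, "AC", "27-01-2016", 4),
]

def rank_and_sum(rows):
    """Attach a dense rank and the group total of value for each (employee, date)."""
    totals = defaultdict(int)
    for employee, _stock, _furnisher, date, value in rows:
        totals[(employee, date)] += value

    # Sort distinct keys by employee, then chronologically (dates are dd-mm-yyyy).
    ordered = sorted(
        totals, key=lambda k: (k[0], datetime.strptime(k[1], "%d-%m-%Y"))
    )
    # Dense rank: consecutive integers, tied rows share the same rank.
    rank = {key: i for i, key in enumerate(ordered, start=1)}

    return [
        (rank[(e, d)], e, s, f, d, totals[(e, d)])
        for e, s, f, d, _value in rows
    ]

result = rank_and_sum(rows)
for row in result:
    print(row)
```

Both rows for employee C on 27-01-2016 receive rank 4 and the summed value 5, which matches the output table requested above.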

m2014227 · ‎09-19-2016

Do you recommend any implementation of the Apriori algorithm using Spark MLlib? Is there any tutorial/use case that shows how the algorithm can be implemented using Spark MLlib? Many thanks!
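As a reference point for what the algorithm does, here is a minimal single-machine sketch of Apriori in plain Python. It is not a Spark MLlib implementation; the transactions, the `min_support` value, and the function name are illustrative assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical baskets (made-up data).
transactions = [
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what distinguishes Apriori from brute-force counting: the three-item candidate is discarded without a scan because one of its pairs is not frequent.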

m2014227 · ‎08-25-2016

Is it possible to use Hadoop to identify the distribution of my dataset?

m2014227 · ‎08-25-2016

Hi people, I have a dataset with the following schema:
ID_Employee - INT
Employee_Birth - DATE
Employee_Salary - DOUBLE
Quantity_Products - INT
I want to find out whether my fields have outliers. I have read that using the standard deviation method is good practice, but in your opinion what is the best option for identifying outliers or missing values? Thanks!
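The standard deviation method mentioned above can be sketched as follows in plain Python (the thread itself concerns Hive). The salary values, the threshold of 3 standard deviations, and the function names are assumptions chosen for illustration; missing values are represented as `None`.

```python
from statistics import mean, pstdev

def count_missing(values):
    """Count entries that are missing (represented here as None)."""
    return sum(1 for v in values if v is None)

def find_outliers(values, threshold=3.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    present = [v for v in values if v is not None]
    mu = mean(present)
    sigma = pstdev(present)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in present if abs(v - mu) / sigma > threshold]

# Hypothetical Employee_Salary column (made-up data, one missing entry).
salaries = [1000.0, 1100.0, 950.0, 1050.0, None, 1020.0,
            980.0, 1010.0, 990.0, 1030.0, 1000.0, 25000.0]

print("missing:", count_missing(salaries))
print("outliers:", find_outliers(salaries))
```

One caveat of this approach is that extreme values inflate the mean and standard deviation they are measured against, so on small samples a single outlier can barely clear the threshold.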

m2014227 · ‎08-23-2016

Hi, is there any tutorial/white paper available that explains how to implement the predictive model behind the "Diapers and Beer" story from the retail industry using Spark MLlib? I need to predict some values in the retail industry using Spark and I don't have any reference/tutorial for this. The "Diapers and Beer" project has great similarities with my project, and I would like to see how it was implemented (the code and the data model, for example). Many thanks 🙂

Last Visited	‎10-02-2017 11:23 PM

Member Since	‎06-09-2016 01:58 PM
Posts	34
Kudos received	2

Cloudera Community

Is Hadoop and Spark an Open-Source Tool?

Re: Groceries - Supervised algorithm proposal in Sp...

Re: Groceries - Supervised algorithm proposal in Sp...

Groceries - Supervised algorithm proposal in Spark ...

Impala -Pig Files - Parquet file?

Apache PIG - Ranking with group

Machine Learning - Apriori Algorithm - Spark Mllib

Re: Identify Outliers using Hive

Identify Outliers using Hive

"Diapers and Beer" project using Spark in Sandbox