Member since: 06-18-2016
Posts: 52
Kudos Received: 14
Solutions: 0
05-15-2018
11:12 AM
Hi experts, is there any way to query the system tables like we do in T-SQL, for example: SELECT * FROM sys.tables
Is this possible in Impala or Hive? Thanks!
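For reference, a minimal sketch of how the same metadata can be reached from a Spark 2.x shell wired to the same Hive metastore (the database and table names below are placeholders); Hive and Impala themselves expose this through SHOW DATABASES / SHOW TABLES / DESCRIBE rather than a queryable sys.tables view:
// In spark-shell (Spark 2.x with Hive support), "spark" is the SparkSession.
spark.sql("SHOW DATABASES").show(false)
spark.sql("SHOW TABLES IN default").show(false)
spark.catalog.listTables("default").show(false)   // same information through the Catalog API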
12-19-2016
01:07 AM
2 Kudos
Hi, do you know any good tutorial/use case using Hadoop that shows a good approach to cleaning our data (especially the outlier-detection phase)? Thanks!
10-10-2016
09:34 PM
But when you put a file into HDFS, is a MapReduce job being used (even if you don't see it)?
10-10-2016
09:27 PM
Hi experts,
I have some basic questions about the relationship between MapReduce and HDFS: Is placing a data file on HDFS done through MapReduce? Do all transactions in HDFS use MapReduce jobs? Does anyone know the answer?
Many thanks!
09-29-2016
04:34 PM
@jfrazee but I can define the cut-off for the values that are higher than the average, right?
09-29-2016
09:37 AM
@jfrazee The usual approach is to reduce/remove products with few occurrences, right? Is it reasonable to think about eliminating the products that appear in only 20% of all transactions?
09-28-2016
10:59 PM
Hi jfrazee,
Many thanks for your response 🙂 I have some questions about this:
1) Is the structure of my data (each line corresponds to a set of product IDs) correct for this algorithm?
2) Does ".filter(_._2 > 2)" filter out the products that have an occurrence smaller than 2?
3) When I submit
val freqItemsets = transactions.map(_.split(",")).flatMap(xs =>
  (xs.combinations(1) ++ xs.combinations(2) ++ xs.combinations(3) ++ xs.combinations(4) ++ xs.combinations(5))
    .filter(_.nonEmpty).filter(_._2 > 2).map(x => (x.toList, 1L))
).reduceByKey(_ + _).map { case (xs, cnt) => new FreqItemset(xs.toArray, cnt) }
I'm getting the following error: <console>:31: error: value _2 is not a member of Array[String]. Do you know how to solve it?
Many thanks for your help and explanation of the association rules algorithm 🙂 And sorry for all these questions.
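For context, a minimal sketch of what I understand the fix to be (not a confirmed answer): move the count filter after reduceByKey, where each element is an (itemset, count) pair, so _._2 exists. It assumes transactions is an RDD[String] of comma-separated product IDs:
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset

// Count each candidate itemset first; after reduceByKey every element is a
// (List[String], Long) pair, so _._2 (the count) exists and can be filtered on.
val freqItemsets = transactions
  .map(_.split(","))
  .flatMap { xs =>
    (1 to 5)
      .flatMap(k => xs.combinations(k))   // candidate itemsets of size 1..5
      .filter(_.nonEmpty)
      .map(x => (x.toList, 1L))
  }
  .reduceByKey(_ + _)
  .filter(_._2 > 2)                        // keep itemsets that occur more than twice
  .map { case (items, cnt) => new FreqItemset(items.toArray, cnt) }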
09-28-2016
11:53 AM
Hi experts,
I have attached a dataset sample (sample.txt) to this post, and I am trying to extract some association rules using Spark MLlib:
val transactions = sc.textFile("DATA")
import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
val freqItemsets = transactions.map(_.split(",")).flatMap(xs =>
  (xs.combinations(1) ++ xs.combinations(2) ++ xs.combinations(3) ++ xs.combinations(4) ++ xs.combinations(5))
    .filter(_.nonEmpty)
    .map(x => (x.toList, 1L))
).reduceByKey(_ + _).map { case (xs, cnt) => new FreqItemset(xs.toArray, cnt) }
val ar = new AssociationRules().setMinConfidence(0.8)
val results = ar.run(freqItemsets)
results.collect().foreach { rule =>
  println("[" + rule.antecedent.mkString(",") + "=>" + rule.consequent.mkString(",") + "]," + rule.confidence)
}
However, my code returns dozens of rules with confidence equal to 1, which makes little sense! Does anyone know if I am missing some parameterization?
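For completeness, a minimal sketch of an alternative I am considering: MLlib's FPGrowth mines the frequent itemsets with a minimum support threshold, so very rare itemsets (which trivially yield confidence 1) are dropped before rules are generated. The 0.1 and 0.8 thresholds are only illustrative, and transactions is the same RDD of comma-separated product IDs:
import org.apache.spark.mllib.fpm.FPGrowth

// Let FPGrowth mine frequent itemsets with a minimum support,
// then generate rules from those itemsets only.
val items = transactions.map(_.split(",").distinct)   // FPGrowth requires unique items per basket

val model = new FPGrowth()
  .setMinSupport(0.1)        // illustrative: itemset must appear in >= 10% of baskets
  .setNumPartitions(4)
  .run(items)

model.generateAssociationRules(0.8).collect().foreach { rule =>
  println(rule.antecedent.mkString(",") + " => " + rule.consequent.mkString(",") + " @ " + rule.confidence)
}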
09-25-2016
02:54 PM
4 Kudos
Hi experts,
I have this line in a .txt file, which results from a GROUP operator:
1;(7287026502032012,18);{(706)};{(101200010)};{(17286)};{(oz)};2.5
Basically I have 7 fields. How can I obtain this:
1;7287026502032012,18;706;101200010;17286;oz;2.5
Many thanks!
09-09-2016
06:18 PM
3 Kudos
Having this statement:
Values = FILTER Input_Data BY Fields > 0
How can I count the number of records that were filtered out and the number that were kept?
Many thanks!
- Tags:
- Data Processing
- Pig
09-06-2016
01:03 PM
Hi,
Every time I run my Pig script it generates multiple files in HDFS (I never know how many). I need to do some analytics using Spark.
How can I join those multiple files into only one file, so that I can do something like:
val data = sc.textFile("PATH/Filejoined");
Thanks!
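A minimal sketch of one approach (the HDFS paths are placeholders): sc.textFile accepts a directory or a glob, so the part-* files Pig writes can be read directly, and coalesce(1) can produce a single output file if one is really needed:
// Read every part file that Pig wrote into a single RDD (no merge step needed).
val data = sc.textFile("hdfs:///PATH/pig_output/part-*")

// If a single physical file is really required, repartition to one partition
// before writing (fine for small outputs; everything funnels through one task).
data.coalesce(1).saveAsTextFile("hdfs:///PATH/Filejoined")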
09-06-2016
08:49 AM
Hi, I have this data in a text file:
1 4
2 5
2 2
1 5
How can I, using Spark and Scala, identify the rows that have a repeated number in the same row? And how can I delete them? In this case I want to remove the third row... Many thanks!
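A minimal sketch of one way this could be done, assuming the file is space-separated and each line is one row (the path is a placeholder):
// Keep only the lines whose values are all distinct within the line.
val lines = sc.textFile("hdfs:///PATH/data.txt")

val noDuplicatesInRow = lines.filter { line =>
  val values = line.trim.split("\\s+")
  values.distinct.length == values.length   // false for rows like "2 2"
}

noDuplicatesInRow.collect().foreach(println)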
09-05-2016
02:09 PM
I'm trying to return this:
val output = vertices.map(_.split(" ")).toArray
09-05-2016
01:57 PM
I'm trying to save my Array to HDFS. For that I have this:
array.saveAsTextFile("PATH")
but when I submit this I'm getting this error:
error: value saveAsTextFile is not a member of Array[Array[String]]
Anyone knows how to solve this?
Many thanks!
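A minimal sketch of two possible workarounds, since saveAsTextFile is an RDD method rather than an Array method (the paths are placeholders, and vertices/output follow my earlier reply):
// Option 1: keep the data as an RDD and save it directly (preferred).
val vertices = sc.textFile("hdfs:///PATH/input.txt")
val output = vertices.map(_.split(" "))               // RDD[Array[String]], not collected
output.map(_.mkString(" ")).saveAsTextFile("hdfs:///PATH/output")

// Option 2: if you already have a local Array[Array[String]], turn it back into an RDD first.
val array: Array[Array[String]] = output.collect()
sc.parallelize(array.map(_.mkString(" "))).saveAsTextFile("hdfs:///PATH/output2")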
09-03-2016
02:41 AM
Hi experts, after some Scala programming, I'm getting this output:
[40146844020121125,WrappedArray(1726)]
[40148356620121118,WrappedArray(7205)]
[40148813920120703,WrappedArray(3504, 1703)]
[40148991920121112,WrappedArray(5616)]
[40150340320130324,WrappedArray(9909)]
[40150796920120926,WrappedArray(3509)]
[40151143320130423,WrappedArray(9909)]
[40153957220120426,WrappedArray(9909)]
[40154761720120504,WrappedArray(9909)]
[40154969620130124,WrappedArray(9909, 9909)]
But I want to extract this:
40146844020121125,1726
40148356620121118,7205
40148813920120703,3504, 1703
40148991920121112,5616
40150340320130324,9909
40150796920120926,3509
40151143320130423,9909
40153957220120426,9909
40154761720120504,9909
40154969620130124,9909,9909
I'm trying to analyze the products frequently purchased together, and my Scala code is:
val data = sc.textFile("FILE")
case class Transactions(Transaction_ID: String, Dept: String, Category: String, Company: String, Brand: String, Product_Size: String, Product_Measure: String, Purchase_Quantity: String, Purchase_Amount: String)
def csvToMyClass(line: String) = {
  val split = line.split(',')
  Transactions(split(0), split(1), split(2), split(3), split(4), split(5), split(6), split(7), split(8))
}
val df = data.map(csvToMyClass).toDF("Transaction_ID", "Dept", "Category", "Company", "Brand", "Product_Size", "Product_Measure", "Purchase_Quantity", "Purchase_Amount")
df.show
val df2 = df.groupBy("Transaction_ID").agg(collect_list($"Category"))
df.groupBy("Transaction_ID").agg(collect_list($"Category")).show
How can I map the DataFrame to a normal list? Many thanks!!!
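A minimal sketch of one way I think the collected list could be flattened into plain comma-separated strings (assuming df2 is the grouped DataFrame above and the collect_list column holds strings):
// Turn each (Transaction_ID, WrappedArray(categories...)) row into "id,cat1,cat2,..."
val flat = df2.rdd.map { row =>
  val id = row.getString(0)
  val categories = row.getAs[Seq[String]](1)
  (id +: categories).mkString(",")
}

flat.collect().foreach(println)
// flat.saveAsTextFile("hdfs:///PATH/flattened")   // or save the flattened lines back to HDFS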
08-31-2016
09:40 AM
Like this:
{Stock_ID} -> {Sales_ID}:
{1}->A
{2}->A
{3}->C,D
{4}->A
08-30-2016
08:28 AM
Hi experts, imagine that I have this example stored on HDFS in a .csv file:
Stock_ID Sales_ID
1 A
2 A
3 C
3 D
4 A
I want to map the rows using combineByKey to list the elements, and after that I want to reduce them so I can get the RDD that I expect. I only have this line:
val textFile = sc.textFile("/input/transactions.csv")
How can I map and reduce it using Scala in Spark? Many thanks!!!
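A minimal sketch of one possible approach, assuming a space-separated file with a header row and exactly two fields per line; combineByKey is used since the question mentions it, though groupByKey would also work:
// Parse "Stock_ID Sales_ID" lines, skip the header, and collect sales per stock.
val textFile = sc.textFile("/input/transactions.csv")

val pairs = textFile
  .filter(line => !line.startsWith("Stock_ID"))        // drop the header row
  .map { line =>
    val Array(stockId, salesId) = line.trim.split("\\s+")   // assumes exactly two fields
    (stockId, salesId)
  }

// combineByKey: start a list per key, append within a partition, concatenate across partitions.
val grouped = pairs.combineByKey(
  (v: String) => List(v),
  (acc: List[String], v: String) => v :: acc,
  (a: List[String], b: List[String]) => a ::: b
)

grouped.collect().foreach { case (stock, sales) =>
  println(s"{$stock}->${sales.sorted.mkString(",")}")
}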
08-25-2016
12:26 PM
Hi experts, I have a .csv file stored in HDFS and I need to do 3 steps:
a) Create a Parquet file format
b) Load the data from the .csv into the Parquet file
c) Store the Parquet file in a new HDFS directory
The first step I have completed using Apache Hive:
create external table parquet_file (ID BIGINT, Date TimeStamp, Size Int)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
LOCATION '.../filedirectory';
How can I complete tasks b) and c)??? Many thanks!
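Not a Hive answer, but a minimal sketch of steps b) and c) done from Spark instead, in case that route is acceptable (the paths are placeholders, and the comma split and timestamp format are assumptions based on the table definition above):
import java.sql.Timestamp
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)   // in spark-shell, sqlContext already exists
import sqlContext.implicits._

// b) Load the .csv from HDFS and map it onto the three columns of the table.
case class Record(id: Long, date: Timestamp, size: Int)

val records = sc.textFile("hdfs:///PATH/input.csv").map { line =>
  val f = line.split(",")
  Record(f(0).toLong, Timestamp.valueOf(f(1)), f(2).toInt)   // assumes timestamps like "2016-08-25 12:00:00"
}.toDF()

// c) Write the data as Parquet into a new HDFS directory.
records.write.parquet("hdfs:///PATH/new_parquet_dir")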
08-22-2016
10:03 AM
Hi,
I need to create some graphs using PySpark to do some link analysis research. I have already seen this link:
http://kukuruku.co/hub/algorithms/social-network-analysis-spark-graphx
But that algorithm is implemented in Scala, which is much more complex for me to understand.
Does anyone know of a white paper or tutorial that does link analysis research using PySpark?
Thanks!
08-12-2016
08:28 AM
1 Kudo
Hi guys,
I'm doing a Big Data analytics project and I am in the outlier-detection phase. I only have experience using SAS, where I usually use histogram charts to detect outliers. In Hadoop, which component is used to identify these records? Do you normally use Pig or Hive? Or just a tool outside Hadoop, like Python or Java?
Many thanks!
08-10-2016
02:13 PM
Hi guys,
I'm very new to Apache Pig, and I have already seen a lot of scripts using the GROUP statement without any aggregate operator such as SUM(X) applied afterwards (e.g. just A GROUP BY A). Why is it a good alternative to use the GROUP statement on its own?
Thanks!
08-09-2016
01:02 PM
Hi,
I have the following code:
from datetime import datetime

loc_offers = "offers.csv"
loc_transactions = "transactions.csv"
loc_reduced = "reduced2.csv"  # will be created

def reduce_data(loc_offers, loc_transactions, loc_reduced):
    start = datetime.now()
    # get all categories on offer in a dict
    offers = {}
    for e, line in enumerate(open(loc_offers)):
        offers[line.split(",")[1]] = 1
    # open output file
    with open(loc_reduced, "wb") as outfile:
        # go through transactions file and reduce
        reduced = 0
        for e, line in enumerate(open(loc_transactions)):
            if e == 0:
                outfile.write(line)  # print header
            else:
                # only write when category is in offers dict
                if line.split(",")[3] in offers:
                    outfile.write(line)
                    reduced += 1
            # progress
            if e % 5000000 == 0:
                print e, reduced, datetime.now() - start
        print e, reduced, datetime.now() - start

reduce_data(loc_offers, loc_transactions, loc_reduced)

I have the two .csv files in HDFS. How can I run this Python code over the two files in HDFS?
Thanks!
08-08-2016
04:30 PM
Hi experts,
This is probably a dumb question (but I still have it 🙂 ).
I want to know how Pig reads the headers from the following dataset stored as a .csv:
ID,Name,Function
1,Johnny,Student
2,Peter,Engineer
3,Cloud,Teacher
4,Angel,Consultant
I want to have the first row as the header of my file. Do I need to put:
A = LOAD 'file' USING PigStorage(',') AS (ID:int, ... etc.)?
Or do I only need to put:
A = LOAD 'file' USING PigStorage(',')
and with only this Apache Pig already knows that the first line contains the headers of my table? Thanks!
08-08-2016
02:02 PM
Hi, I'm doing a small Big Data project using Hadoop in cloudera-quickstart-vm-5.7.0-0-virtualbox. I have a file in HDFS that is 22 GB in size. When I try to run a job in Pig, like:
A = LOAD '/user/cloudera/file.csv';
DUMP A;
it stays in status Running for a very long time. Is there any configuration I need to change to process all this data? Thanks
- Tags:
- big-data
- performance
08-01-2016
08:10 AM
Hi experts, I'm doing a small project in Hadoop using cloudera-quickstart-vm-5.7.0-0 on VirtualBox. I'm trying to use Java in Apache Pig, basically Pig Java UDFs. I have already done the following in Eclipse:
1) Created the project
2) Converted it to a Maven project
3) Added the dependencies -> pig: 0.15.0 and hadoop-core: 0.20.2
4) Generated the JAR file in the directory /home/cloudera/workspace
Now I want to apply my UDF in my Pig script:
REGISTER '/home/cloudera/workspace/UDFs.jar';
emp_data = LOAD '/user/cloudera/teste.txt' USING PigStorage(' ') as (name:chararray, idade:chararray, func:chararray);
Upper_case = FOREACH emp_data GENERATE myUDFS.isNumeric(name);
DUMP Upper_case;
I also tried putting the JAR file in HDFS and adding it to the properties in the Pig editor, but when I submit the Pig script it gives me an error... Does anyone know if I'm missing any step? Many thanks!
07-29-2016
10:29 AM
Hi experts,
I'm using Apache Pig to do some data transformations, but I need Java operations to do some complex cleansing activities. I have already written the methods in Java and have already put the necessary code in Pig to register the Java code. However, I don't know which JARs I need to add in Eclipse to make the connection between Pig and Eclipse.
Is there any "dummies" tutorial for this interaction?
Thanks!
07-26-2016
03:43 PM
Sunile Manjee, many thanks! One more question: is it possible to create a variable and use it in an IF statement? Example:
A = Foreach X Generate A1,A2,A3;
--Create a variable
var = Concat(A1,A2);
Split A into B IF (var == "teste");
Is it possible to do this?
07-26-2016
01:38 PM
Hi experts, I have the following field:
ToString(ToDate((long) Time_Interval), 'yyyy-MM-dd hh:ss:mm') as Time
How can I obtain only the time (hh:ss:mm)? I have already tried:
ToString( ToDate(Time), 'HH:mm:ss.SSS')
07-20-2016
03:55 PM
src-data.txt (attached). I have the following code:
Data = LOAD '.../teste1.txt' using PigStorage('');
Fields = FOREACH Data GENERATE
(chararray)$0 AS ID,
(chararray)$1 AS Time,
(chararray)$2 AS Code,
(chararray)$3 AS B_In_Activity,
(chararray)$4 AS B_Out_Activity,
(chararray)$5 AS In_Activity,
(chararray)$6 AS Out_Activity,
(chararray)$7 AS Activity);
Transf = FOREACH Data_Fields GENERATE ID,
ToUnixTime(Time,'dd/MM/yyyyHH:mm:ss', 'GMT') AS Time,
Code,
B_Activity,
B_Activity,
In_Activity,
Out_Activity,
Activity;
SPLIT Transf INTO Src31 IF ToDate(Time) == ToDate('2014-12-31', 'yyyy-mm-dd');
STORE Src31 INTO '.../TESTE2' using PigStorage('');
I want to do the following:
1) Transform the field Time into a Unix timestamp
2) Split the dataset based on the date
When I execute my code it gives me an error... I have uploaded my source data so you can see what it looks like. Can anyone help me? Many thanks!!!!