Member since: 06-18-2016
Posts: 52
Kudos Received: 14
Solutions: 0
05-15-2018
11:12 AM
Hi experts, is there any way to run a query against system tables the way we do in T-SQL, e.g. SELECT * FROM sys.tables?
Is this possible in Impala or Hive? Thanks!
Labels:
- Apache Hive
- Apache Impala
10-10-2016
09:34 PM
But when you put a file into HDFS, are you using a MapReduce job (even if you don't see it)?
10-10-2016
09:27 PM
Hi experts,
I have some basic questions about the relationship between MapReduce and HDFS: Is placing a data file on HDFS done through MapReduce? Do all operations in HDFS use MapReduce jobs? Does anyone know the answer?
Many thanks!
Labels:
- Apache Hadoop
- Cloudera DataFlow (CDF)
09-29-2016
04:34 PM
@jfrazee But I can define the cut-off so that only the values higher than the average are kept, right?
09-29-2016
09:37 AM
@jfrazee The normal approach is to reduce or remove products with few occurrences, right? Is it reasonable to think about eliminating the products that appear in only 20% of all transactions?
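For illustration, here is a minimal Spark (Scala) sketch of that kind of support-based cut-off, assuming the transactions are the same comma-separated product_id lines used in the Spark posts below; the variable names and the 0.2 threshold are placeholders, not a recommendation:

// Count, for each product, the fraction of transactions it appears in,
// then drop products whose support falls below a chosen threshold.
val transactions = sc.textFile("DATA").map(_.split(",").toSet).cache()
val total = transactions.count().toDouble

val productSupport = transactions
  .flatMap(x => x)                      // one element per (transaction, product)
  .map(p => (p, 1L))
  .reduceByKey(_ + _)
  .mapValues(_ / total)                 // fraction of transactions containing the product

val minSupport = 0.2                    // placeholder cut-off (20% of transactions)
val keptProducts = productSupport
  .filter { case (_, support) => support >= minSupport }
  .keys
  .collect()
  .toSet

// Remove rare products from every transaction before mining itemsets and rules.
val filteredTransactions = transactions.map(_.filter(keptProducts.contains)).filter(_.nonEmpty)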
09-28-2016
10:59 PM
Hi jfrazee,
Many thanks for your response 🙂 I have some questions about this:
1) Is the structure of my data (each line corresponds to a set of product_ids) correct for this algorithm?
2) Does ".filter(_._2 > 2)" filter out the products that occur fewer than 2 times?
3) When I submit

val freqItemsets = transactions.map(_.split(",")).flatMap(xs =>
  (xs.combinations(1) ++ xs.combinations(2) ++ xs.combinations(3) ++ xs.combinations(4) ++ xs.combinations(5))
    .filter(_.nonEmpty)
    .filter(_._2 > 2)
    .map(x => (x.toList, 1L))
).reduceByKey(_ + _).map { case (xs, cnt) => new FreqItemset(xs.toArray, cnt) }

I get the following error: <console>:31: error: value _2 is not a member of Array[String]. Do you know how to solve it?
Many thanks for your help and explanation of the association rules algorithm 🙂 And sorry for all these questions.
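For what it is worth, that error seems to come from the fact that .filter(_._2 > 2) runs on the Array[String] combinations, before any (itemset, count) pairs exist; a count-based filter has to come after reduceByKey. A minimal sketch of that reordering, keeping everything else from the snippet above (the > 2 cut-off is just the example value):

import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset

val freqItemsets = transactions
  .map(_.split(","))
  .flatMap(xs =>
    (xs.combinations(1) ++ xs.combinations(2) ++ xs.combinations(3) ++
     xs.combinations(4) ++ xs.combinations(5))
      .filter(_.nonEmpty)
      .map(x => (x.toList, 1L)))
  .reduceByKey(_ + _)
  .filter { case (_, cnt) => cnt > 2 }   // count filter applied once the counts exist
  .map { case (xs, cnt) => new FreqItemset(xs.toArray, cnt) }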
09-28-2016
11:53 AM
Hi experts,
I have attached the dataset sample.txt to this post, and I am trying to extract some association rules using Spark MLlib:

val transactions = sc.textFile("DATA")

import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset

val freqItemsets = transactions.map(_.split(",")).flatMap(xs =>
  (xs.combinations(1) ++ xs.combinations(2) ++ xs.combinations(3) ++ xs.combinations(4) ++ xs.combinations(5))
    .filter(_.nonEmpty)
    .map(x => (x.toList, 1L))
).reduceByKey(_ + _).map { case (xs, cnt) => new FreqItemset(xs.toArray, cnt) }

val ar = new AssociationRules().setMinConfidence(0.8)
val results = ar.run(freqItemsets)
results.collect().foreach { rule =>
  println("[" + rule.antecedent.mkString(",") + "=>" + rule.consequent.mkString(",") + "]," + rule.confidence)
}

However, my code returns a dozen rules, all with confidence equal to 1, which makes little sense! Does anyone know if I am missing some parameterization?
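One possible direction, offered as an assumption rather than a confirmed fix: because the combinations are counted without any minimum-support filtering, every rare itemset survives, and any set of products that only ever appears together yields a confidence of exactly 1. A minimal sketch using MLlib's FPGrowth, which applies a minimum support before the rules are generated; the 0.1 support, the partition count, and the "DATA" path are placeholder values:

import org.apache.spark.mllib.fpm.FPGrowth

// Each transaction is the set of product_ids on one line (FPGrowth requires unique items per transaction).
val transactions = sc.textFile("DATA").map(_.split(",").distinct)

val fpg = new FPGrowth()
  .setMinSupport(0.1)      // placeholder: keep itemsets present in at least 10% of transactions
  .setNumPartitions(4)     // placeholder

val model = fpg.run(transactions)

// generateAssociationRules applies the confidence threshold on top of the frequent itemsets.
model.generateAssociationRules(0.8).collect().foreach { rule =>
  println("[" + rule.antecedent.mkString(",") + " => " + rule.consequent.mkString(",") + "], " + rule.confidence)
}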
Labels:
- Apache Spark
09-25-2016
02:54 PM
4 Kudos
Hi experts,
I have this line in a .txt file, which results from a GROUP operator:
1;(7287026502032012,18);{(706)};{(101200010)};{(17286)};{(oz)};2.5
Basically I have 7 fields; how can I obtain this:
1;7287026502032012,18;706;101200010;17286;oz;2.5
Many thanks!
Labels:
- Apache Pig
09-09-2016
06:18 PM
3 Kudos
Having this statement:
Values = FILTER Input_Data BY Fields > 0
How can I count the number of records that were filtered out and the number that were kept?
Many thanks!
Labels:
- Apache Pig
09-06-2016
01:03 PM
Hi,
Every time I run my Pig script it generates multiple files in HDFS (I never know how many). I need to do some analytics on them using Spark.
How can I join those multiple files so that I have only one file, like:
val data = sc.textFile("PATH/Filejoined");
Thanks!
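For reference, a minimal sketch of two common ways to handle this, assuming the Pig output lands in a directory here called "PATH/pig_output" (a placeholder name): sc.textFile reads every part file in a directory at once, and coalesce(1) can be used when a single physical file is really needed.

// Option 1: read all part-* files of the Pig output directory as one RDD.
val data = sc.textFile("PATH/pig_output")          // directories and globs are accepted

// Option 2: if one physical file is required, collapse to a single partition and rewrite.
sc.textFile("PATH/pig_output")
  .coalesce(1)
  .saveAsTextFile("PATH/Filejoined")               // writes a single part-00000 under this directory

val joined = sc.textFile("PATH/Filejoined")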
Labels:
- Apache Spark