Member since: 12-19-2016
Posts: 149
Kudos Received: 15
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3954 | 04-04-2017 03:01 PM
 | 1728 | 01-17-2017 10:44 AM
02-09-2017
03:42 PM
I have two DataFrames and I would like to show one of them only where my conditions are satisfied. I want to match the first column of both DataFrames and also apply the condition SEV_LVL='3'. Can I get some guidance or help, please?
scala> input_file.show()
+-----------+--------+-----+----+-------+
| ckt_id|location|usage|port|machine|
+-----------+--------+-----+----+-------+
| ckt_id|location|usage|port|machine|
| AXZCSD21DF| USA| 2GB| 101| MAC1|
| ABZCSD21DF| OTH| 4GB| 101| MAC2|
| AXZCSD21DF| USA| 6GB| 101| MAC4|
| BXZCSD21DF| USA| 7GB| 101| MAC6|
| CXZCSD21DF| IND| 2GB| 101| MAC9|
| AXZCSD21DF| USA| 1GB| 101| MAC0|
| AXZCSD22DF| IND| 9GB| 101| MAC3|
|ADZZCSD21DF| USA| 1GB| 101| MAC4|
| AXZCSD21DF| USA| 2GB| 101| MAC5|
| XZDCSD21DF| OTH| 2GB| 101| MAC1|
+-----------+--------+-----+----+-------+
scala> gsam.show()
+-----------+-------+
| CCKT_NO|SEV_LVL|
+-----------+-------+
| AXZCSD21DF| 1|
| BXZCSD21DF| 1|
| ABZCSD21DF| 3|
| CXZCSD21DF| 2|
| AXZCSD22DF| 2|
| XZDCSD21DF| 3|
|ADZZCSD21DF| 1|
+-----------+-------+
scala> val gsamjoin = gsam.join(input_file,(gsam("CCKT_NO") <=> input_file("ckt_id")));
gsamjoin: org.apache.spark.sql.DataFrame = [CCKT_NO: string, SEV_LVL: decimal(38,0), ckt_id: string, location: string, usage: string, port: string, machine: string]
scala> gsamjoin.show()
+-----------+-------+-----------+--------+-----+----+-------+
| CCKT_NO|SEV_LVL| ckt_id|location|usage|port|machine|
+-----------+-------+-----------+--------+-----+----+-------+
| CXZCSD21DF| 2| CXZCSD21DF| IND| 2GB| 101| MAC9|
| ABZCSD21DF| 3| ABZCSD21DF| OTH| 4GB| 101| MAC2|
| XZDCSD21DF| 3| XZDCSD21DF| OTH| 2GB| 101| MAC1|
| AXZCSD22DF| 2| AXZCSD22DF| IND| 9GB| 101| MAC3|
|ADZZCSD21DF| 1|ADZZCSD21DF| USA| 1GB| 101| MAC4|
| BXZCSD21DF| 1| BXZCSD21DF| USA| 7GB| 101| MAC6|
| AXZCSD21DF| 1| AXZCSD21DF| USA| 2GB| 101| MAC1|
| AXZCSD21DF| 1| AXZCSD21DF| USA| 6GB| 101| MAC4|
| AXZCSD21DF| 1| AXZCSD21DF| USA| 1GB| 101| MAC0|
| AXZCSD21DF| 1| AXZCSD21DF| USA| 2GB| 101| MAC5|
+-----------+-------+-----------+--------+-----+----+-------+
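To keep only the SEV_LVL='3' matches and show just the text-file columns, a filter and select can be layered on the same join. A minimal sketch using the names above (SEV_LVL was inferred as decimal(38,0), so a numeric literal also compares cleanly):
// Sketch: same join as above, plus the SEV_LVL filter; select only the text-file columns.
val matched = gsam.join(input_file, gsam("CCKT_NO") <=> input_file("ckt_id"))
  .filter(gsam("SEV_LVL") === 3)
matched.select("ckt_id", "location", "usage", "port", "machine").show()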
Labels:
- Apache Spark
02-09-2017
03:35 PM
I have a text file that I am trying to convert into a DataFrame and query with SQL, but it does not seem to work properly. Can someone please help me? I tried both ways, by registering a temp table and without it.
case class f1(
ckt_id:String,
location:String,
usage:String,
port:String,
machine:String
)
val input_file = sc.textFile("file:///dev/spark/input_file.txt").map(_.split("\\|")).map(x => f1(x(0).toString,x(1).toString,x(2).toString,x(3).toString,x(4).toString)).toDF
scala> input_file.show()
+-----------+--------+-----+----+-------+
| ckt_id|location|usage|port|machine|
+-----------+--------+-----+----+-------+
| ckt_id|location|usage|port|machine|
| AXZCSD21DF| USA| 2GB| 101| MAC1|
| ABZCSD21DF| OTH| 4GB| 101| MAC2|
| AXZCSD21DF| USA| 6GB| 101| MAC4|
| BXZCSD21DF| USA| 7GB| 101| MAC6|
| CXZCSD21DF| IND| 2GB| 101| MAC9|
| AXZCSD21DF| USA| 1GB| 101| MAC0|
| AXZCSD22DF| IND| 9GB| 101| MAC3|
|ADZZCSD21DF| USA| 1GB| 101| MAC4|
| AXZCSD21DF| USA| 2GB| 101| MAC5|
| XZDCSD21DF| OTH| 2GB| 101| MAC1|
+-----------+--------+-----+----+-------+
scala> input_file.printSchema()
root
|-- ckt_id: string (nullable = true)
|-- location: string (nullable = true)
|-- usage: string (nullable = true)
|-- port: string (nullable = true)
|-- machine: string (nullable = true)
scala> input_file.registerTempTable("input_file")
ERROR:
scala> val op =sqlContext.sql("SELECT * FROM input_file")
org.apache.spark.sql.AnalysisException: Table not found: input_file;
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:305)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:314)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:309)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$$$$$c57ec8bf9b0d5f6161b97741d596ff0$$$$wC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:128)
...
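For what it's worth, "Table not found" in the shell is often a sign that the temp table was registered on a different SQLContext instance than the one executing the query. A minimal sketch that keeps a single context throughout (same file path and layout as above; the header-row filter is an extra assumption to drop the stray header line visible as a data row in the output):
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

case class f1(ckt_id: String, location: String, usage: String, port: String, machine: String)

// Build the DataFrame, dropping the header line that otherwise shows up as a data row.
val input_file = sc.textFile("file:///dev/spark/input_file.txt")
  .map(_.split("\\|"))
  .filter(_(0) != "ckt_id")
  .map(x => f1(x(0), x(1), x(2), x(3), x(4)))
  .toDF()

input_file.registerTempTable("input_file")
sqlContext.sql("SELECT * FROM input_file").show()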
Labels:
- Apache Spark
02-09-2017
12:38 PM
Hi Frank lu, can you provide me the code with an example? I didn't understand exactly what you are saying.
02-09-2017
11:46 AM
@Binu Mathew While doing a SQL operation, I am getting an error that one of the tables is not found.
scala> gsam.show()
+-----------+-------+
| CCKT_NO|SEV_LVL|
+-----------+-------+
| AXZCSD21DF| 1|
| BXZCSD21DF| 1|
| ABZCSD21DF| 3|
| CXZCSD21DF| 2|
| AXZCSD22DF| 2|
| XZDCSD21DF| 3|
|ADZZCSD21DF| 1|
+-----------+-------+
scala> input_file.show()
+-----------+--------+-----+----+-------+
| ckt_id|location|usage|port|machine|
+-----------+--------+-----+----+-------+
| ckt_id|location|usage|port|machine|
| AXZCSD21DF| USA| 2GB| 101| MAC1|
| ABZCSD21DF| OTH| 4GB| 101| MAC2|
| AXZCSD21DF| USA| 6GB| 101| MAC4|
| BXZCSD21DF| USA| 7GB| 101| MAC6|
| CXZCSD21DF| IND| 2GB| 101| MAC9|
| AXZCSD21DF| USA| 1GB| 101| MAC0|
| AXZCSD22DF| IND| 9GB| 101| MAC3|
|ADZZCSD21DF| USA| 1GB| 101| MAC4|
| AXZCSD21DF| USA| 2GB| 101| MAC5|
| XZDCSD21DF| OTH| 2GB| 101| MAC1|
+-----------+--------+-----+----+-------+
scala> input_file.registerTempTable("input_file_temp")
scala> gsam.registerTempTable("gsam_temp")
scala> val tmp = sqlContext.sql("select a.ckt_id,a.location,a.usage,a.port,a.machine,b.CCKT_NO,b.SEV_LVL FROM input_file_temp a, gsam_temp b where a.ckt_id=b.CCKT_NO AND b.sev_lvl='3'")
org.apache.spark.sql.AnalysisException: Table not found: input_file_temp;
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:305)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:314)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:309)
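If the temp tables keep disappearing, the same lookup can be written directly against the DataFrames already in scope, sidestepping registration entirely. A sketch using the columns shown above:
// Sketch: DataFrame-API equivalent of the SQL above, no temp tables needed.
val tmp = input_file
  .join(gsam, input_file("ckt_id") === gsam("CCKT_NO"))
  .filter(gsam("SEV_LVL") === 3)
tmp.show()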
02-09-2017
08:53 AM
Thank you so much, Sir. You are awesome.
02-09-2017
08:30 AM
Thank you so much. You're a genius 🙂
02-08-2017
04:22 PM
Using Spark, can I compare two different datasets (one from a SQL DB and another from a text file)? I have two sets of data: one is a text file and the other is a SQL table. I would like to look up the data in the SQL table against the text file and, where they match, delete some fields from the text file.
Text File :
ckt_id|location|usage|port|machine
AXZCSD21DF|USA|2GB|101|MAC1
ABZCSD21DF|OTH|4GB|101|MAC2
AXZCSD21DF|USA|6GB|101|MAC4
BXZCSD21DF|USA|7GB|101|MAC6
SQL table:
+-----------+-------+
| CCKT_NO|SEV_LVL|
+-----------+-------+
| AXZCSD21DF| 1|
| BXZCSD21DF| 1|
| ABZCSD21DF| 3|
| CXZCSD21DF| 2|
| AXZCSD22DF| 2|
| XZDCSD21DF| 3|
|ADZZCSD21DF| 1|
+-----------+-------+
Can someone please guide me on this?
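One way to sketch this: load the SQL table over JDBC into a DataFrame, load the pipe-delimited file into another, and join on the key. The JDBC URL, table name, and credentials below are placeholders, not real connection details:
// Placeholders: url, dbtable, user, password.
val gsam = sqlContext.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/svc")
  .option("dbtable", "GSAM")
  .option("user", "user")
  .option("password", "pass")
  .load()

import sqlContext.implicits._
val textDF = sc.textFile("file:///dev/spark/input_file.txt")
  .map(_.split("\\|"))
  .filter(_(0) != "ckt_id")            // drop the header line
  .map(x => (x(0), x(1), x(2), x(3), x(4)))
  .toDF("ckt_id", "location", "usage", "port", "machine")

// Rows of the text file whose ckt_id appears in the SQL table:
val matched = textDF.join(gsam, textDF("ckt_id") === gsam("CCKT_NO"))
matched.show()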
Labels:
- Apache Spark
02-08-2017
02:59 PM
How can I load a complete table into an RDD using Spark?
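For what it's worth, a common Spark 1.x pattern is to read the table over JDBC as a DataFrame and then take its underlying RDD (connection details below are placeholders):
// Placeholders: url, dbtable, user, password.
val df = sqlContext.read.format("jdbc")
  .option("url", "jdbc:mysql://dbhost:3306/mydb")
  .option("dbtable", "my_table")
  .option("user", "user")
  .option("password", "pass")
  .load()

val tableRdd = df.rdd   // RDD[org.apache.spark.sql.Row]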
Labels:
- Apache Spark
02-07-2017
10:31 AM
I want to delete partition data older than 10 days. My script runs every day. I am able to truncate the current date's partition of the managed table, but I am unable to truncate data more than 10 days older than the current date. Can anyone provide the command to drop partitions, where date is the partition column, that are more than 10 days old? (See the sketch below.)
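Assuming the table is partitioned by a date-formatted string column and the statement runs through a HiveContext (the table name and partition column below are placeholders), Hive accepts comparison operators in DROP PARTITION, so a daily script can compute a cutoff and drop everything older:
// Placeholders: table name "my_table" and partition column "dt" (yyyy-MM-dd strings).
import java.text.SimpleDateFormat
import java.util.Calendar

val cal = Calendar.getInstance()
cal.add(Calendar.DATE, -10)                                   // 10 days before today
val cutoff = new SimpleDateFormat("yyyy-MM-dd").format(cal.getTime)

// Drops every partition whose dt is older than the cutoff.
sqlContext.sql(s"ALTER TABLE my_table DROP IF EXISTS PARTITION (dt < '$cutoff')")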
Labels:
- Apache Hadoop
- Apache Hive
- Apache Pig
01-31-2017
01:16 PM
Thank you Sir, much appreciated 🙂