Member since: 12-19-2016
Posts: 149
Kudos Received: 15
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3954 | 04-04-2017 03:01 PM
 | 1728 | 01-17-2017 10:44 AM
02-09-2017
03:42 PM
I have two DataFrames and I would like to show one of them only where my conditions are satisfied. I want to match the first column of both DataFrames and also apply the condition SEV_LVL='3'. Can I get some guidance or help, please?
scala> input_file.show()
+-----------+--------+-----+----+-------+
| ckt_id|location|usage|port|machine|
+-----------+--------+-----+----+-------+
| ckt_id|location|usage|port|machine|
| AXZCSD21DF| USA| 2GB| 101| MAC1|
| ABZCSD21DF| OTH| 4GB| 101| MAC2|
| AXZCSD21DF| USA| 6GB| 101| MAC4|
| BXZCSD21DF| USA| 7GB| 101| MAC6|
| CXZCSD21DF| IND| 2GB| 101| MAC9|
| AXZCSD21DF| USA| 1GB| 101| MAC0|
| AXZCSD22DF| IND| 9GB| 101| MAC3|
|ADZZCSD21DF| USA| 1GB| 101| MAC4|
| AXZCSD21DF| USA| 2GB| 101| MAC5|
| XZDCSD21DF| OTH| 2GB| 101| MAC1|
+-----------+--------+-----+----+-------+
scala> gsam.show()
+-----------+-------+
| CCKT_NO|SEV_LVL|
+-----------+-------+
| AXZCSD21DF| 1|
| BXZCSD21DF| 1|
| ABZCSD21DF| 3|
| CXZCSD21DF| 2|
| AXZCSD22DF| 2|
| XZDCSD21DF| 3|
|ADZZCSD21DF| 1|
+-----------+-------+
scala> val gsamjoin = gsam.join(input_file,(gsam("CCKT_NO") <=> input_file("ckt_id")));
gsamjoin: org.apache.spark.sql.DataFrame = [CCKT_NO: string, SEV_LVL: decimal(38,0), ckt_id: string, location: string, usage: string, port: string, machine: string]
scala> gsamjoin.show()
+-----------+-------+-----------+--------+-----+----+-------+
| CCKT_NO|SEV_LVL| ckt_id|location|usage|port|machine|
+-----------+-------+-----------+--------+-----+----+-------+
| CXZCSD21DF| 2| CXZCSD21DF| IND| 2GB| 101| MAC9|
| ABZCSD21DF| 3| ABZCSD21DF| OTH| 4GB| 101| MAC2|
| XZDCSD21DF| 3| XZDCSD21DF| OTH| 2GB| 101| MAC1|
| AXZCSD22DF| 2| AXZCSD22DF| IND| 9GB| 101| MAC3|
|ADZZCSD21DF| 1|ADZZCSD21DF| USA| 1GB| 101| MAC4|
| BXZCSD21DF| 1| BXZCSD21DF| USA| 7GB| 101| MAC6|
| AXZCSD21DF| 1| AXZCSD21DF| USA| 2GB| 101| MAC1|
| AXZCSD21DF| 1| AXZCSD21DF| USA| 6GB| 101| MAC4|
| AXZCSD21DF| 1| AXZCSD21DF| USA| 1GB| 101| MAC0|
| AXZCSD21DF| 1| AXZCSD21DF| USA| 2GB| 101| MAC5|
+-----------+-------+-----------+--------+-----+----+-------+
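To keep only the SEV_LVL='3' matches and show just the text-file columns, a filter and select can be layered on the same join. A minimal sketch using the names above (SEV_LVL was inferred as decimal(38,0), so a numeric literal also compares cleanly):
// Sketch: same join as above, plus the SEV_LVL filter; select only the text-file columns.
val matched = gsam.join(input_file, gsam("CCKT_NO") <=> input_file("ckt_id"))
  .filter(gsam("SEV_LVL") === 3)
matched.select("ckt_id", "location", "usage", "port", "machine").show()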
Labels:
- Apache Spark
02-09-2017
03:35 PM
I have a text file that I am trying to convert into a DataFrame and query with SQL, but it does not seem to work properly. Can someone please help me? I tried both ways, by registering a temp table and without it.
case class f1(
ckt_id:String,
location:String,
usage:String,
port:String,
machine:String
)
val input_file = sc.textFile("file:///dev/spark/input_file.txt").map(_.split("\\|")).map(x => f1(x(0).toString,x(1).toString,x(2).toString,x(3).toString,x(4).toString)).toDF
scala> input_file.show()
+-----------+--------+-----+----+-------+
| ckt_id|location|usage|port|machine|
+-----------+--------+-----+----+-------+
| ckt_id|location|usage|port|machine|
| AXZCSD21DF| USA| 2GB| 101| MAC1|
| ABZCSD21DF| OTH| 4GB| 101| MAC2|
| AXZCSD21DF| USA| 6GB| 101| MAC4|
| BXZCSD21DF| USA| 7GB| 101| MAC6|
| CXZCSD21DF| IND| 2GB| 101| MAC9|
| AXZCSD21DF| USA| 1GB| 101| MAC0|
| AXZCSD22DF| IND| 9GB| 101| MAC3|
|ADZZCSD21DF| USA| 1GB| 101| MAC4|
| AXZCSD21DF| USA| 2GB| 101| MAC5|
| XZDCSD21DF| OTH| 2GB| 101| MAC1|
+-----------+--------+-----+----+-------+
scala> input_file.printSchema()
root
|-- ckt_id: string (nullable = true)
|-- location: string (nullable = true)
|-- usage: string (nullable = true)
|-- port: string (nullable = true)
|-- machine: string (nullable = true)
scala> input_file.registerTempTable("input_file")
ERROR:
scala> val op =sqlContext.sql("SELECT * FROM input_file")
org.apache.spark.sql.AnalysisException: Table not found: input_file;
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:305)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:314)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:309)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$$$$$c57ec8bf9b0d5f6161b97741d596ff0$$$$wC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:128)
...
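For what it's worth, "Table not found" in the shell is often a sign that the temp table was registered on a different SQLContext instance than the one executing the query. A minimal sketch that keeps a single context throughout (same file path and layout as above; the header-row filter is an extra assumption to drop the stray header line visible as a data row in the output):
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

case class f1(ckt_id: String, location: String, usage: String, port: String, machine: String)

// Build the DataFrame, dropping the header line that otherwise shows up as a data row.
val input_file = sc.textFile("file:///dev/spark/input_file.txt")
  .map(_.split("\\|"))
  .filter(_(0) != "ckt_id")
  .map(x => f1(x(0), x(1), x(2), x(3), x(4)))
  .toDF()

input_file.registerTempTable("input_file")
sqlContext.sql("SELECT * FROM input_file").show()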
Labels:
- Apache Spark
02-09-2017
12:38 PM
Hi Frank lu, can you provide me the code with an example? I didn't understand exactly what you are saying.
02-09-2017
11:46 AM
@Binu Mathew While doing a SQL operation, I am getting an error that one of the tables is not found.
scala> gsam.show()
+-----------+-------+
| CCKT_NO|SEV_LVL|
+-----------+-------+
| AXZCSD21DF| 1|
| BXZCSD21DF| 1|
| ABZCSD21DF| 3|
| CXZCSD21DF| 2|
| AXZCSD22DF| 2|
| XZDCSD21DF| 3|
|ADZZCSD21DF| 1|
+-----------+-------+
scala> input_file.show()
+-----------+--------+-----+----+-------+
| ckt_id|location|usage|port|machine|
+-----------+--------+-----+----+-------+
| ckt_id|location|usage|port|machine|
| AXZCSD21DF| USA| 2GB| 101| MAC1|
| ABZCSD21DF| OTH| 4GB| 101| MAC2|
| AXZCSD21DF| USA| 6GB| 101| MAC4|
| BXZCSD21DF| USA| 7GB| 101| MAC6|
| CXZCSD21DF| IND| 2GB| 101| MAC9|
| AXZCSD21DF| USA| 1GB| 101| MAC0|
| AXZCSD22DF| IND| 9GB| 101| MAC3|
|ADZZCSD21DF| USA| 1GB| 101| MAC4|
| AXZCSD21DF| USA| 2GB| 101| MAC5|
| XZDCSD21DF| OTH| 2GB| 101| MAC1|
+-----------+--------+-----+----+-------+
scala> input_file.registerTempTable("input_file_temp")
scala> gsam.registerTempTable("gsam_temp")
scala> val tmp = sqlContext.sql("select a.ckt_id,a.location,a.usage,a.port,a.machine,b.CCKT_NO,b.SEV_LVL FROM input_file_temp a, gsam_temp b where a.ckt_id=b.CCKT_NO AND b.sev_lvl='3'")
org.apache.spark.sql.AnalysisException: Table not found: input_file_temp;
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:305)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:314)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:309)
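If the temp tables keep disappearing, the same lookup can be written directly against the DataFrames already in scope, sidestepping registration entirely. A sketch using the columns shown above:
// Sketch: DataFrame-API equivalent of the SQL above, no temp tables needed.
val tmp = input_file
  .join(gsam, input_file("ckt_id") === gsam("CCKT_NO"))
  .filter(gsam("SEV_LVL") === 3)
tmp.show()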
02-09-2017
08:53 AM
Thank you so much, Sir. You are awesome.
02-09-2017
08:30 AM
Thank you so much. You're a genius 🙂
02-08-2017
04:22 PM
Using Spark, can I compare two different datasets (one from a SQL DB and another from a text file)? I have two sets of data: one is a text file and the other is a SQL table. I would like to look up the data in the SQL table against the text file and, where they match, delete some fields from the text file.
Text File :
ckt_id|location|usage|port|machine
AXZCSD21DF|USA|2GB|101|MAC1
ABZCSD21DF|OTH|4GB|101|MAC2
AXZCSD21DF|USA|6GB|101|MAC4
BXZCSD21DF|USA|7GB|101|MAC6
SQL table:
+-----------+-------+
| CCKT_NO|SEV_LVL|
+-----------+-------+
| AXZCSD21DF| 1|
| BXZCSD21DF| 1|
| ABZCSD21DF| 3|
| CXZCSD21DF| 2|
| AXZCSD22DF| 2|
| XZDCSD21DF| 3|
|ADZZCSD21DF| 1|
+-----------+-------+
Can someone please guide me on this?
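One way to sketch this: load the SQL table over JDBC into a DataFrame, load the pipe-delimited file into another, and join on the key. The JDBC URL, table name, and credentials below are placeholders, not real connection details:
// Placeholders: url, dbtable, user, password.
val gsam = sqlContext.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/svc")
  .option("dbtable", "GSAM")
  .option("user", "user")
  .option("password", "pass")
  .load()

import sqlContext.implicits._
val textDF = sc.textFile("file:///dev/spark/input_file.txt")
  .map(_.split("\\|"))
  .filter(_(0) != "ckt_id")            // drop the header line
  .map(x => (x(0), x(1), x(2), x(3), x(4)))
  .toDF("ckt_id", "location", "usage", "port", "machine")

// Rows of the text file whose ckt_id appears in the SQL table:
val matched = textDF.join(gsam, textDF("ckt_id") === gsam("CCKT_NO"))
matched.show()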
Labels:
- Apache Spark
02-08-2017
02:59 PM
How can I load a complete table into an RDD using Spark?
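For what it's worth, a common Spark 1.x pattern is to read the table over JDBC as a DataFrame and then take its underlying RDD (connection details below are placeholders):
// Placeholders: url, dbtable, user, password.
val df = sqlContext.read.format("jdbc")
  .option("url", "jdbc:mysql://dbhost:3306/mydb")
  .option("dbtable", "my_table")
  .option("user", "user")
  .option("password", "pass")
  .load()

val tableRdd = df.rdd   // RDD[org.apache.spark.sql.Row]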
Labels:
- Apache Spark
02-07-2017
10:31 AM
I want to delete partition data older than 10 days. My script runs every day. I am able to truncate the current date's partition of the managed table, but I am unable to truncate data more than 10 days older than the current date. Can anyone provide the command to drop partitions, where date is the partition column, that are more than 10 days old? (See the sketch below.)
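Assuming the table is partitioned by a date-formatted string column and the statement runs through a HiveContext (the table name and partition column below are placeholders), Hive accepts comparison operators in DROP PARTITION, so a daily script can compute a cutoff and drop everything older:
// Placeholders: table name "my_table" and partition column "dt" (yyyy-MM-dd strings).
import java.text.SimpleDateFormat
import java.util.Calendar

val cal = Calendar.getInstance()
cal.add(Calendar.DATE, -10)                                   // 10 days before today
val cutoff = new SimpleDateFormat("yyyy-MM-dd").format(cal.getTime)

// Drops every partition whose dt is older than the cutoff.
sqlContext.sql(s"ALTER TABLE my_table DROP IF EXISTS PARTITION (dt < '$cutoff')")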
Labels:
- Apache Hadoop
- Apache Hive
- Apache Pig
01-31-2017
01:16 PM
Thank you Sir, much appreciated 🙂