Created 02-09-2017 03:42 PM
I have two DataFrames and would like to show the rows of one of them when my conditions are satisfied. I want to match the first column of both DataFrames and also apply the condition SEV_LVL='3'. Can I get some guidance or help, please?
scala> input_file.show()
+-----------+--------+-----+----+-------+
|     ckt_id|location|usage|port|machine|
+-----------+--------+-----+----+-------+
|     ckt_id|location|usage|port|machine|
| AXZCSD21DF|     USA|  2GB| 101|   MAC1|
| ABZCSD21DF|     OTH|  4GB| 101|   MAC2|
| AXZCSD21DF|     USA|  6GB| 101|   MAC4|
| BXZCSD21DF|     USA|  7GB| 101|   MAC6|
| CXZCSD21DF|     IND|  2GB| 101|   MAC9|
| AXZCSD21DF|     USA|  1GB| 101|   MAC0|
| AXZCSD22DF|     IND|  9GB| 101|   MAC3|
|ADZZCSD21DF|     USA|  1GB| 101|   MAC4|
| AXZCSD21DF|     USA|  2GB| 101|   MAC5|
| XZDCSD21DF|     OTH|  2GB| 101|   MAC1|
+-----------+--------+-----+----+-------+

scala> gsam.show()
+-----------+-------+
|    CCKT_NO|SEV_LVL|
+-----------+-------+
| AXZCSD21DF|      1|
| BXZCSD21DF|      1|
| ABZCSD21DF|      3|
| CXZCSD21DF|      2|
| AXZCSD22DF|      2|
| XZDCSD21DF|      3|
|ADZZCSD21DF|      1|
+-----------+-------+

scala> val gsamjoin = gsam.join(input_file,(gsam("CCKT_NO") <=> input_file("ckt_id")));
gsamjoin: org.apache.spark.sql.DataFrame = [CCKT_NO: string, SEV_LVL: decimal(38,0), ckt_id: string, location: string, usage: string, port: string, machine: string]

scala> gsamjoin.show()
+-----------+-------+-----------+--------+-----+----+-------+
|    CCKT_NO|SEV_LVL|     ckt_id|location|usage|port|machine|
+-----------+-------+-----------+--------+-----+----+-------+
| CXZCSD21DF|      2| CXZCSD21DF|     IND|  2GB| 101|   MAC9|
| ABZCSD21DF|      3| ABZCSD21DF|     OTH|  4GB| 101|   MAC2|
| XZDCSD21DF|      3| XZDCSD21DF|     OTH|  2GB| 101|   MAC1|
| AXZCSD22DF|      2| AXZCSD22DF|     IND|  9GB| 101|   MAC3|
|ADZZCSD21DF|      1|ADZZCSD21DF|     USA|  1GB| 101|   MAC4|
| BXZCSD21DF|      1| BXZCSD21DF|     USA|  7GB| 101|   MAC6|
| AXZCSD21DF|      1| AXZCSD21DF|     USA|  2GB| 101|   MAC1|
| AXZCSD21DF|      1| AXZCSD21DF|     USA|  6GB| 101|   MAC4|
| AXZCSD21DF|      1| AXZCSD21DF|     USA|  1GB| 101|   MAC0|
| AXZCSD21DF|      1| AXZCSD21DF|     USA|  2GB| 101|   MAC5|
+-----------+-------+-----------+--------+-----+----+-------+
Created 02-10-2017 12:35 AM
Definitely possible! Here is some sample code:
gsam.join(input_file, (gsam("CCKT_NO") === input_file("ckt_id")) && (gsam("SEV_LVL") === 3), "inner")
Notice the && operator, which combines the two conditions into a single join expression. You can chain as many conditions as you like this way.
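The join above returns the columns of both DataFrames. Since the question asks to show only the input_file rows, here is a minimal sketch that adds a select on top of the same join. It assumes Spark 2.x or later (for SparkSession); the object name JoinOnTwoConditions and the helper matchedRows are made up for illustration, and SEV_LVL is modelled as an Int here rather than the decimal(38,0) seen in the thread.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinOnTwoConditions {
  // Hypothetical helper: joins on both conditions, then keeps only inputFile's columns.
  def matchedRows(gsam: DataFrame, inputFile: DataFrame): DataFrame =
    gsam
      .join(
        inputFile,
        (gsam("CCKT_NO") === inputFile("ckt_id")) && (gsam("SEV_LVL") === 3),
        "inner")
      .select(inputFile("*")) // drops CCKT_NO and SEV_LVL from the result

  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark 2.x+ session, used only for illustration.
    val spark = SparkSession.builder().master("local[*]").appName("join-sketch").getOrCreate()
    import spark.implicits._

    // A few rows copied from the tables in the question.
    val inputFile = Seq(
      ("AXZCSD21DF", "USA", "2GB", "101", "MAC1"),
      ("ABZCSD21DF", "OTH", "4GB", "101", "MAC2"),
      ("XZDCSD21DF", "OTH", "2GB", "101", "MAC1")
    ).toDF("ckt_id", "location", "usage", "port", "machine")

    val gsam = Seq(
      ("AXZCSD21DF", 1),
      ("ABZCSD21DF", 3),
      ("XZDCSD21DF", 3)
    ).toDF("CCKT_NO", "SEV_LVL")

    // Only the ABZCSD21DF and XZDCSD21DF rows have SEV_LVL = 3 in this sample.
    matchedRows(gsam, inputFile).show()
    spark.stop()
  }
}
```

In spark-shell, where gsam and input_file already exist, the equivalent call would be matchedRows(gsam, input_file).show().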
Created 02-10-2017 12:00 PM
Thank you, sir. But I think a join on a larger dataset will cause memory issues. In that case, can we use if/else or a lookup function here instead?
My aim is to match the input_file DF with the gsam DF and, if CCKT_NO = ckt_id and SEV_LVL = 3, print the complete row for that ckt_id.
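One possible way to get lookup-like behaviour, sketched under assumptions: filter gsam down to the SEV_LVL = 3 circuit IDs first, hint Spark to broadcast that smaller set, and use a left semi join so that only complete input_file rows come back. Whether this actually relieves memory pressure depends on the filtered gsam being small enough to fit on each executor, which the thread does not confirm. The object name Sev3Lookup and the helper sev3Rows are hypothetical, and Spark 2.x or later is assumed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

object Sev3Lookup {
  // Hypothetical helper: rows of inputFile whose ckt_id appears in gsam with SEV_LVL = 3.
  def sev3Rows(inputFile: DataFrame, gsam: DataFrame): DataFrame = {
    // Shrink gsam first so only the severity-3 circuit IDs take part in the join.
    val sev3Ids = gsam.filter(gsam("SEV_LVL") === 3).select("CCKT_NO")
    // broadcast() is only a hint to ship the small side to every executor.
    // "leftsemi" returns inputFile's columns only and emits each of its rows at most once.
    inputFile.join(broadcast(sev3Ids), inputFile("ckt_id") === sev3Ids("CCKT_NO"), "leftsemi")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark 2.x+ session, used only for illustration.
    val spark = SparkSession.builder().master("local[*]").appName("lookup-sketch").getOrCreate()
    import spark.implicits._

    // A few rows copied from the tables in the question.
    val inputFile = Seq(
      ("AXZCSD21DF", "USA", "2GB", "101", "MAC1"),
      ("ABZCSD21DF", "OTH", "4GB", "101", "MAC2"),
      ("XZDCSD21DF", "OTH", "2GB", "101", "MAC1")
    ).toDF("ckt_id", "location", "usage", "port", "machine")

    val gsam = Seq(
      ("AXZCSD21DF", 1),
      ("ABZCSD21DF", 3),
      ("XZDCSD21DF", 3)
    ).toDF("CCKT_NO", "SEV_LVL")

    sev3Rows(inputFile, gsam).show()
    spark.stop()
  }
}
```

This is still a join under the hood, but with a broadcast the large input_file side does not need to be shuffled, which is usually the expensive part.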