Created 02-09-2017 03:42 PM
I have two DataFrames and I would like to show rows from one of them when my conditions are satisfied. I want to match the first column of both DataFrames (ckt_id against CCKT_NO) and also apply the condition SEV_LVL = 3. Can I get some guidance or help, please?
scala> input_file.show()
+-----------+--------+-----+----+-------+
| ckt_id|location|usage|port|machine|
+-----------+--------+-----+----+-------+
| ckt_id|location|usage|port|machine|
| AXZCSD21DF| USA| 2GB| 101| MAC1|
| ABZCSD21DF| OTH| 4GB| 101| MAC2|
| AXZCSD21DF| USA| 6GB| 101| MAC4|
| BXZCSD21DF| USA| 7GB| 101| MAC6|
| CXZCSD21DF| IND| 2GB| 101| MAC9|
| AXZCSD21DF| USA| 1GB| 101| MAC0|
| AXZCSD22DF| IND| 9GB| 101| MAC3|
|ADZZCSD21DF| USA| 1GB| 101| MAC4|
| AXZCSD21DF| USA| 2GB| 101| MAC5|
| XZDCSD21DF| OTH| 2GB| 101| MAC1|
+-----------+--------+-----+----+-------+
scala> gsam.show()
+-----------+-------+
| CCKT_NO|SEV_LVL|
+-----------+-------+
| AXZCSD21DF| 1|
| BXZCSD21DF| 1|
| ABZCSD21DF| 3|
| CXZCSD21DF| 2|
| AXZCSD22DF| 2|
| XZDCSD21DF| 3|
|ADZZCSD21DF| 1|
+-----------+-------+
scala> val gsamjoin = gsam.join(input_file,(gsam("CCKT_NO") <=> input_file("ckt_id")));
gsamjoin: org.apache.spark.sql.DataFrame = [CCKT_NO: string, SEV_LVL: decimal(38,0), ckt_id: string, location: string, usage: string, port: string, machine: string]
scala> gsamjoin.show()
+-----------+-------+-----------+--------+-----+----+-------+
| CCKT_NO|SEV_LVL| ckt_id|location|usage|port|machine|
+-----------+-------+-----------+--------+-----+----+-------+
| CXZCSD21DF| 2| CXZCSD21DF| IND| 2GB| 101| MAC9|
| ABZCSD21DF| 3| ABZCSD21DF| OTH| 4GB| 101| MAC2|
| XZDCSD21DF| 3| XZDCSD21DF| OTH| 2GB| 101| MAC1|
| AXZCSD22DF| 2| AXZCSD22DF| IND| 9GB| 101| MAC3|
|ADZZCSD21DF| 1|ADZZCSD21DF| USA| 1GB| 101| MAC4|
| BXZCSD21DF| 1| BXZCSD21DF| USA| 7GB| 101| MAC6|
| AXZCSD21DF| 1| AXZCSD21DF| USA| 2GB| 101| MAC1|
| AXZCSD21DF| 1| AXZCSD21DF| USA| 6GB| 101| MAC4|
| AXZCSD21DF| 1| AXZCSD21DF| USA| 1GB| 101| MAC0|
| AXZCSD21DF| 1| AXZCSD21DF| USA| 2GB| 101| MAC5|
+-----------+-------+-----------+--------+-----+----+-------+
Created 02-10-2017 12:35 AM
Definitely possible! Here is some sample code:
gsam.join(input_file, (gsam("CCKT_NO") === input_file("ckt_id")) && (gsam("SEV_LVL") === 3), "inner")
Notice the && operator between the two conditions. You can chain as many conditions as you like this way.
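For anyone who wants to try this end to end, here is a minimal sketch. It assumes Spark 2.x or later (it uses SparkSession) and rebuilds a few rows of the two DataFrames by hand; the name inputFile and the local master setting are illustrative assumptions, not part of the original setup. It also selects only the input_file columns, so the output is the complete input row for each match.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x+; in spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[*]").appName("join-sketch").getOrCreate()
import spark.implicits._

// A few rows copied from the question, rebuilt by hand for illustration.
val inputFile = Seq(
  ("AXZCSD21DF", "USA", "2GB", "101", "MAC1"),
  ("ABZCSD21DF", "OTH", "4GB", "101", "MAC2"),
  ("XZDCSD21DF", "OTH", "2GB", "101", "MAC1")
).toDF("ckt_id", "location", "usage", "port", "machine")

val gsam = Seq(
  ("AXZCSD21DF", 1),
  ("ABZCSD21DF", 3),
  ("XZDCSD21DF", 3)
).toDF("CCKT_NO", "SEV_LVL")

// Both conditions go into the join expression, combined with &&.
val joinCond = (gsam("CCKT_NO") === inputFile("ckt_id")) && (gsam("SEV_LVL") === 3)

// Keep only the columns that came from inputFile.
val matched = gsam
  .join(inputFile, joinCond, "inner")
  .select(inputFile.columns.map(c => inputFile(c)): _*)

matched.show()
```

With this sample data, only the ABZCSD21DF and XZDCSD21DF rows satisfy both conditions.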
Created 02-10-2017 12:00 PM
Thank you, sir. But I think that joining a larger dataset will cause memory issues. In that case, can we use if/else or a lookup function here instead?
My aim is to match the input_file DataFrame against the gsam DataFrame and, where CCKT_NO = ckt_id and SEV_LVL = 3, print the complete row for that ckt_id.
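One possible way to keep the join small is sketched below, under assumptions: it targets Spark 2.x or later, rebuilds sample rows by hand, and assumes the SEV_LVL = 3 keys are few enough to fit in each executor's memory (that is what the broadcast hint relies on). It filters gsam first, then uses a left semi join, which behaves like a lookup: it returns only the input_file rows whose ckt_id appears in the filtered key set. The names inputFile and sev3Keys are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Assumption: Spark 2.x+; in spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[*]").appName("lookup-sketch").getOrCreate()
import spark.implicits._

// Sample rows copied from the question, rebuilt by hand for illustration.
val inputFile = Seq(
  ("AXZCSD21DF", "USA", "2GB", "101", "MAC1"),
  ("ABZCSD21DF", "OTH", "4GB", "101", "MAC2"),
  ("CXZCSD21DF", "IND", "2GB", "101", "MAC9"),
  ("XZDCSD21DF", "OTH", "2GB", "101", "MAC1")
).toDF("ckt_id", "location", "usage", "port", "machine")

val gsam = Seq(
  ("AXZCSD21DF", 1),
  ("ABZCSD21DF", 3),
  ("CXZCSD21DF", 2),
  ("XZDCSD21DF", 3)
).toDF("CCKT_NO", "SEV_LVL")

// Step 1: shrink the lookup side to just the keys with SEV_LVL = 3.
val sev3Keys = gsam.filter(gsam("SEV_LVL") === 3).select("CCKT_NO")

// Step 2: left semi join keeps complete inputFile rows that have a matching key
// and adds no columns from the right side. broadcast() is only a hint.
val result = inputFile.join(
  broadcast(sev3Keys),
  inputFile("ckt_id") === sev3Keys("CCKT_NO"),
  "leftsemi"
)

result.show()
```

If the filtered key set is itself large, dropping the broadcast() wrapper leaves the join strategy to Spark.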