Spark Scala - Remove rows that have columns with same value

Stewart12586 — Tue, 06 Sep 2016 15:49:41 GMT

Hi,

I have this data in a text file:

1	4
2	5
2	2
1	5

Using Spark with Scala, how can I identify the rows in which the same number is repeated within the row, and how can I delete them? In this case I want to remove the third row.

Many thanks!

Re: Spark Scala - Remove rows that have columns with same value

pminovic — Tue, 06 Sep 2016 16:46:50 GMT

scala> val a = sc.textFile("/user/.../path/to/your/file").map(x => x.split("\t")).filter(x => x(0) != x(1))
scala> a.take(4)
res2: Array[Array[String]] = Array(Array(1, 4), Array(2, 5), Array(1, 5))

Try the snippet above; just insert the path to your file on HDFS. It splits each line on the tab character and keeps only the rows whose two columns differ.
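
If your rows can have more than two columns, one possible variation is to compare the number of distinct values in a row with the number of fields. This is only a sketch: `hasRepeatedValue` is a hypothetical helper name, and the sample rows are the ones from the question held in a plain Scala Seq so the logic can be checked without a cluster.

```scala
// Hypothetical helper: true when any value appears more than once
// in a tab-separated row.
def hasRepeatedValue(line: String): Boolean = {
  val fields = line.split("\t").map(_.trim)
  fields.distinct.length != fields.length
}

// The sample rows from the question, as plain strings.
val rows = Seq("1\t4", "2\t5", "2\t2", "1\t5")

// Rows to keep (no repeated value) and rows that would be dropped.
val kept = rows.filterNot(hasRepeatedValue)
val removed = rows.filter(hasRepeatedValue)

// In spark-shell the same predicate could be passed to an RDD filter,
// assuming `sc` is the SparkContext the shell provides:
//   sc.textFile("/user/.../path/to/your/file").filter(line => !hasRepeatedValue(line))
```

Filtering on the raw line keeps the result as an RDD of strings, so it could be written back out as text; split afterwards if you need the individual columns.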