Spark Scala - Remove rows that have columns with same value

Hi,

I have this data in a text file:

1	4
2	5
2	2
1	5

Using Spark with Scala, how can I identify the rows in which the same number is repeated within the row, and how can I remove them? In this case I want to remove the third row.

Many thanks!

1 ACCEPTED SOLUTION

scala> val a = sc.textFile("/user/.../path/to/your/file").map(x => x.split("\t")).filter(x => x(0) != x(1))
scala> a.take(4)
res2: Array[Array[String]] = Array(Array(1, 4), Array(2, 5), Array(1, 5))

Try the snippet above; just insert the path to your file on HDFS. It splits each line on the tab character and keeps only the rows whose two fields differ.
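
If the rows can have more than two columns, the same idea can be generalized by comparing the number of distinct values in a row with the number of fields. The following is a minimal sketch, not part of the original answer: it uses plain Scala collections with hard-coded sample rows in place of sc.textFile so that it runs without a cluster, and the helper name hasNoRepeats is hypothetical.

```scala
// Sample rows mirroring the question's data (tab-separated).
// Assumption: this Seq stands in for the lines read by sc.textFile(...).
val lines = Seq("1\t4", "2\t5", "2\t2", "1\t5")

// Hypothetical helper: true when no value occurs more than once in the row.
def hasNoRepeats(fields: Array[String]): Boolean =
  fields.distinct.length == fields.length

// Split each line on tabs, drop rows with a repeated value,
// and join the surviving rows back into tab-separated strings.
val kept = lines
  .map(_.split("\t"))
  .filter(hasNoRepeats)
  .map(_.mkString("\t"))

kept.foreach(println)
```

The same predicate should work unchanged on an RDD, for example as .filter(hasNoRepeats) after the .map(x => x.split("\t")) step. Unlike x(0) != x(1), it does not index into the array, so a line with only one field does not throw an ArrayIndexOutOfBoundsException.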
