Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Spark Scala - Remove rows that have columns with same value

Rising Star

Hi,

I have this data in a text file (tab-separated):

1	4
2	5
2	2
1	5

Using Spark and Scala, how can I identify the rows where the same number appears in both columns, and how can I delete them? In this case I want to remove the third row...

Many thanks!

1 ACCEPTED SOLUTION

Master Guru
scala> // split each line on the tab, then keep only rows whose two columns differ
scala> val a = sc.textFile("/user/.../path/to/your/file").map(x => x.split("\t")).filter(x => x(0) != x(1))
scala> a.take(4)
res2: Array[Array[String]] = Array(Array(1, 4), Array(2, 5), Array(1, 5))

Try the snippet above; just substitute the path to your file on HDFS.
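If you want to try the same split-and-filter logic without a cluster, here is a minimal sketch on a plain Scala List standing in for the RDD. The sample lines are an assumption reconstructed from the tab-separated pairs in the question:

```scala
object DedupColumns {
  def main(args: Array[String]): Unit = {
    // Stand-in for sc.textFile: each element is one line of the file
    val lines = List("1\t4", "2\t5", "2\t2", "1\t5")

    // Split each line on the tab, keep only rows whose two columns differ
    val kept = lines.map(_.split("\t")).filter(cols => cols(0) != cols(1))

    // Prints the three surviving rows: 1,4 then 2,5 then 1,5
    kept.foreach(cols => println(cols.mkString(",")))
  }
}
```

Because RDD `map` and `filter` take the same function arguments as their Scala-collection counterparts, the logic verified here carries over to the Spark version unchanged.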

