Solved
Spark Scala - Remove rows that have columns with same value
Labels: Apache Spark
Rising Star
Created 09-06-2016 08:49 AM
Hi,
I have this data in a text file:
1 | 4 |
2 | 5 |
2 | 2 |
1 | 5 |
Using Spark and Scala, how can I identify the rows where the same number is repeated in both columns? And how can I delete them? In this case I want to remove the third row.
Many thanks!
Accepted solution
Master Guru
Created 09-06-2016 09:46 AM
scala> val a = sc.textFile("/user/.../path/to/your/file").map(x => x.split("\t")).filter(x => x(0) != x(1))
scala> a.take(4)
res2: Array[Array[String]] = Array(Array(1, 4), Array(2, 5), Array(1, 5))
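Try the snippet above; just insert the path to your file on HDFS. It splits each line on tabs and keeps only the rows whose two values differ, so the third row (2, 2) is dropped.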
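If you want to check the logic without a cluster, here is a small plain-Scala sketch of the same split-and-compare, run on the sample rows from the question. It is an illustration under assumptions, not part of the original answer: the names RemoveSameValueRows, parse and splitRows are hypothetical, and it assumes each line holds two values separated by a tab or by "|" (the question shows "1 | 4 |"). It trims each field so stray spaces around the separator don't hide a repeated value, and it returns both the rows to keep and the rows identified as having the same value twice.

```scala
// Plain-Scala sketch (no Spark) of the filter from the accepted answer.
// Assumption: each line holds two values separated by a tab or by "|".
object RemoveSameValueRows {

  // Split on tabs or pipes, trim each field, and drop empty fields
  // (e.g. what a leading "|" or a blank column would leave behind).
  def parse(line: String): Array[String] =
    line.split("[\t|]").map(_.trim).filter(_.nonEmpty)

  // Returns (rows to keep, rows whose first two values are equal).
  // Assumption: lines with fewer than two fields are skipped as malformed.
  def splitRows(lines: Seq[String]): (Seq[Array[String]], Seq[Array[String]]) = {
    val parsed = lines.map(parse).filter(_.length >= 2)
    val (same, different) = parsed.partition(cols => cols(0) == cols(1))
    (different, same)
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq("1 | 4 |", "2 | 5 |", "2 | 2 |", "1 | 5 |")
    val (kept, removed) = splitRows(sample)
    println("kept:    " + kept.map(_.mkString(",")).mkString(" ; "))
    println("removed: " + removed.map(_.mkString(",")).mkString(" ; "))
  }
}
```

In Spark itself the accepted snippet already does the removal; changing its filter to x => x(0) == x(1) would list the offending rows instead of dropping them.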
