Member since 03-31-2016 | 5 posts | 3 kudos received | 0 solutions
07-20-2017
12:29 AM
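@Bala Vignesh N V The issue is that first() returns a single element (here a string), not an RDD. subtract() only works between two RDDs, so you need to turn tagsheader into an RDD with parallelize():

# assumes sc is an existing SparkContext (e.g. the one the pyspark shell creates)
tags = sc.textFile("hdfs:///data/spark/genome-tags.csv")
tagsheader = tags.first()                # a plain str: the header line
header = sc.parallelize([tagsheader])    # wrap it in a one-element RDD
tagsdata = tags.subtract(header)         # all lines not equal to the header (order is not preserved)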
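subtract() works, but it shuffles the whole dataset just to drop one line. A common alternative is to skip the first line of partition 0 with mapPartitionsWithIndex. Below is a minimal sketch of the per-partition function, assuming the header is the very first line of the first partition (true when sc.textFile() reads a single file, not a wildcard of several files each with its own header). The name drop_header and the sample lines are made up for illustration.

```python
import itertools
from typing import Iterable, Iterator


def drop_header(partition_index: int, lines: Iterable[str]) -> Iterator[str]:
    """Skip the first line of partition 0; pass every other partition through.

    Assumption (hypothetical helper): the CSV header is the first line of the
    first partition, which holds when a single file is read with sc.textFile().
    """
    if partition_index == 0:
        # lazily drop exactly one line, without materialising the partition
        return itertools.islice(lines, 1, None)
    return iter(lines)


if __name__ == "__main__":
    # made-up sample lines standing in for two partitions of the CSV
    print(list(drop_header(0, ["tagId,tag", "1,007", "2,action"])))
    print(list(drop_header(1, ["3,adventure"])))
```

In PySpark this would be applied as tagsdata = tags.mapPartitionsWithIndex(drop_header). Unlike subtract() (or tags.filter(lambda line: line != tagsheader)), it removes only the first line, so a data row that happens to equal the header would survive, and the original line order is kept.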
04-26-2016
07:16 AM
When loading a file from HDFS into an RDD, how is the data split across partitions? Is there anything like a Hadoop input split?
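As far as I know, sc.textFile() reads through Hadoop's old-API TextInputFormat, so each RDD partition corresponds to one Hadoop input split, and a line that straddles a split boundary is read entirely by the split it starts in. The split size is roughly max(minSize, min(totalSize / minPartitions, blockSize)), where minPartitions is the optional second argument of sc.textFile() (default min(defaultParallelism, 2)), so splits never exceed one HDFS block. The sketch below is a simplified Python model of that FileInputFormat.getSplits logic for one splittable, uncompressed, non-empty file; plan_splits is a hypothetical helper, not a Spark or Hadoop API, and it ignores compression (e.g. a gzip file is one split), multiple input files and locality. On a real cluster, rdd.getNumPartitions() shows the actual result.

```python
from typing import List

# Simplified model of how sc.textFile() sizes its input splits (= partitions).
# Assumption: mirrors old-API Hadoop FileInputFormat.getSplits for ONE splittable,
# uncompressed file of non-zero length. plan_splits is a hypothetical helper.

MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def plan_splits(file_len: int, block_size: int, min_partitions: int,
                min_split_size: int = 1) -> List[int]:
    """Return the length in bytes of each input split for a single file."""
    # goalSize: total bytes divided by the requested number of splits
    goal_size = file_len // max(min_partitions, 1)
    # splitSize = max(minSize, min(goalSize, blockSize))
    split_size = max(min_split_size, min(goal_size, block_size))
    splits = []
    remaining = file_len
    # keep cutting full-size splits while more than 1.1 splits' worth remains
    while remaining / split_size > SPLIT_SLOP:
        splits.append(split_size)
        remaining -= split_size
    if remaining:
        splits.append(remaining)  # the tail becomes the final split
    return splits


if __name__ == "__main__":
    print(plan_splits(300 * MB, 128 * MB, 2))  # → [134217728, 134217728, 46137344]
```

Under this model, a 300 MB file on 128 MB blocks with minPartitions=2 gives three splits (128, 128 and 44 MB), hence three partitions, while sc.textFile(path, 10) shrinks the split size to 30 MB and gives ten. The 1.1 slop factor is why a 140 MB file with minPartitions=1 stays a single 140 MB split instead of 128 MB plus a 12 MB fragment.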
Labels: Apache Spark