I am trying to read gzip files in a directory in parallel. I followed the steps Matei advised in the following link,
but I get the exception below. It looks like other people have hit the same exception. I just want to know whether this is possible in Spark 2.1.0. I am running on a local VM at the moment, using a simple two-line snippet.
val splitFilesPathList = List("s3n://pathtos3file1", "file2", "file3") // etc...
val lineRDD = sc.parallelize(splitFilesPathList, 4).map(path => sc.textFile(path)).take(10).toList.foreach(println)
The two lines above don't work. Any help is really appreciated.
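From what I read, `sc.textFile` accepts a comma-separated list of paths, so something like the sketch below might avoid calling the SparkContext inside a transformation (the paths are the same placeholders as above, not real files):

```scala
// Sketch: read multiple files into one RDD without nesting sc in a map.
// sc.textFile supports a comma-separated list of paths, and gzip files
// are decompressed automatically (each .gz file becomes one partition).
val splitFilesPathList = List("s3n://pathtos3file1", "file2", "file3")
val lineRDD = sc.textFile(splitFilesPathList.mkString(","), 4)
lineRDD.take(10).foreach(println)
```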
I have checked for closure issues and moved the code into a new Scala class that extends Serializable, but I still get the
"task not serializable" exception.
I have tried almost every approach I could think of.
I checked this as well: