Is there a way for Spark to read the contents of a sequence file in HDFS one record at a time?
My problem is that I have a collection of large sequence files (>15 GB). Each sequence file is created by merging small files.
I want to iterate over these small files, loading and processing them one by one, so that I avoid the memory cost of loading the full 15 GB at once. Example:
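// Sequence files store Hadoop Writable types; Text keys and BytesWritable values are assumed here.
JavaPairRDD<Text, BytesWritable> file = jsc.sequenceFile("url", Text.class, BytesWritable.class);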
// my desired operation (pseudocode)
There is a working Scala version that I have tested and used myself.