
Spark SequenceFile iteration in memory


Is there a way for Spark to read the contents of a SequenceFile in HDFS one record at a time?

My problem is that I have a collection of large SequenceFiles (>15 GB each). Each SequenceFile was created by merging many small files.

I want to iterate over, load, and process these small files one by one, to avoid the memory cost of loading all 15 GB at once. For example:

    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.spark.api.java.JavaPairRDD;

    // SequenceFiles store Hadoop Writables, so the key/value classes are
    // Text and BytesWritable rather than String and Byte.
    JavaPairRDD<Text, BytesWritable> file =
            jsc.sequenceFile("url", Text.class, BytesWritable.class);

    // my desired operation, as pseudocode
    file.foreach(record -> {
        process(record._1(), record._2());
        commitAndContinue();
    });
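For what it's worth, Spark's RDD actions already stream each partition through a lazy iterator rather than materializing the whole file, so per-executor memory is bounded by how you handle individual records, not by the 15 GB total. Below is a minimal self-contained sketch of that pattern using foreachPartition; the HDFS path and the process() helper are hypothetical stand-ins for your own. Note that the Writable objects returned by the record reader are reused between records, so their contents must be copied before being held onto.

    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class SequenceFileStreaming {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("SequenceFileStreaming");
            try (JavaSparkContext jsc = new JavaSparkContext(conf)) {
                // Each split of the SequenceFile becomes one partition, so no
                // single executor ever needs the full file in memory.
                JavaPairRDD<Text, BytesWritable> records = jsc.sequenceFile(
                        "hdfs:///data/merged.seq",   // hypothetical path
                        Text.class, BytesWritable.class);

                records.foreachPartition(iter -> {
                    // The iterator is lazy: each call to next() reads the
                    // next record from HDFS, one at a time.
                    while (iter.hasNext()) {
                        Tuple2<Text, BytesWritable> record = iter.next();
                        // toString() and copyBytes() copy the data out of the
                        // reused Writable buffers.
                        process(record._1().toString(), record._2().copyBytes());
                    }
                });
            }
        }

        // Hypothetical per-record handler standing in for process() above.
        private static void process(String name, byte[] contents) {
            System.out.println(name + ": " + contents.length + " bytes");
        }
    }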