Reading image files using spark scala

New Contributor

I want to read a set of sample images from a folder, do some processing, and then save the results as images to another folder. Which input format or method should I use to read the images from a local folder? I saw someone using:

val images = spark.wholeTextFiles("file://images-dir/")  


Super Collaborator

Hi @Jitender Yadav,

If you want to deal with binary files, you can read and write them as byte streams through the Hadoop FileSystem API.

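import org.apache.hadoop.fs.{FileSystem, Path}
val hadoopfs: FileSystem = FileSystem.get(sc.hadoopConfiguration)

// read the file as an input byte stream
val hadoopfsStream = hadoopfs.open(new Path("<hdfs file Path>"))

// to create the directory
// (`path` is a placeholder for the target directory)
val path = new Path("<hdfs directory Path>")
if (!hadoopfs.exists(path)) {
  hadoopfs.mkdirs(path)
}

// to write the data, open an output stream, for example with create
// (the file name is a placeholder)
val out = hadoopfs.create(new Path(path, "<output file name>"))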

The Hadoop FileSystem Java API docs describe these methods in detail.

Please note that this operation does not run in parallel; it runs on the driver. For larger volumes of data you should use the conventional Spark APIs to read and write the data as RDDs/Datasets for better performance.
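For the parallel route, one option is SparkContext.binaryFiles, which returns an RDD of (file name, PortableDataStream) pairs so that each image is decoded on an executor rather than on the driver. The following is only a rough sketch: the object name ImageBatch, the grayscale step, the run method and its inputDir/outputDir parameters are all made up for illustration, and it assumes that every file is an image javax.imageio can decode and that the executors can reach the output location with a default Hadoop Configuration.

```scala
import java.awt.image.BufferedImage
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import javax.imageio.ImageIO

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.spark.SparkContext

object ImageBatch {

  // Hypothetical processing step: decode the bytes, draw them onto a
  // grayscale canvas, and re-encode the result as PNG.
  def toGrayscalePng(bytes: Array[Byte]): Array[Byte] = {
    val src = ImageIO.read(new ByteArrayInputStream(bytes))
    require(src != null, "input is not a decodable image")
    val gray = new BufferedImage(src.getWidth, src.getHeight, BufferedImage.TYPE_BYTE_GRAY)
    val g = gray.getGraphics
    g.drawImage(src, 0, 0, null)
    g.dispose()
    val buffer = new ByteArrayOutputStream()
    ImageIO.write(gray, "png", buffer)
    buffer.toByteArray
  }

  // Reads every file under inputDir in parallel, processes it, and writes
  // one PNG per input file into outputDir (both are assumed URIs).
  def run(sc: SparkContext, inputDir: String, outputDir: String): Unit = {
    sc.binaryFiles(inputDir).foreach { case (file, stream) =>
      val result = toGrayscalePng(stream.toArray())
      val target = new Path(outputDir, new Path(file).getName + ".png")
      // A fresh Configuration is created on the executor, because the
      // driver's hadoopConfiguration is not serializable.
      val fs = target.getFileSystem(new Configuration())
      val out = fs.create(target, true) // overwrite if the file exists
      try out.write(result) finally out.close()
    }
  }
}
```

From a shell this could be called as ImageBatch.run(sc, "<input dir>", "<output dir>"). Keep in mind that with a file:// output path on a multi-node cluster, each executor writes to its own local disk, so a shared file system such as HDFS is usually the safer target.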

Hope this helps!