Reading image files using spark scala

New Contributor

I want to read a set of sample images from a folder, do some processing, and then save the results as images to another folder. Which input format method should I use to read the images from the local folder? I saw someone using:

val images = spark.wholeTextFiles("file://images-dir/")  


Re: Reading image files using spark scala

Super Collaborator

Hi @Jitender Yadav,

If you need to work with binary files, you can read and write them as byte streams through the Hadoop FileSystem API:

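import org.apache.hadoop.fs.{FileSystem, Path}

val hadoopfs: FileSystem = FileSystem.get(sc.hadoopConfiguration)

// read the file as an input byte stream
val hadoopfsStream = hadoopfs.open(new Path("<hdfs file Path>"))

// create the directory if it does not exist yet
// ("<>" is a placeholder for the target directory)
val path = new Path("<>")
if (!hadoopfs.exists(path)) {
  hadoopfs.mkdirs(path)
}

// write the data: create returns an output stream, and its overloads
// accept further options such as an overwrite flag
val out = hadoopfs.create(new Path("<>"))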

The Java API docs for org.apache.hadoop.fs.FileSystem describe these methods.

Please note that this operation does not run in parallel; it runs on the driver. For larger volumes of data you should use the conventional Spark APIs (RDDs/Datasets) to read and write the data for better performance.
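As a rough sketch of the parallel route, sc.binaryFiles returns each file as a (path, PortableDataStream) pair, so the decoding and processing can run on the executors. Everything specific here is an assumption for illustration: the object name ImageBatch, the helper toGrayPng, the grayscale step standing in for "some processing", and the /tmp/images-in and /tmp/images-out folders. Writing with java.nio also assumes local mode or an output folder that every executor can reach.

```scala
import java.awt.image.BufferedImage
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.file.{Files, Paths}
import javax.imageio.ImageIO

import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

object ImageBatch {
  // Hypothetical processing step: decode the bytes, convert to grayscale,
  // and re-encode as PNG. Returns None when ImageIO cannot decode the input.
  def toGrayPng(bytes: Array[Byte]): Option[Array[Byte]] =
    Option(ImageIO.read(new ByteArrayInputStream(bytes))).map { src =>
      val gray = new BufferedImage(src.getWidth, src.getHeight, BufferedImage.TYPE_BYTE_GRAY)
      val g = gray.getGraphics
      g.drawImage(src, 0, 0, null)
      g.dispose()
      val out = new ByteArrayOutputStream()
      ImageIO.write(gray, "png", out)
      out.toByteArray
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("image-batch")
      .master("local[*]") // assumption: local mode
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical folders, not taken from the question.
    val inputDir = "file:///tmp/images-in"
    val outputDir = "/tmp/images-out"

    // Each file is read and processed inside a Spark task, not on the driver.
    sc.binaryFiles(inputDir).foreach { case (uri, stream) =>
      toGrayPng(stream.toArray()).foreach { png =>
        val name = new Path(uri).getName
        Files.createDirectories(Paths.get(outputDir))
        Files.write(Paths.get(outputDir, name + ".png"), png)
      }
    }

    spark.stop()
  }
}
```

Files that ImageIO cannot decode are skipped silently in this sketch. On a real cluster the write step would more likely go through the Hadoop FileSystem API on the executors, or the processed bytes would be collected into a Dataset and saved from there.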

Hope this helps!