Parquet as Array[Byte] to DataFrame without writing to disk

New Contributor

Hi

From an API I receive a Parquet file as an Array[Byte] (stored here as parquetPayload), and I want to convert it into a DataFrame. My current function is below; it includes a write to disk:

import org.apache.hadoop.fs.{FileSystem, Path}

// Write the payload to a uniquely named temporary file, then read it back as Parquet.
val tempFilePath = new Path("/tmp/", java.util.UUID.randomUUID().toString + ".tmp")
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

val stream = fs.create(tempFilePath, true)
try {
  stream.write(parquetPayload, 0, parquetPayload.length)
} finally {
  stream.close()
}

val df = spark.read.parquet(tempFilePath.toString)

This works, but I would like to avoid the write to disk and keep everything in memory. Is this possible?

1 REPLY

Super Collaborator

Hi @JoeR

Spark supports reading files in many formats (Parquet, ORC, JSON, XML, Avro, CSV, etc.), but as far as I know there is no direct mechanism to read a DataFrame from an in-memory byte payload; spark.read.parquet expects a path. One possible workaround is sketched below.
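The idea is to bypass the filesystem by exposing the byte array through Parquet's InputFile interface, reading the records back as Avro GenericRecords, and letting Spark infer a schema from their JSON form. This is an untested sketch: the ByteArrayInputFile helper below is my own, not a Spark or Parquet API, and it assumes a reasonably recent parquet-avro/parquet-common on the classpath (Spark bundles these):

import java.io.ByteArrayInputStream

import org.apache.avro.generic.GenericRecord
import org.apache.parquet.avro.AvroParquetReader
import org.apache.parquet.io.{DelegatingSeekableInputStream, InputFile, SeekableInputStream}

// Hypothetical helper: presents an in-memory byte array as a Parquet InputFile.
class ByteArrayInputFile(bytes: Array[Byte]) extends InputFile {
  override def getLength: Long = bytes.length.toLong

  override def newStream(): SeekableInputStream = {
    val in = new ByteArrayInputStream(bytes)
    new DelegatingSeekableInputStream(in) {
      // Position is the number of bytes consumed so far.
      override def getPos: Long = (bytes.length - in.available()).toLong

      // reset() rewinds a ByteArrayInputStream to the start; skip() moves forward.
      override def seek(newPos: Long): Unit = {
        in.reset()
        in.skip(newPos)
      }
    }
  }
}

// Pull every record out of the payload on the driver ...
val reader = AvroParquetReader.builder[GenericRecord](new ByteArrayInputFile(parquetPayload)).build()
val jsonRows = scala.collection.mutable.ArrayBuffer[String]()
try {
  var record = reader.read()
  while (record != null) {
    jsonRows += record.toString // GenericRecord.toString renders the record as JSON
    record = reader.read()
  }
} finally {
  reader.close()
}

// ... and let Spark infer the schema from the JSON strings.
import spark.implicits._
val df = spark.read.json(spark.createDataset(jsonRows))

Two caveats: everything flows through the driver, so this only suits payloads that fit comfortably in driver memory, and the JSON round-trip can coarsen types (timestamps, decimals, and binary columns may not survive exactly).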

If I find a different solution, I will share it with you.