Posted 04-29-2022 01:47 AM
Hi,

From an API I receive a Parquet file as an Array[Byte] (stored here as parquetPayload) and want to convert it into a DataFrame. My current function is below; it includes a write to disk:

import org.apache.hadoop.fs.{FileSystem, Path}

// Write the payload to a temporary file, then read it back as Parquet.
val tempFilePath = new Path("/tmp/", java.util.UUID.randomUUID().toString + ".tmp")
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val stream = fs.create(tempFilePath, true)
try {
  stream.write(parquetPayload, 0, parquetPayload.length)
} finally {
  stream.close()
}
val df = spark.read.parquet(tempFilePath.toString)

This works, but I would like to avoid the write to disk and keep everything in memory. Is this possible?
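
One possible way to keep this fully in memory is sketched below; it is not from the original thread, and it assumes the parquet-avro artifact is on the driver classpath, that the payload is small enough to decode on the driver, and that a JSON round-trip of the records is acceptable (types such as decimals or timestamps may come back as plain strings or longs):

import java.io.ByteArrayInputStream
import org.apache.avro.generic.GenericRecord
import org.apache.parquet.avro.AvroParquetReader
import org.apache.parquet.io.{DelegatingSeekableInputStream, InputFile, SeekableInputStream}

// ByteArrayInputStream already tracks its cursor in the protected `pos`
// field; expose it so the Parquet reader can seek within the byte array.
class SeekableByteArrayInputStream(bytes: Array[Byte]) extends ByteArrayInputStream(bytes) {
  def getPos: Long = pos.toLong
  def seekTo(newPos: Long): Unit = { pos = newPos.toInt }
}

// Minimal in-memory InputFile over the payload (hypothetical helper class).
class ByteArrayInputFile(bytes: Array[Byte]) extends InputFile {
  override def getLength: Long = bytes.length.toLong
  override def newStream(): SeekableInputStream = {
    val in = new SeekableByteArrayInputStream(bytes)
    new DelegatingSeekableInputStream(in) {
      override def getPos: Long = in.getPos
      override def seek(newPos: Long): Unit = in.seekTo(newPos)
    }
  }
}

// Decode every record on the driver. GenericRecord.toString renders the
// record as JSON, so the rows can be handed back to Spark without a file.
val reader = AvroParquetReader.builder[GenericRecord](new ByteArrayInputFile(parquetPayload)).build()
val jsonRows = Iterator.continually(reader.read()).takeWhile(_ != null).map(_.toString).toList
reader.close()

import spark.implicits._
val df = spark.read.json(spark.createDataset(jsonRows))

The trade-off is that all decoding happens on a single JVM rather than being distributed, so the temp-file approach above may still be preferable for large payloads.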
Labels:
- Apache Spark