Posted 04-29-2022 01:47 AM
Hi,

From an API I receive a Parquet file as an Array[Byte] (stored here as parquetPayload) and want to convert it into a DataFrame. My current function is below; it includes a write to disk:

import org.apache.hadoop.fs.{FileSystem, Path}

// Write the payload to a temporary file, then read it back as Parquet.
val tempFilePath = new Path("/tmp/", java.util.UUID.randomUUID().toString + ".tmp")
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val stream = fs.create(tempFilePath, true)
try {
  stream.write(parquetPayload, 0, parquetPayload.length)
} finally {
  stream.close()
}
val df = spark.read.parquet(tempFilePath.toString)

This works, but I would like to avoid the write to disk and keep everything in memory. Is this possible?
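
One possible way to keep this fully in memory is sketched below; it is not from the original thread, and it assumes the parquet-avro artifact is on the driver classpath, that the payload is small enough to decode on the driver, and that a JSON round-trip of the records is acceptable (types such as decimals or timestamps may come back as plain strings or longs):

import java.io.ByteArrayInputStream
import org.apache.avro.generic.GenericRecord
import org.apache.parquet.avro.AvroParquetReader
import org.apache.parquet.io.{DelegatingSeekableInputStream, InputFile, SeekableInputStream}

// ByteArrayInputStream already tracks its cursor in the protected `pos`
// field; expose it so the Parquet reader can seek within the byte array.
class SeekableByteArrayInputStream(bytes: Array[Byte]) extends ByteArrayInputStream(bytes) {
  def getPos: Long = pos.toLong
  def seekTo(newPos: Long): Unit = { pos = newPos.toInt }
}

// Minimal in-memory InputFile over the payload (hypothetical helper class).
class ByteArrayInputFile(bytes: Array[Byte]) extends InputFile {
  override def getLength: Long = bytes.length.toLong
  override def newStream(): SeekableInputStream = {
    val in = new SeekableByteArrayInputStream(bytes)
    new DelegatingSeekableInputStream(in) {
      override def getPos: Long = in.getPos
      override def seek(newPos: Long): Unit = in.seekTo(newPos)
    }
  }
}

// Decode every record on the driver. GenericRecord.toString renders the
// record as JSON, so the rows can be handed back to Spark without a file.
val reader = AvroParquetReader.builder[GenericRecord](new ByteArrayInputFile(parquetPayload)).build()
val jsonRows = Iterator.continually(reader.read()).takeWhile(_ != null).map(_.toString).toList
reader.close()

import spark.implicits._
val df = spark.read.json(spark.createDataset(jsonRows))

The trade-off is that all decoding happens on a single JVM rather than being distributed, so the temp-file approach above may still be preferable for large payloads.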
Labels:
- Apache Spark