Contributor
Posts: 39
Registered: ‎09-20-2014

Cannot read Parquet file generated by Spark SQL in HUE

Hello!

In HUE, I am not able to read Parquet files that were generated using the Java API of Spark SQL. I am using CDH 5.5.1. All I get is "Failed to read Parquet file." The result is the same whether the file is uncompressed or zipped.

If I generate the file with the MapReduce Parquet Java libraries instead of Spark SQL, I am able to read it. Will the Parquet format produced by Spark SQL also be supported in HUE?

Thanks!

Contributor
Posts: 39
Registered: ‎09-20-2014

Re: Cannot read Parquet file generated by Spark SQL in HUE

That error is probably raised in the function below, which is in this file: ./parcels/CDH/lib/hue/apps/filebrowser/src/filebrowser/views.py

def _read_parquet(fhandle, path, offset, length, stats):
   try:
      dumped_data = StringIO()
      parquet._dump(fhandle, ParquetOptions(), out=dumped_data)
      dumped_data.seek(offset)
      return dumped_data.read()
   except:
      logging.exception("Could not read parquet file at %s" % path)
      raise PopupException(_("Failed to read Parquet file."))
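
The function above catches every exception and replaces it with one generic message, so the UI never shows the real cause. Below is a minimal, self-contained sketch of that same pattern. `fake_dump` and `ReadError` are hypothetical stand-ins that I made up for illustration; they are not Hue's `parquet._dump` or `PopupException`, and the failure reason is invented. It only shows that the user gets the generic text while `logging.exception` writes the original traceback to the log.

```python
import io
import logging


class ReadError(Exception):
    """Hypothetical stand-in for Hue's PopupException."""


def fake_dump(fhandle, out):
    """Hypothetical stand-in for parquet._dump; always fails, with an invented reason."""
    raise ValueError("unsupported encoding")


def read_with_generic_error(fhandle, path, dump=fake_dump):
    """Same shape as _read_parquet: log the real error, raise a generic one."""
    try:
        dumped = io.StringIO()
        dump(fhandle, dumped)
        dumped.seek(0)
        return dumped.read()
    except Exception:
        # logging.exception logs at ERROR level and appends the current traceback.
        logging.exception("Could not read parquet file at %s", path)
        raise ReadError("Failed to read Parquet file.")


def demo():
    """Return (message shown to the user, text written to the log)."""
    log_buffer = io.StringIO()
    handler = logging.StreamHandler(log_buffer)
    root = logging.getLogger()
    root.addHandler(handler)
    user_message = None
    try:
        try:
            read_with_generic_error(io.BytesIO(b""), "/tmp/example.parquet")
        except ReadError as err:
            user_message = str(err)
    finally:
        root.removeHandler(handler)
    return user_message, log_buffer.getvalue()


if __name__ == "__main__":
    message, log_text = demo()
    print("User sees:", message)
    print("Log contains:")
    print(log_text)
```

If Hue behaves like this sketch, the actual reason for the failure should be in the Hue server log, next to the "Could not read parquet file at ..." line, rather than in the popup.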