
Cannot read Parquet file generated by Spark SQL in HUE

Hello!

HUE cannot read the Parquet files that I generated with the Java API of Spark SQL. I am using CDH 5.5.1. All I get is "Failed to read Parquet file." The result is the same whether the files are uncompressed or compressed.

If I write the files with the MapReduce Parquet Java libraries instead of Spark SQL, HUE can read them. Will Parquet files written by Spark SQL also be supported in HUE?

Thanks!

Re: Cannot read Parquet file generated by Spark SQL in HUE

That error is probably raised by the function below, in ./parcels/CDH/lib/hue/apps/filebrowser/src/filebrowser/views.py:

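def _read_parquet(fhandle, path, offset, length, stats):
    try:
        # Dump the whole file to text via the `parquet` module, then return
        # everything from `offset` onward (`length` is not used here).
        dumped_data = StringIO()
        parquet._dump(fhandle, ParquetOptions(), out=dumped_data)
        dumped_data.seek(offset)
        return dumped_data.read()
    except:
        # Every failure ends up here: the real traceback goes to the Hue log,
        # and the user only sees the generic message below.
        logging.exception("Could not read parquet file at %s" % path)
        raise PopupException(_("Failed to read Parquet file."))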
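Because `_read_parquet` catches every exception and replaces it with the same message, the text shown in HUE says nothing about why the dump failed; the actual cause is only written to the Hue server log by `logging.exception`. The stdlib-only sketch below mimics that pattern so you can see where the information ends up. `read_parquet_like_hue`, `failing_dump` and the local `PopupException` class are hypothetical stand-ins, not Hue code; the sketch also drops the `offset`, `length` and `stats` arguments and uses `except Exception:` instead of a bare `except:`.

```python
import io
import logging


class PopupException(Exception):
    """Hypothetical stand-in for Hue's PopupException (not the real class)."""


def read_parquet_like_hue(fhandle, path, dump):
    """Mirror the shape of Hue's _read_parquet: any error raised by `dump`
    is logged with its traceback, then replaced by a generic message."""
    try:
        dumped_data = io.StringIO()
        dump(fhandle, out=dumped_data)
        dumped_data.seek(0)
        return dumped_data.read()
    except Exception:
        # The original exception and traceback are recorded here, at ERROR level...
        logging.exception("Could not read parquet file at %s", path)
        # ...but the caller (the HUE user) only ever sees this message.
        raise PopupException("Failed to read Parquet file.")


def failing_dump(fhandle, out):
    # Hypothetical dump function that hits something it cannot decode.
    raise NotImplementedError("unsupported encoding")


if __name__ == "__main__":
    log_buffer = io.StringIO()
    logging.basicConfig(stream=log_buffer, level=logging.ERROR)
    try:
        read_parquet_like_hue(None, "/tmp/part-r-00000.parquet", failing_dump)
    except PopupException as exc:
        print("User sees:", exc)
    # The log buffer holds the NotImplementedError traceback, not just the popup text.
    print(log_buffer.getvalue())
```

In practice this suggests looking in the Hue server log around the time of the failed request: the traceback logged there, rather than the popup message, should show what the reader could not handle in the Spark-written file.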