Cannot query parquet file generated by Spark

MKSmith — Fri, 16 Sep 2022 10:03:32 GMT

Hi,

I loaded a Parquet file using Spark, and I can read the file contents in Spark.

I then created an external table on the Parquet file using the following statements and altered the table to add the partition.

However, select * from the table returns NULL for all rows and columns.

CREATE EXTERNAL TABLE test_browser
(
fld1 string,
fld2 string,
FileName string,
LoadDate string,
Checksum string,
RecordId string
)
PARTITIONED BY (fname string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://nameservice1/temp/dims/browser';

ALTER TABLE test_browser ADD PARTITION (fname='browser.parquet')
LOCATION 'hdfs://nameservice1/temp/dims/browser/browser.parquet';

Any pointers on how to fix this? If you need additional info, I'll add it.

Thanks.

Re: Cannot query parquet file generated by Spark

MKSmith — Thu, 11 Feb 2016 16:54:37 GMT

Found the problem. The Hive schema I was using was different from the Parquet file content. I recreated the Hive table with the correct columns, and that fixed it.
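
For anyone hitting the same symptom: as far as I know, Hive's Parquet SerDe resolves columns by name by default, so a column declared in the DDL but absent from the file tends to come back as NULL rather than raising an error. Below is a minimal sketch of a check, in plain Python, that compares the two column lists. The helper name diff_schemas and the parquet_cols values are made up for illustration (the thread never shows the real file schema); in practice the second list could be taken from spark.read.parquet(path).dtypes, which returns (name, type) pairs.

```python
def diff_schemas(hive_cols, parquet_cols):
    """Compare two lists of (name, type) pairs.

    Names are matched case-insensitively, on the assumption that Hive
    lowercases column names. Returns a dict describing the differences.
    """
    hive = {name.lower(): typ.lower() for name, typ in hive_cols}
    parquet = {name.lower(): typ.lower() for name, typ in parquet_cols}
    shared = set(hive) & set(parquet)
    return {
        # Declared in the Hive DDL but not present in the file.
        "missing_in_parquet": sorted(set(hive) - set(parquet)),
        # Present in the file but not declared in the Hive DDL.
        "missing_in_hive": sorted(set(parquet) - set(hive)),
        # Same name on both sides, different type.
        "type_mismatch": sorted(n for n in shared if hive[n] != parquet[n]),
    }


# Columns copied from the CREATE EXTERNAL TABLE statement in the question.
hive_cols = [
    ("fld1", "string"),
    ("fld2", "string"),
    ("FileName", "string"),
    ("LoadDate", "string"),
    ("Checksum", "string"),
    ("RecordId", "string"),
]

# HYPOTHETICAL file schema, invented for this example only.
# With Spark it could be obtained via: spark.read.parquet(path).dtypes
parquet_cols = [
    ("browser_id", "string"),
    ("browser_name", "string"),
    ("FileName", "string"),
    ("LoadDate", "string"),
    ("Checksum", "string"),
    ("RecordId", "string"),
]

report = diff_schemas(hive_cols, parquet_cols)
for kind, names in report.items():
    print(kind, names)
```

Any name listed under missing_in_parquet is a candidate for the all-NULL columns; recreating the table with the file's actual column names, as described above, is the fix that worked in this thread.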

Re: Cannot query parquet file generated by Spark

cjervis — Thu, 11 Feb 2016 17:12:16 GMT

Great to see that you resolved the issue. Feel free to mark your last comment as the solution in case it can help others in the future. 🙂
