Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Cannot query parquet file generated by Spark

Frequent Visitor

Hi,

 

I generated a Parquet file using Spark, and I can read its contents in Spark.

I then created an external table over the Parquet file using the syntax below and altered the table to add the partition. However, select * from test_browser returns NULL for every row and column.

 

CREATE EXTERNAL TABLE test_browser
(
fld1 string,
fld2 string,
FileName string,
LoadDate string,
Checksum string,
RecordId string
)
PARTITIONED BY (fname string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://nameservice1/temp/dims/browser';

 

ALTER TABLE test_browser ADD PARTITION (fname='browser.parquet')
LOCATION 'hdfs://nameservice1/temp/dims/browser/browser.parquet';
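When a Hive external table over Parquet returns NULL in every column, a common cause is that the column names in the DDL do not match the field names stored in the Parquet file: ParquetHiveSerDe resolves columns by name, and an unmatched name silently reads back as NULL. A minimal sketch of that comparison (the Parquet field names here are hypothetical, for illustration only):

```python
# Hypothetical column lists: names from the Hive DDL vs. names actually
# stored in the Parquet file's footer (e.g. as shown by df.printSchema()).
hive_cols = ["fld1", "fld2", "filename", "loaddate", "checksum", "recordid"]
parquet_fields = ["field1", "field2", "FileName", "LoadDate", "Checksum", "RecordId"]

# Hive column names are case-insensitive, so normalise before comparing.
parquet_names = {f.lower() for f in parquet_fields}
unmatched = [c for c in hive_cols if c not in parquet_names]
print(unmatched)  # these columns read back as NULL -> ['fld1', 'fld2']
```

If the list is non-empty, the table definition and the file disagree, which matches the all-NULL symptom described above.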

 

Any pointers on how to fix this? If additional info is needed, I'll add it.

Thanks.

1 ACCEPTED SOLUTION

Frequent Visitor

Found the problem. The Hive schema I was using differed from the Parquet file's contents. I recreated the Hive table with the correct columns, which fixed it.


2 REPLIES

Frequent Visitor

Found the problem. The Hive schema I was using differed from the Parquet file's contents. I recreated the Hive table with the correct columns, which fixed it.
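For anyone hitting the same issue: once you know the file's real schema (for example from df.printSchema() in Spark), the fix is to recreate the Hive table with matching column names and types. A small sketch of turning a list of (name, type) pairs into the DDL column block (the field list here is hypothetical):

```python
# Hypothetical (name, type) pairs taken from the Parquet file's schema;
# render them as the column list for a CREATE EXTERNAL TABLE statement.
parquet_fields = [
    ("fld1", "string"),
    ("fld2", "string"),
    ("FileName", "string"),
]
ddl_cols = ",\n".join(f"  {name} {typ}" for name, typ in parquet_fields)
print(ddl_cols)
# Output:
#   fld1 string,
#   fld2 string,
#   FileName string
```

Generating the column list from the file's own schema, rather than typing it by hand, avoids exactly the name mismatch that caused the NULLs here.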

Community Manager

Great to see that you resolved the issue. Feel free to mark your last comment as the solution in case it can help others in the future. 🙂


Keep the questions coming,

Cy Jervis | Senior Manager, Knowledge Programs

if (helpful) { mark_as_solution(); } | if (appreciated) { give_kudos(); }