Created on 02-10-2016 12:47 PM - edited 09-16-2022 03:03 AM
Hi,
I loaded a Parquet file using Spark, and I can read the file's contents in Spark without any issues.
I then created an external table over the Parquet file using the syntax below and altered the table to add the partition, but select * from the table returns NULL for every row and column.
CREATE EXTERNAL TABLE test_browser (
  fld1 string,
  fld2 string,
  FileName string,
  LoadDate string,
  Checksum string,
  RecordId string
)
PARTITIONED BY (fname string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://nameservice1/temp/dims/browser';
ALTER TABLE test_browser ADD PARTITION (fname='browser.parquet')
LOCATION 'hdfs://nameservice1/temp/dims/browser/browser.parquet';
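For reference, reading the same file directly in the Spark 1.x spark-shell works fine; roughly this (paraphrased from my session, so treat it as a sketch):

// spark-shell provides sqlContext automatically
val df = sqlContext.read.parquet("hdfs://nameservice1/temp/dims/browser/browser.parquet")
df.printSchema()  // column names/types as recorded inside the Parquet file
df.show(5)        // values come back fine here; only the Hive table shows NULLs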
Any pointers on how to fix this? If any additional info is needed, I'll add it.
Thanks.
Created 02-11-2016 08:54 AM
Found the problem: the Hive schema I was using was different from the schema inside the Parquet file. I recreated the Hive table with the correct columns, and that fixed it.
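In case it helps anyone hitting the same symptom: the ParquetHiveSerDe matches columns by name, so any column in the DDL whose name doesn't match a column inside the file comes back as NULL. A quick way to spot the mismatch is to compare the schema embedded in the file with the Hive table definition, e.g. from spark-shell (sketch; assumes sqlContext is a HiveContext so it can see the Hive metastore):

// Schema as written inside the Parquet file by the Spark job
sqlContext.read.parquet("hdfs://nameservice1/temp/dims/browser/browser.parquet").printSchema()

// Hive's view of the same table, for comparison
sqlContext.sql("DESCRIBE test_browser").show(100)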
Created 02-11-2016 09:12 AM
Great to see that you resolved the issue. Feel free to mark your last comment as the solution in case it can help others in the future. 🙂