Support Questions


Cannot query parquet file generated by Spark

Frequent Visitor

Hi,

I loaded a parquet file using Spark, and I can read the file's contents in Spark.

I created an external table on the parquet file using the syntax below, then altered the table to add the partition. However, `SELECT * FROM` the table returns NULL for every row and column.

 

CREATE EXTERNAL TABLE test_browser
(
fld1 string,
fld2 string,
FileName string,
LoadDate string,
Checksum string,
RecordId string
)
PARTITIONED BY (fname string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://nameservice1/temp/dims/browser';

 

ALTER TABLE test_browser ADD PARTITION (fname='browser.parquet')
LOCATION 'hdfs://nameservice1/temp/dims/browser/browser.parquet';
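
A quick sanity check before digging further is to confirm that the partition actually registered and to query just that partition. A minimal sketch, assuming the table and paths above (run in Hive/Beeline):

```sql
-- Confirm the partition exists and where Hive thinks its data lives
SHOW PARTITIONS test_browser;
DESCRIBE FORMATTED test_browser PARTITION (fname='browser.parquet');

-- Query only the new partition; if rows come back but every column is NULL,
-- the partition is fine and the problem is in the schema/serde mapping
SELECT * FROM test_browser WHERE fname = 'browser.parquet' LIMIT 5;
```

One thing to note: a partition LOCATION conventionally points at a directory containing data files rather than at a single `.parquet` file; whether a file path works here can depend on the Hive version.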

 

Any pointers on how to fix this? If additional info is needed, I'll add it.

Thanks.

1 ACCEPTED SOLUTION

Frequent Visitor

Found the problem: the Hive schema I was using was different from the parquet file's schema. Recreating the Hive table with the correct column names fixed it.
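
For anyone hitting the same symptom: by default, Hive's ParquetHiveSerDe matches Parquet columns to Hive columns by name, so columns whose names don't line up come back as NULL rather than raising an error. One way to compare the two schemas is to put Hive's `DESCRIBE` next to Spark SQL's direct-file query (table name and path as in the original post):

```sql
-- Hive side: the column names and types the table declares
DESCRIBE test_browser;

-- Spark SQL side: query the Parquet file directly (no table needed)
-- to see the column names actually stored in the file
SELECT * FROM parquet.`hdfs://nameservice1/temp/dims/browser/browser.parquet` LIMIT 5;
```

Alternatively, `spark.read.parquet(path).printSchema()` in a Spark shell prints the file's schema; any column name that differs from the Hive DDL will read back as NULL through the table.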


2 REPLIES

Community Manager

Great to see that you resolved the issue. Feel free to mark your last comment as the solution in case it can help others in the future. 🙂


Keep the questions coming,

Cy Jervis | Senior Manager, Knowledge Programs

if (helpful) { mark_as_solution(); } | if (appreciated) { give_kudos(); }