Member since: 02-10-2016
Posts: 2
Kudos Received: 0
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
(no title) | 3076 | 02-11-2016 08:54 AM
02-11-2016 08:54 AM
Found the problem: the Hive schema I was using was different from the parquet file's content. Recreating the Hive table with the correct columns fixed it.
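ParquetHiveSerDe resolves Hive columns against the parquet footer by name, so a column whose name (or type) doesn't match what is stored in the file comes back as NULL rather than raising an error. A minimal, stdlib-only sketch of the kind of check that catches this drift; the schemas below are hypothetical examples, not the original table's:

```python
# Compare a Hive table's declared columns against the schema recorded in the
# parquet footer, both represented here as simple (name, type) lists.
# Hive matches parquet columns by name (case-insensitively), so a renamed or
# retyped column silently reads back as NULL.

def schema_mismatches(hive_cols, parquet_cols):
    """Return a list of human-readable differences between the two schemas."""
    parquet = {name.lower(): ptype for name, ptype in parquet_cols}
    problems = []
    for name, htype in hive_cols:
        key = name.lower()
        if key not in parquet:
            problems.append(f"column '{name}' not found in parquet file")
        elif parquet[key] != htype:
            problems.append(
                f"column '{name}': hive type {htype} != parquet type {parquet[key]}"
            )
    return problems

# Hypothetical example: the Hive DDL says fld2, the file actually has field2.
hive_cols = [("fld1", "string"), ("fld2", "string"), ("FileName", "string")]
parquet_cols = [("fld1", "string"), ("field2", "string"), ("filename", "string")]

for problem in schema_mismatches(hive_cols, parquet_cols):
    print(problem)
```

Any mismatch reported here is a column that Hive would return as NULL for every row.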
02-10-2016 12:47 PM
Hi,

I loaded a parquet file using Spark and can read its contents in Spark. I then created an external table on the parquet file with the syntax below and altered the table to add the partition, but SELECT * from the table returns NULL for all rows and columns.

CREATE EXTERNAL TABLE test_browser (
  fld1 string,
  fld2 string,
  FileName string,
  LoadDate string,
  Checksum string,
  RecordId string
)
PARTITIONED BY (fname string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://nameservice1/temp/dims/browser';

ALTER TABLE test_browser ADD PARTITION (fname='browser.parquet')
LOCATION 'hdfs://nameservice1/temp/dims/browser/browser.parquet';

Any pointers on how to fix this? If additional info is needed, I'll add it. Thanks.
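When a hand-written DDL like the one above returns all NULLs, one way to rule out schema drift is to derive the Hive column list from the schema Spark reports for the file (e.g. `df.dtypes` after `spark.read.parquet(...)`) instead of typing it by hand. A stdlib-only sketch under the assumption that the parquet schema has already been extracted as (name, type) pairs; the helper name and type mapping are illustrative, not an official API:

```python
# Generate the column portion of a Hive CREATE EXTERNAL TABLE statement from a
# parquet-derived schema, so the table definition cannot drift from the file.
# The input mirrors what Spark's df.dtypes returns: (column_name, type_name).

SPARK_TO_HIVE = {  # minimal illustrative mapping; real coverage needs more types
    "string": "string",
    "int": "int",
    "bigint": "bigint",
    "double": "double",
    "boolean": "boolean",
}

def hive_columns(dtypes):
    """Render the '  name type' column lines for a CREATE TABLE statement."""
    lines = []
    for name, spark_type in dtypes:
        hive_type = SPARK_TO_HIVE.get(spark_type, spark_type)
        lines.append(f"  {name} {hive_type}")
    return ",\n".join(lines)

# Hypothetical schema pulled from the parquet file via Spark:
dtypes = [("fld1", "string"), ("fld2", "string"), ("FileName", "string")]
print(f"CREATE EXTERNAL TABLE test_browser (\n{hive_columns(dtypes)}\n)")
```

Generating the DDL this way guarantees the column names match the parquet footer, which is what ParquetHiveSerDe uses to resolve columns at read time.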
Labels:
- Apache Hadoop
- Apache Hive
- Apache Spark
- HDFS