Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Cannot query parquet file generated by Spark

Frequent Visitor

Hi,

 

I generated a Parquet file using Spark, and I can read its contents in Spark.

I then created an external table over the Parquet file using the syntax below and altered the table to add the partition. However, select * from test_browser returns NULL for every row and column.

 

CREATE EXTERNAL TABLE test_browser
(
fld1 string,
fld2 string,
FileName string,
LoadDate string,
Checksum string,
RecordId string
)
PARTITIONED BY (fname string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://nameservice1/temp/dims/browser';

 

ALTER TABLE test_browser ADD PARTITION (fname='browser.parquet')
LOCATION 'hdfs://nameservice1/temp/dims/browser/browser.parquet';
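When a Hive external table over Parquet returns NULL in every column, a common cause is that the column names in the DDL do not match the field names stored in the Parquet file: ParquetHiveSerDe resolves columns by name, and an unmatched name silently reads back as NULL. A minimal sketch of that comparison (the Parquet field names here are hypothetical, for illustration only):

```python
# Hypothetical column lists: names from the Hive DDL vs. names actually
# stored in the Parquet file's footer (e.g. as shown by df.printSchema()).
hive_cols = ["fld1", "fld2", "filename", "loaddate", "checksum", "recordid"]
parquet_fields = ["field1", "field2", "FileName", "LoadDate", "Checksum", "RecordId"]

# Hive column names are case-insensitive, so normalise before comparing.
parquet_names = {f.lower() for f in parquet_fields}
unmatched = [c for c in hive_cols if c not in parquet_names]
print(unmatched)  # these columns read back as NULL -> ['fld1', 'fld2']
```

If the list is non-empty, the table definition and the file disagree, which matches the all-NULL symptom described above.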

 

Any pointers on how to fix this? If additional info is needed, I'll add it.

Thanks.

1 ACCEPTED SOLUTION

Frequent Visitor

Found the problem. The Hive schema I was using differed from the Parquet file's contents. I recreated the Hive table with the correct columns, which fixed it.


2 REPLIES

Frequent Visitor

Found the problem. The Hive schema I was using differed from the Parquet file's contents. I recreated the Hive table with the correct columns, which fixed it.
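For anyone hitting the same issue: once you know the file's real schema (for example from df.printSchema() in Spark), the fix is to recreate the Hive table with matching column names and types. A small sketch of turning a list of (name, type) pairs into the DDL column block (the field list here is hypothetical):

```python
# Hypothetical (name, type) pairs taken from the Parquet file's schema;
# render them as the column list for a CREATE EXTERNAL TABLE statement.
parquet_fields = [
    ("fld1", "string"),
    ("fld2", "string"),
    ("FileName", "string"),
]
ddl_cols = ",\n".join(f"  {name} {typ}" for name, typ in parquet_fields)
print(ddl_cols)
# Output:
#   fld1 string,
#   fld2 string,
#   FileName string
```

Generating the column list from the file's own schema, rather than typing it by hand, avoids exactly the name mismatch that caused the NULLs here.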

Community Manager

Great to see that you resolved the issue. Feel free to mark your last comment as the solution in case it can help others in the future. 🙂


Keep the questions coming,

Cy Jervis | Senior Manager, Knowledge Programs

if (helpful) { mark_as_solution(); } | if (appreciated) { give_kudos(); }