Member since: 02-10-2016
Posts: 2
Kudos Received: 0
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
(no title) | 3076 | 02-11-2016 08:54 AM
02-11-2016 08:54 AM
Found the problem: the Hive schema I was using was different from the parquet file's content. Recreating the Hive table with the correct columns fixed it.
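ParquetHiveSerDe resolves Hive columns against the parquet footer by name, so a column whose name (or type) doesn't match what is stored in the file comes back as NULL rather than raising an error. A minimal, stdlib-only sketch of the kind of check that catches this drift; the schemas below are hypothetical examples, not the original table's:

```python
# Compare a Hive table's declared columns against the schema recorded in the
# parquet footer, both represented here as simple (name, type) lists.
# Hive matches parquet columns by name (case-insensitively), so a renamed or
# retyped column silently reads back as NULL.

def schema_mismatches(hive_cols, parquet_cols):
    """Return a list of human-readable differences between the two schemas."""
    parquet = {name.lower(): ptype for name, ptype in parquet_cols}
    problems = []
    for name, htype in hive_cols:
        key = name.lower()
        if key not in parquet:
            problems.append(f"column '{name}' not found in parquet file")
        elif parquet[key] != htype:
            problems.append(
                f"column '{name}': hive type {htype} != parquet type {parquet[key]}"
            )
    return problems

# Hypothetical example: the Hive DDL says fld2, the file actually has field2.
hive_cols = [("fld1", "string"), ("fld2", "string"), ("FileName", "string")]
parquet_cols = [("fld1", "string"), ("field2", "string"), ("filename", "string")]

for problem in schema_mismatches(hive_cols, parquet_cols):
    print(problem)
```

Any mismatch reported here is a column that Hive would return as NULL for every row.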
02-10-2016 12:47 PM
Hi,

I loaded a parquet file using Spark and can read its contents in Spark. I then created an external table on the parquet file with the syntax below and altered the table to add the partition, but SELECT * from the table returns NULL for all rows and columns.

CREATE EXTERNAL TABLE test_browser (
  fld1 string,
  fld2 string,
  FileName string,
  LoadDate string,
  Checksum string,
  RecordId string
)
PARTITIONED BY (fname string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://nameservice1/temp/dims/browser';

ALTER TABLE test_browser ADD PARTITION (fname='browser.parquet')
LOCATION 'hdfs://nameservice1/temp/dims/browser/browser.parquet';

Any pointers on how to fix this? If additional info is needed, I'll add it. Thanks.
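When a hand-written DDL like the one above returns all NULLs, one way to rule out schema drift is to derive the Hive column list from the schema Spark reports for the file (e.g. `df.dtypes` after `spark.read.parquet(...)`) instead of typing it by hand. A stdlib-only sketch under the assumption that the parquet schema has already been extracted as (name, type) pairs; the helper name and type mapping are illustrative, not an official API:

```python
# Generate the column portion of a Hive CREATE EXTERNAL TABLE statement from a
# parquet-derived schema, so the table definition cannot drift from the file.
# The input mirrors what Spark's df.dtypes returns: (column_name, type_name).

SPARK_TO_HIVE = {  # minimal illustrative mapping; real coverage needs more types
    "string": "string",
    "int": "int",
    "bigint": "bigint",
    "double": "double",
    "boolean": "boolean",
}

def hive_columns(dtypes):
    """Render the '  name type' column lines for a CREATE TABLE statement."""
    lines = []
    for name, spark_type in dtypes:
        hive_type = SPARK_TO_HIVE.get(spark_type, spark_type)
        lines.append(f"  {name} {hive_type}")
    return ",\n".join(lines)

# Hypothetical schema pulled from the parquet file via Spark:
dtypes = [("fld1", "string"), ("fld2", "string"), ("FileName", "string")]
print(f"CREATE EXTERNAL TABLE test_browser (\n{hive_columns(dtypes)}\n)")
```

Generating the DDL this way guarantees the column names match the parquet footer, which is what ParquetHiveSerDe uses to resolve columns at read time.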
Labels:
- Apache Hadoop
- Apache Hive
- Apache Spark
- HDFS