05-26-2016
03:05 PM
Hi Robert, thanks for the details. Could you help me read a Parquet file?
I have loaded some data into Hive, and to validate it I ran the TopNotch script (https://github.com/blackrock/TopNotch), which uses Spark SQL. The script wrote the bad records to a fileName.gz.parquet file in HDFS under my home directory. Now I want to read these invalid records. I tried the above script, but it fails.
Can the above script be used to read data from a Parquet file? This is the line I ran:
val newDataDF = sqlContext.read.parquet("/user/user1/topnotch/part-r-00000-1513f167-1c5a-4ca8-bb08-6b7cb70a64dc.gz.parquet")
It throws a "not found" error.
I want to load these invalid records into a Hive table so I can query them, along the lines of:
parquetFile.registerTempTable("parquetFile")
val teenagers = sqlContext.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
Thank you.
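P.S. To make the goal concrete, here is a minimal sketch of what I am trying to do. It is not tested on my cluster and rests on a few assumptions: Spark 1.4 or later built with Hive support, reading the whole /user/user1/topnotch directory rather than the single part file, and the names bad_records and bad_records_hive, which I made up for illustration. I do not know the schema of the TopNotch output yet, so the query just selects everything.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object ReadBadRecords {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ReadBadRecords"))
    // HiveContext is used so that saveAsTable writes to the Hive metastore.
    val sqlContext = new HiveContext(sc)

    // Assumption: point at the output directory instead of one part file.
    val badRecordsDF = sqlContext.read.parquet("/user/user1/topnotch")

    // Inspect the columns first, since the schema is not known in advance.
    badRecordsDF.printSchema()

    // Temporary table (hypothetical name) for ad hoc queries in this session.
    badRecordsDF.registerTempTable("bad_records")
    sqlContext.sql("SELECT * FROM bad_records LIMIT 20").show()

    // Persist as a Hive table (hypothetical name) for later querying.
    badRecordsDF.write.mode("overwrite").saveAsTable("bad_records_hive")

    sc.stop()
  }
}
```

In spark-shell the sc and sqlContext values already exist, so only the lines from read.parquet through saveAsTable would be needed there.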