Member since: 12-29-2016
Posts: 4
Kudos Received: 3
Solutions: 0
01-01-2017
04:29 AM
Hi, I tried reading the whole directory as well; no luck. I don't want to delete, move, or manually identify such files. I just want to skip/ignore them while reading with Spark SQL. Thanks
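If upgrading is an option, a minimal sketch, assuming Spark 2.1 or later (the spark.sql.files.ignoreCorruptFiles option does not exist in earlier releases):

  // Assumption: Spark 2.1+, where spark.sql.files.ignoreCorruptFiles
  // tells the reader to skip files it cannot parse instead of failing.
  sqlContext.setConf("spark.sql.files.ignoreCorruptFiles", "true")

  // Readable files are returned as usual; the corrupt one is dropped.
  val newDataDF = sqlContext.read.parquet("/data/tempparquetdata")
  newDataDF.show()

With that flag set, the whole-directory read should succeed on the remaining files.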
12-30-2016
02:01 AM
1 Kudo
@Timothy Spann Hi Timothy, thanks for the quick response. So the Parquet file's footer is corrupted. I am reading multiple files from one directory using Spark SQL; one file in that directory has a corrupted footer, and that makes Spark crash. Is there any way to ignore the corrupted blocks and read the other files as-is?

I switched off filter pushdown using:

  sqlContext.setConf("spark.sql.parquet.filterPushdown", "false")

Code used to read the multiple files (here, /data/tempparquetdata/br.1455148800.0 is the corrupted one):

  val newDataDF = sqlContext.read.parquet(
    "/data/tempparquetdata/data1.parquet",
    "/data/tempparquetdata/data2.parquet",
    "/data/tempparquetdata/br.1455148800.0")

newDataDF.show throws the exception "java.lang.RuntimeException: hdfs://CRUX2-SETUP:9000/data/tempparquetdata/br.1455148800.0 is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [82, 52, 24, 10]"
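The exception shows the reader validating the four magic bytes "PAR1" ([80, 65, 82, 49]) at the file tail, which suggests a workaround: probe each file's tail up front and pass only the files that pass the check. A sketch, assuming a spark-shell session where sc and sqlContext are in scope; hasParquetMagic is a hypothetical helper, not a Spark API:

  import org.apache.hadoop.fs.{FileSystem, Path}

  // Hypothetical helper: true if the file ends with the Parquet magic
  // bytes "PAR1". A passing check does not guarantee the file is fully
  // intact, only that the tail looks like a Parquet footer.
  def hasParquetMagic(fs: FileSystem, path: Path): Boolean = {
    val len = fs.getFileStatus(path).getLen
    if (len < 4) return false
    val in = fs.open(path)
    try {
      in.seek(len - 4)
      val tail = new Array[Byte](4)
      in.readFully(tail)
      tail.sameElements("PAR1".getBytes("US-ASCII"))
    } finally {
      in.close()
    }
  }

  val fs = FileSystem.get(sc.hadoopConfiguration)
  val candidates = Seq(
    "/data/tempparquetdata/data1.parquet",
    "/data/tempparquetdata/data2.parquet",
    "/data/tempparquetdata/br.1455148800.0")
  val good = candidates.filter(p => hasParquetMagic(fs, new Path(p)))

  // br.1455148800.0 fails the check, so only the two valid files are read.
  val newDataDF = sqlContext.read.parquet(good: _*)
  newDataDF.show()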
12-29-2016
07:56 AM
2 Kudos
How can I skip a corrupted block in a Parquet file without getting an exception? And how can I ignore a corrupted footer without crashing Spark?
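A possible workaround, sketched under the assumption that each file can be opened individually (the paths are the ones discussed in this thread): read the files one at a time, drop any that throw, and union the rest.

  import scala.util.Try

  val paths = Seq(
    "/data/tempparquetdata/data1.parquet",
    "/data/tempparquetdata/data2.parquet",
    "/data/tempparquetdata/br.1455148800.0")

  // read.parquet reads the footer eagerly for schema inference, so a
  // corrupt footer throws here and Try drops that file from the list.
  val readable = paths.flatMap(p => Try(sqlContext.read.parquet(p)).toOption)

  // Assumes at least one readable file; unionAll is the Spark 1.x name
  // (renamed union in 2.0).
  val newDataDF = readable.reduce(_ unionAll _)
  newDataDF.show()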
Labels:
Apache Spark