Expert Contributor
Posts: 162
Registered: 07-29-2013

Can a strange Pig Parquet reader configuration cause a DDoS on the DataNodes?

Hi, I've run into a strange case.

hdfs block size = 128MB

parquet.block.size = 512 MB (I know it's pointless, since the HDFS block size is 128 MB)

pig.maxCombinedSplitSize = 16 MB (I know, it's stupid)

 

Here is the script and the log I see:

todayRaw = LOAD '$today' USING parquet.pig.ParquetLoader();

--some code goes here...

 

 

2014-09-15 15:06:26,316 [main] ERROR org.apache.pig.tools.grunt.Grunt  - ERROR 2245: Cannot get schema from loadFunc parquet.pig.ParquetLoader
2014-09-15 15:06:26,319 [pool-4-thread-4] WARN  org.apache.hadoop.hdfs.DFSClient  - failed to connect to DomainSocket(fd=309,path=/var/run/hdfs-sockets/dn)
java.nio.channels.ClosedByInterruptException

 

2014-09-15 15:06:26,321 [pool-4-thread-2] WARN  org.apache.hadoop.hdfs.DFSClient  - Failed to connect to /10.66.49.157:50010 for block, add to deadNodes and continue. java.nio.channels.ClosedByInterruptException
java.nio.channels.ClosedByInterruptException

 

Caused by: java.io.IOException: Could not read footer: java.lang.RuntimeException: hdfs://nameservice1/my/strage/dataset/report_source/zones/2014/04/26/part-r-00000 is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [9, 45, 49, 10]
    at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:160)
    at parquet.hadoop.ParquetFileReader.readAllFootersInParallelUsingSummaryFiles(ParquetFileReader.java:123)
    at parquet.hadoop.ParquetInputFormat.getFooters(ParquetInputFormat.java:354)
    at parquet.hadoop.ParquetInputFormat.getFooters(ParquetInputFormat.java:339)
    at parquet.hadoop.ParquetInputFormat.getGlobalMetaData(ParquetInputFormat.java:363)
    at parquet.pig.ParquetLoader.initSchema(ParquetLoader.java:203)
    at parquet.pig.ParquetLoader.setInput(ParquetLoader.java:106)
    at parquet.pig.ParquetLoader.getSchema(ParquetLoader.java:187)
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:174)
    ... 35 more
Caused by: java.lang.RuntimeException: hdfs://nameservice1/my/strage/dataset/report_source/zones/2014/04/26/part-r-00000 is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [9, 45, 49, 10]
    at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:263)
    at parquet.hadoop.ParquetFileReader$1.readFooter(ParquetFileReader.java:147)
    at parquet.hadoop.ParquetFileReader$1.call(ParquetFileReader.java:139)
    at parquet.hadoop.ParquetFileReader$1.call(ParquetFileReader.java:134)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
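The root cause is in the last "Caused by": a valid Parquet file must end with the 4-byte magic "PAR1" (bytes [80, 65, 82, 49]), and ParquetFileReader checks this when it reads footers at schema time. The bytes it found instead, [9, 45, 49, 10], decode to tab, '-', '1', newline, i.e. the tail of a plain text file that ended up in the dataset path. A minimal sketch of that footer check (the validation function is my own, not Parquet's API):

```python
import os

# The Parquet footer magic that ParquetFileReader expects at the tail of the file.
PARQUET_MAGIC = b"PAR1"  # bytes [80, 65, 82, 49]

def has_parquet_magic(path):
    """Return True if the file ends with the Parquet footer magic 'PAR1'."""
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)  # raises OSError for files shorter than 4 bytes
        return f.read(4) == PARQUET_MAGIC

# The bytes the reader actually found at the tail of part-r-00000:
found = bytes([9, 45, 49, 10])
print(found)                 # b'\t-1\n' -- looks like the last line of a TSV/text file
print(list(PARQUET_MAGIC))   # [80, 65, 82, 49]
```

So this is probably not a DataNode problem at all: one non-Parquet file in the input path makes the parallel footer read fail, and the interrupted reader threads then log the ClosedByInterruptException / deadNodes warnings as a side effect.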

 
