We have a spark streaming job which reads data from kafka running on a 4 node cluster that uses a checkpoint directory on HDFS ....we had an I/O error where we ran out of the space and we had to go in and delete a few hdfs folders to free up some space and now we have bigger disks mounted ....and want to restart cleanly no need to preserve checkpoint data or kafka offset.....getting the error ..
Application application_1482342493553_0077 failed 2 times due to AM Container for appattempt_1482342493553_0077_000002 exited with exitCode: -1000 For more detailed output, check application tracking page:http://hdfs-name-node:8088/cluster/app/application_1482342493553_0077Then, click on links to logs of each attempt. Diagnostics: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1266542908-220.127.116.11-1479844615420:blk_1073795938_55173 file=/user/hadoopuser/streaming_2.10-1.0.0-SNAPSHOT.jar Failing this attempt. Failing the application. ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: default start time: 1484420770001 final status: FAILED tracking URL: http://hdfs-name-node:8088/cluster/app/application_1482342493553_0077 user: hadoopuser
From the error what i can make out is it's still looking for old hdfs blocks which we deleted ...
From research found that ..changing check point directory will help tried changing it and pointing to a new directory ...but still it's not helping to restart spark on clean slate ..it's still giving the same block exception ...Are we missing anything while doing the configuration changes? And how can we make sure that spark is started on a clean slate ?
Is "/user/hadoopuser/streaming_2.10-1.0.0-SNAPSHOT.jar" your application code jar? If yes then you may have to reupload it because its blocks are now missing from HDFS and so Spark cannot download it to run your application code.