File too large Exception

Contributor

Hi

I am trying to process Avro records using MapReduce, where the key of the map is an Avro record:

public void map(AvroKey<GenericData.Record> key, NullWritable value, Context context)

The job fails if the number of columns to be processed in each record goes beyond a particular value. For example, if the number of fields in each row is more than 100, my job fails. I tried increasing the map memory and Java heap space in the cluster, but it didn't help.
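
For reference, a minimal sketch of the kind of mapper described above (the class name, the Text output types, and the "id" field are hypothetical, not the actual job; the real schema comes from the Avro input files):

import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.mapred.AvroKey;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RecordFieldMapper
        extends Mapper<AvroKey<GenericData.Record>, NullWritable, Text, Text> {

    @Override
    public void map(AvroKey<GenericData.Record> key, NullWritable value, Context context)
            throws IOException, InterruptedException {
        // The Avro record arrives as the map key.
        GenericData.Record record = key.datum();
        // Hypothetical record id used as the output key.
        String id = String.valueOf(record.get("id"));
        // Emit one output pair per field of the record.
        for (Schema.Field field : record.getSchema().getFields()) {
            context.write(new Text(id),
                          new Text(field.name() + "=" + record.get(field.name())));
        }
    }
}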

[Attachment: 9447-error.png]

Thanks in advance

Aparna

1 ACCEPTED SOLUTION

Contributor

Hi

I was able to resolve the issue. The disk utilization of the local directory (where logs and output files are created) on one of the nodes was higher than the yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage setting. I freed up some space and also set the max-disk-utilization-per-disk-percentage to a much higher value.
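
For anyone hitting the same thing, the property is set in yarn-site.xml on the NodeManagers (the 95.0 below is only an example value; the Hadoop default is 90.0, and the NodeManagers need a restart to pick up the change):

<property>
  <!-- A YARN local dir is marked as bad once its disk usage exceeds this percentage. -->
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>95.0</value>
</property>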

Thanks

Aparna


4 REPLIES

Super Collaborator

Hi Aparna,

Please go through this URL; I hope it will help you.

http://stackoverflow.com/questions/25242287/filenotfoundexception-file-too-large

Master Guru

Have you tried this in Spark or NiFi?

How much memory is configured in your app?

How much is configured in YARN for your job resources?

Can you post additional logs? code? submit details?

Why is the key an Avro record and not the value?

You should make sure you have enough space in HDFS and also in the regular file system, as some of the reduce-stage data will spill to the local disk.

Can you post the HDFS and local file system df output?
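
For example, something like the following on each node (both are standard commands):

hdfs dfs -df -h     # HDFS capacity and usage
df -h               # local file system usage, including the YARN local and log dirs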

Contributor

Hi

Please see my inline comments.

Have you tried this in Spark or NiFi?

No

How much is configured in YARN for your job resources?

Memory allocated for YARN containers on each node - 200 GB

Can you post additional logs? code? submit details?

I did not get any extra info other than FSError

Why is the key an Avro record and not the value?

I am using AvroKeyInputFormat

You should make sure you have enough space in HDFS and also in the regular file system, as some of the reduce-stage data will spill to the local disk.

I have enough space left. More precisely:

HDFS - only 3% is being used

Local FS - only 15% is being used

ulimit -a output:

core file size (blocks, -c) 0

data seg size (kbytes, -d) unlimited

scheduling priority (-e) 0

file size (blocks, -f) unlimited

pending signals (-i) 1032250

max locked memory (kbytes, -l) 64

max memory size (kbytes, -m) unlimited

open files (-n) 1024

pipe size (512 bytes, -p) 8

POSIX message queues (bytes, -q) 819200

real-time priority (-r) 0

stack size (kbytes, -s) 10240

cpu time (seconds, -t) unlimited

max user processes (-u) 1024

virtual memory (kbytes, -v) unlimited

file locks (-x) unlimited

Thanks
