New Contributor
Posts: 5
Registered: 01-28-2017

Spark streaming job stops after a certain time

I am getting the error below.

 

The streaming job was killed again with a "too many open files" exception:

 2017-03-07 11:14:12 WARN  TaskSetManager:70 - Lost task 7.0 in stage 31767.0 (TID 99289, node-4.perf.com): java.io.FileNotFoundException: /data/yarn/nm/usercache/tdubidata/appcache/application_1488862815845_0004/blockmgr-8c444a03-fe57-4eea-b4ef-1fd8e1ba6e7f/2d/shuffle_3311_7_0.index.670e98c9-aa58-4677-ab8f-9d17035f2ece (Too many open files in system)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
    at org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFileAndCommit(IndexShuffleBlockResolver.scala:141)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:128)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

 

The ulimit is set for all users:

 

[root@node-2 ]# ulimit -aH
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257534
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 100000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 257534
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

 

I also changed the Maximum Process File Descriptors of all parcels to 100000.

 

Please help me find an appropriate solution.

Cloudera Employee
Posts: 97
Registered: 05-10-2016

Re: Spark streaming job stops after a certain time

The limits you displayed are for root, but you will need to increase the limit for the user running the Spark job. You will also need to ensure these limits are applied on all workers, not just the driver. Finally, you'll want to ensure you are closing all connections and files if you are handling any manually, especially for long-running applications like Spark streaming jobs.
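As a rough sanity check (a sketch only, not a definitive recipe: tdubidata is the user visible in your log path, and the hostnames are just the nodes mentioned in this thread), you can verify the limits the job user actually gets on each worker:

# Verify the effective soft/hard open-files limits for the job user
# on each worker (run from a host with root SSH access to the workers).
for h in node-1 node-2 node-4; do
  ssh "$h" "su - tdubidata -c 'ulimit -Sn; ulimit -Hn'"
done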

New Contributor
Posts: 5
Registered: 01-28-2017

Re: Spark streaming job stops after a certain time

Thanks for the response

 

I have increased the ulimit for all users to 500000. Still, after a certain time, the streaming job went down.

 

How can we ensure that we are closing all connections and files that we are handling? Yes, we have long-running applications in Spark streaming jobs.

 

Please help.

 

 

Cloudera Employee
Posts: 97
Registered: 05-10-2016

Re: Spark streaming job stops after a certain time

I suggest checking two things. Log in to a server that has a running executor and find the process ID of that executor (a quick way to locate it is sketched at the end of this post). First, check that the limits are indeed what you would expect:

 

cat /proc/<pid>/limits

If they are what you would expect, check which files are open, which will help determine what could be left open:

 

lsof -p <pid>
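For a long-running job it can also help to count the descriptors and see which kind dominates, along these lines (a sketch; <pid> is the executor process ID as above):

# Total open descriptors for the executor
lsof -p <pid> | wc -l
# Group by descriptor TYPE (REG = regular files, sock/IPv4/IPv6 = connections)
lsof -p <pid> | awk 'NR > 1 {print $5}' | sort | uniq -c | sort -rn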

 

Limits are inherited from the parent process, so you may also need to restart your NodeManager if you notice that the executor process doesn't have the higher limits.
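If you're not sure how to locate the executor process: on YARN, Spark executors run as CoarseGrainedExecutorBackend JVMs, so a sketch like the following (run as root on a worker, e.g. node-4 from your log; the grep patterns may need adjusting for your deployment) should surface the PIDs and the inherited limits:

# Find Spark executor PIDs on this worker
ps -ef | grep '[C]oarseGrainedExecutorBackend' | awk '{print $2}'
# Executors inherit limits from the NodeManager, so check its limit too
nm_pid=$(pgrep -f nodemanager.NodeManager | head -1)
grep 'open files' "/proc/${nm_pid}/limits"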

 

New Contributor
Posts: 5
Registered: 01-28-2017

Re: Spark streaming job stops after a certain time

[Attachment: parcels.JPG]

These are the parcels we are using.

 

"Log in to a server that has a running executor and find the process ID of that executor"

 

I am not sure which server has a running executor, or how I can find the executor.

[root@node-1 ec2-user]# ulimit -aH
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 257534
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 500000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 257534
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

 

This is a machine with 62 GB of memory and 8 cores.

 
