Increase the open file limit of service users to scale for large data processing (ulimit and nofile)


How to increase the open file limit of the service users to scale for large data processing: hive, hbase, hdfs, oozie, yarn, mapred, zookeeper, spark, hcat



Here is the solution...

1. Services - Hive, HBase, HDFS, Oozie, YARN, MapReduce, Ambari Metrics

For these services, the file limit can be changed directly from the Ambari UI.

Ambari UI > <Service> > Configs > <username of the service>_user_nofile_limit
Example: 1. Ambari UI > HIVE > Configs > Advanced > Advanced hive-env > hive_user_nofile_limit  64000
         2. Ambari UI > Ambari Metrics > Configs > Advanced ams-hbase-env > max_open_files_limit  64000
         3. Ambari UI > YARN > Configs > Advanced yarn-env > yarn_user_nofile_limit  64000
         4. Ambari UI > MAPREDUCE2 > Configs > Advanced mapred-env > mapred_user_nofile_limit  64000
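
One way to confirm that a limit set in Ambari reached a running service is to read that process's limits from /proc. The following is a minimal sketch, assuming a Linux host and that the service was restarted after the change; the function name max_open_files is hypothetical and not part of Ambari.

```shell
#!/bin/sh
# Print the soft "Max open files" limit from a /proc/<pid>/limits-style file.
# The file path is passed as an argument so any process can be inspected.
max_open_files() {
    # Line layout: "Max open files  <soft>  <hard>  files"
    awk '/^Max open files/ { print $4 }' "$1"
}

# Hypothetical usage: inspect the oldest process owned by the hive user.
# pid=$(pgrep -o -u hive) && max_open_files "/proc/$pid/limits"
```

If the printed value is still the old limit, the process was most likely started before the new setting was applied.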

2. Services - ZooKeeper, Spark, WebHCat, Ranger. Users - zookeeper, spark, hcat, ranger

For the users spark, hcat, zookeeper and ranger, add the lines below to /etc/security/limits.conf on the nodes where each service runs.

The /etc/security/limits.conf file should have the entries below.

zookeeper  -    nofile    64000 
spark      -    nofile    64000
hcat       -    nofile    64000
ranger     -    nofile    64000
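
When the same entries have to be added on several nodes, the edit can be scripted. The following is a minimal sketch, assuming it is run as root; the function name add_nofile_limit is hypothetical. It appends an entry only when the user has no nofile line yet, so running it twice does not duplicate entries. The "-" type in limits.conf sets both the soft and the hard limit.

```shell
#!/bin/sh
# Append "<user> - nofile <limit>" to a limits file unless that user already
# has a nofile entry in it. Arguments: file, user, limit.
add_nofile_limit() {
    file=$1 user=$2 limit=$3
    # Matches an existing line such as "spark      -    nofile    64000".
    if grep -Eq "^${user}[[:space:]]+[^[:space:]]+[[:space:]]+nofile[[:space:]]" "$file"; then
        echo "nofile entry for ${user} already present in ${file}"
    else
        printf '%s\t-\tnofile\t%s\n' "$user" "$limit" >> "$file"
    fi
}

# Usage, following the entries above:
# for u in zookeeper spark hcat ranger; do
#     add_nofile_limit /etc/security/limits.conf "$u" 64000
# done
```

An existing entry with a different value is left untouched by this sketch, so such lines still need to be edited by hand.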

After saving the changes, log in as the spark/hcat/zookeeper user and run the ulimit -a command.

Check the output. It should show open files (-n) 64000.

Sample ulimit -a output:

[spark@node01]$ ulimit -a 
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 513179
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 64000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 64000
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
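
The check above can be repeated for every user with a small helper. The following is a minimal sketch; the function name nofile_matches is hypothetical. It compares an expected value with either a value passed in or the current shell's own ulimit -n.

```shell
#!/bin/sh
# Succeed when the open-file limit equals the expected value.
# Arguments: expected [actual]. Without a second argument, the limit of the
# current shell (ulimit -n) is used.
nofile_matches() {
    expected=$1
    actual=${2:-$(ulimit -n)}
    if [ "$actual" = "$expected" ]; then
        echo "open files OK: $actual"
    else
        echo "open files is $actual, expected $expected" >&2
        return 1
    fi
}

# Usage from a root shell, one login per user:
# for u in zookeeper spark hcat ranger; do
#     nofile_matches 64000 "$(su - "$u" -c 'ulimit -n')"
# done
```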

If the ulimit -a values are still not updated, add the line below to the file /etc/pam.d/su.

vim /etc/pam.d/su
session         required

Then repeat the steps above. The new limit should now take effect.
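
The session line above is shown without a module name. The PAM module that normally applies /etc/security/limits.conf is pam_limits.so, so the following sketch assumes that is the module meant here. It only checks whether a PAM file already has an active session line for it; the function name has_pam_limits is hypothetical.

```shell
#!/bin/sh
# Succeed if the given PAM file contains an uncommented "session" line that
# loads pam_limits.so. The module name is an assumption, not taken from the post.
has_pam_limits() {
    grep -Eq '^[[:space:]]*session[[:space:]]+[^#]*pam_limits\.so' "$1"
}

# Usage:
# if has_pam_limits /etc/pam.d/su; then
#     echo "pam_limits is enabled for su"
# else
#     echo "no active pam_limits session line in /etc/pam.d/su"
# fi
```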



Is there any sort of formula, or how did you come up with this value for the users' processes? Is it a random value? What can I check within my cluster to get a proper value for my setup?


Hi @JLo_Hernandez, I have the same question as you. If you got the answer, please let me know.

