Created 07-27-2018 08:32 AM
Increase the open file limit of the service users to scale for large data processing: Hive, HBase, HDFS, Oozie, YARN, MapReduce, ZooKeeper, Spark, HCat
Created 07-27-2018 08:45 AM
Here is the solution...
1. Services - Hive, HBase, HDFS, Oozie, YARN, MapReduce, Ambari Metrics
For these services, the open file limit can be changed directly from the Ambari UI.
Ambari UI > Service > Configs > <username of the service>_user_nofile_limit
Examples:
1. Ambari UI > HIVE > Configs > Advanced > Advanced hive-env > hive_user_nofile_limit = 64000
2. Ambari UI > Ambari Metrics > Configs > Advanced ams-hbase-env > max_open_files_limit = 64000
3. Ambari UI > YARN > Configs > Advanced yarn-env > yarn_user_nofile_limit = 64000
4. Ambari UI > MAPREDUCE2 > Configs > Advanced mapred-env > mapred_user_nofile_limit = 64000
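After changing these values and restarting the affected services, you can confirm that a running daemon actually picked up the new limit by inspecting its process limits. A minimal sketch (the pgrep pattern hiveserver2 is an assumption; adjust it to whichever process you want to check, on the node where it runs):

# Run on the node hosting the service; assumes HiveServer2 is running there.
pid=$(pgrep -f hiveserver2 | head -n 1)
grep "Max open files" /proc/"$pid"/limits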
2. Services - ZooKeeper, Spark, WebHCat, Ranger. Users - zookeeper, spark, hcat, ranger
For the users spark, hcat, zookeeper, and ranger, add the following entries to /etc/security/limits.conf on their respective nodes:
zookeeper - nofile 64000
spark - nofile 64000
hcat - nofile 64000
ranger - nofile 64000
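If the same entries are needed on many nodes, a small root script can append them; a rough sketch, assuming all four users exist on the node (trim the list per node):

# Append nofile limits for the service users (run as root on each node).
for u in zookeeper spark hcat ranger; do
    echo "$u    -    nofile    64000" >> /etc/security/limits.conf
done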
After saving the changes, log in as the spark/hcat/zookeeper user and run the ulimit -a command. The output should contain the value open files (-n) 64000.
Sample ulimit -a output:
[spark@node01]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 513179
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 64000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 64000
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
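To check all four users in one pass instead of logging in individually, something like the loop below can be used; a sketch that assumes the accounts exist on the node (su -s /bin/bash covers accounts whose default shell is nologin):

# Print the effective open-file limit for each service user (run as root).
for u in zookeeper spark hcat ranger; do
    echo -n "$u: "
    su -s /bin/bash "$u" -c 'ulimit -n'
done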
If the ulimit -a values are still not updated, add the following line to /etc/pam.d/su (for example with vim /etc/pam.d/su):
session required pam_limits.so
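A quick way to confirm the PAM change is in place, as a sketch:

# Verify that pam_limits is enabled for su sessions.
grep pam_limits /etc/pam.d/su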
Repeat the verification above; the new limits should now take effect.
Created 08-07-2019 03:54 PM
Is there any sort of formula, or how did you come up with this value for the users' processes? Is it a random value? What can I check within my cluster in order to determine a proper value for me?
Created 03-25-2022 06:20 AM
Hi @JLo_Hernandez, I have the same question. If you found the answer, please let me know.
Thanks