Created 07-26-2016 08:38 AM
Hi,
I am using HDP-2.4.0.0-169 on Ubuntu 14.04, and HiveServer2 is now producing roughly 60 GB of log files per day. Before this error started appearing, the log files were only about 2 MB.
The error that appears is:
2016-07-26 09:29:12,972 ERROR [HiveServer2-Handler-Pool: Thread-56]: exec.DDLTask (DDLTask.java:failed(525)) - org.apache.hadoop.hive.ql.metadata.HiveException: Exception while processing show databases
    at org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2277)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:390)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1720)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1477)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1254)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1118)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1113)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
    at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:183)
    at org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:419)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:400)
    at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
    at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
    at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
    at com.sun.proxy.$Proxy20.executeStatement(Unknown Source)
    at org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:261)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1317)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1302)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /tmp/hive/716ecb33-5f8b-4787-baee-7e369e56d006/hive_2016-07-26_09-29-12_951_5521987407554595007-2/-local-10000 (Too many open files)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:222)
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
    at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:305)
    at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:293)
    at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:326)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:393)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:776)
    at org.apache.hadoop.hive.ql.exec.DDLTask.showDatabases(DDLTask.java:2271)
    ... 35 more
I am troubleshooting but have not found the root cause yet. Can you please advise how to solve this issue?
Thank you.
Created 07-28-2016 02:20 AM
You need to increase the OS ulimit (the open file descriptor limit). Most likely you have tables with many partitions and many processes accessing them. You will need to change the ulimit on all nodes and restart your services, which requires downtime. It is good practice to size this limit up front, based on an estimate of how heavily the cluster will use file descriptors.
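As a quick check first (a sketch only; the "hiveserver2" process-name pattern and the /proc paths are assumptions about a typical Linux/HDP setup, so adjust for your environment), you can compare the limit of the running HiveServer2 JVM against how many descriptors it actually holds:

    # Find the HiveServer2 process id (assumes its command line contains "hiveserver2")
    HS2_PID=$(pgrep -f hiveserver2 | head -n 1)

    # Show the soft/hard "Max open files" limits the running process actually has
    grep "Max open files" /proc/${HS2_PID}/limits

    # Count how many file descriptors the process currently holds open
    ls /proc/${HS2_PID}/fd | wc -l

If the count is close to the limit, the "Too many open files" errors will keep recurring until the limit is raised.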
Also see section 1.2.8 here.
I cannot tell you the magic number for your cluster; it depends on your workload and on the resources your servers can provide, but I have seen ulimit set anywhere from tens of thousands to hundreds of thousands. The minimum requirement for installing Hortonworks Data Platform is 10,000. Try different values; a sketch of how to apply one is shown below.
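A minimal sketch of raising the limit persistently, assuming HiveServer2 runs as the hive user and your nodes apply /etc/security/limits.conf (the value 64000 is only an illustrative number, not a recommendation):

    # /etc/security/limits.conf (example entries; repeat on every node)
    hive  soft  nofile  64000
    hive  hard  nofile  64000

    # Verify from a fresh login shell after the change:
    su - hive -c 'ulimit -n'

Restart HiveServer2 (for example through Ambari) after the change so the process picks up the new limit.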
If this response helps, please vote/accept it as the best answer.