Created on 12-15-2015 02:34 AM - edited 09-16-2022 02:53 AM
Hello,
Suddenly, one of the 3 Flume agents that run on the same machine is not starting anymore. All I have in the logs is:
DEBUG December 15 2015 10:16 AM Shell Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
    at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:302)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:327)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
    at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:86)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
    at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:269)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:246)
    at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:323)
    at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:317)
    at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:557)
    at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:272)
    at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
    at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:413)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:98)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

The health test result for FLUME_AGENT_SCM_HEALTH has become concerning: This role's process exited while starting. A retry is in process.
The health test result for FLUME_AGENT_SCM_HEALTH has become bad: This role's process is starting. This role is supposed to be started.
However, the files remain in .tmp (they are never rolled anymore).
I cannot understand why this agent hits the hadoop home dir error while the other 2 don't ...
Thank you!
Alina
Created on 12-15-2015 08:39 AM - edited 12-15-2015 08:39 AM
That DEBUG message isn't indicating any problems. Can you look under /var/run/cloudera-scm-agent/process/*flume-AGENT/logs and see if there are any indications that flume is getting any OutOfMemory exception and being killed? What is your heap size set to?
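A quick way to run that check might look like this (the helper name and the example path are just illustrative; the path follows the Cloudera Manager layout mentioned above):

```shell
# Hypothetical helper: list every file under a log directory
# that contains a JVM OutOfMemoryError.
search_oom() {
    # -r: recurse, -i: case-insensitive, -l: print only matching file names
    grep -ril "OutOfMemoryError" "$1" 2>/dev/null
}

# On a Cloudera Manager host you would point it at the process logs, e.g.:
# search_oom /var/run/cloudera-scm-agent/process/*flume-AGENT/logs
```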
Created 12-16-2015 01:12 AM
I didn't find any OutOfMemory errors in the indicated logs (I did a grep).
However, changing the heap helped. So it really was a heap problem 🙂
Thank you!
Alina
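For reference, on a Cloudera Manager deployment the agent heap is changed in the Flume service configuration, but on a plain Apache Flume install the same fix can be sketched in conf/flume-env.sh (the values below are illustrative, not a recommendation):

```shell
# conf/flume-env.sh: JVM options passed to the Flume agent process.
# Raising -Xmx gives the agent more heap, which resolved the issue above.
export JAVA_OPTS="-Xms256m -Xmx1024m"
```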