Flume - This role's process is starting. This role is supposed to be started.

Hello,

 

Suddenly, one of the 3 Flume agents running on the same machine no longer starts. All I have in the logs is:

DEBUG	December 15 2015 10:16 AM	Shell	
Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
	at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:302)
	at org.apache.hadoop.util.Shell.<clinit>(Shell.java:327)
	at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
	at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
	at org.apache.hadoop.security.Groups.<init>(Groups.java:86)
	at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
	at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
	at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:269)
	at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:246)
	at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:323)
	at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:317)
	at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:557)
	at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:272)
	at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
	at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:413)
	at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:98)
	at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)





The health test result for FLUME_AGENT_SCM_HEALTH  has become concerning: This role's process exited while starting. A retry is in process.


The health test result for FLUME_AGENT_SCM_HEALTH  has become bad: This role's process is starting. This role is supposed to be started.

 

 

However, the files stay in .tmp (they are never rolled anymore).

I cannot understand why this agent gets the Hadoop home directory error and the other 2 don't.

 

 

Thank you!

 

Alina

 

 

GHERMAN Alina
1 ACCEPTED SOLUTION

That DEBUG message does not indicate a problem. Can you look under /var/run/cloudera-scm-agent/process/*flume-AGENT/logs and see whether there is any sign that Flume is hitting an OutOfMemory exception and being killed? What is your heap size set to?
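
If it helps, here is a minimal Python sketch of that log check. It assumes the log directories match the path pattern quoted above and that an out-of-memory failure would show up as a line containing the text "OutOfMemory". The function name `find_oom_lines` and its parameters are made up for illustration; they are not part of Flume or Cloudera Manager.

```python
import glob
import os

# Path pattern taken from the reply above; adjust it if your layout differs.
LOG_GLOB = "/var/run/cloudera-scm-agent/process/*flume-AGENT/logs"


def find_oom_lines(log_dir_glob=LOG_GLOB, needle="OutOfMemory"):
    """Return (path, line_number, text) for every log line containing `needle`.

    Hypothetical helper: it only does a plain substring search, like grep.
    """
    hits = []
    for log_dir in sorted(glob.glob(log_dir_glob)):
        for root, _dirs, files in os.walk(log_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                try:
                    with open(path, errors="replace") as handle:
                        for number, line in enumerate(handle, start=1):
                            if needle in line:
                                hits.append((path, number, line.rstrip("\n")))
                except OSError:
                    # Unreadable file (permissions, removed meanwhile): skip it.
                    continue
    return hits


if __name__ == "__main__":
    for path, number, text in find_oom_lines():
        print(f"{path}:{number}: {text}")
```

An empty result only means that no log line contained that text. It does not by itself rule out a heap problem, so the heap size setting is still worth checking.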

I didn't find any OutOfMemory error in the indicated logs (I did a grep).

However, changing the heap size helped. So it really was a heap problem 🙂

 

Thank you!

 

Alina 

GHERMAN Alina