Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Flume - This role's process is starting. This role is supposed to be started.


Hello,

 

Suddenly, one of the 3 Flume agents running on the same machine no longer starts. All I have in the logs is:

 

 

DEBUG	December 15 2015 10:16 AM	Shell	
Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
	at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:302)
	at org.apache.hadoop.util.Shell.<clinit>(Shell.java:327)
	at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
	at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
	at org.apache.hadoop.security.Groups.<init>(Groups.java:86)
	at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
	at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
	at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:269)
	at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:246)
	at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:323)
	at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:317)
	at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:557)
	at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:272)
	at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
	at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:413)
	at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:98)
	at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)





The health test result for FLUME_AGENT_SCM_HEALTH  has become concerning: This role's process exited while starting. A retry is in process.


The health test result for FLUME_AGENT_SCM_HEALTH  has become bad: This role's process is starting. This role is supposed to be started.

 

 

However, the files remain as .tmp (they are never rolled anymore).

 

I cannot understand why this agent gets the Hadoop home directory error while the other 2 don't.

 

 

Thank you!

 

Alina

 

 

GHERMAN Alina
1 ACCEPTED SOLUTION


That DEBUG message doesn't indicate a problem. Can you look under /var/run/cloudera-scm-agent/process/*flume-AGENT/logs and check whether Flume is hitting an OutOfMemory exception and being killed? What is your heap size set to?
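For anyone who wants to script that log check instead of grepping by hand, here is a minimal sketch in Python. It assumes the log directory pattern quoted above and simply looks for the substring "OutOfMemory"; the function name find_oom_lines and its parameters are made up for this illustration and are not part of Flume or Cloudera Manager.

```python
import glob
import os

# Directory pattern quoted from the reply above; adjust it to your deployment.
LOG_GLOB = "/var/run/cloudera-scm-agent/process/*flume-AGENT/logs"


def find_oom_lines(log_dir_glob=LOG_GLOB, needle="OutOfMemory"):
    """Return (path, line_number, text) for every log line containing needle."""
    hits = []
    for log_dir in glob.glob(log_dir_glob):
        for root, _dirs, files in os.walk(log_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                try:
                    with open(path, errors="replace") as handle:
                        for number, line in enumerate(handle, start=1):
                            if needle in line:
                                hits.append((path, number, line.rstrip("\n")))
                except OSError:
                    # Unreadable file (permissions, vanished process dir): skip it.
                    continue
    return hits


if __name__ == "__main__":
    for path, number, text in find_oom_lines():
        print(f"{path}:{number}: {text}")
```

An empty result only means the string was not written to those files. It does not rule out a heap problem, so the heap size setting is still worth checking.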


2 REPLIES


I didn't find any OutOfMemory error in the indicated logs (I did a grep).

However, changing the heap helped. So it really was a heap problem 🙂

 

Thank you!

 

Alina 

GHERMAN Alina