
Flume agent on Windows

Expert Contributor

Hi all,

I need to install the Flume agent (1.5) in a Windows environment, to collect logs and ship them to an HDP cluster on Azure.

Is it enough to configure just the agent, or do I need a complete Flume installation?

Is there a complete guide with all the installation/configuration steps?

I searched the web but could not find one.

Thank you

1 ACCEPTED SOLUTION

Super Collaborator

I can suggest a much easier set of steps:

1. Download the Flume binaries from http://flume.apache.org/download.html and extract them somewhere (that directory becomes your FLUME_HOME).

2. Download winutils and put it somewhere (e.g. C:/winutils/bin; in that case C:/winutils becomes your HADOOP_HOME).

3. Copy all the missing HDFS libraries into FLUME_HOME/lib (you can find them on your Hadoop cluster; it is always preferable to use exactly the same versions as in /usr/hdp/current/hadoop or /usr/hdp/current/hadoop-hdfs).

4. Run the Flume agent with the following command (a sample properties file is sketched below):

bin\flume-ng agent -name MyAgent -f conf/MyAgent.properties -property "flume.root.logger=INFO,LOGFILE,console;flume.log.file=MyLog.log;hadoop.home.dir=C:/winutils"
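For reference, a minimal conf/MyAgent.properties could look like the sketch below. This is only an illustration, not part of the original answer: the source, channel and sink names, the spool directory and the HDFS path are placeholders to adapt to your setup (a spooling-directory source picking up log files on the Windows machine, feeding an HDFS sink on the cluster):

MyAgent.sources = src1
MyAgent.channels = ch1
MyAgent.sinks = sink1

# Hypothetical spooling-directory source: Flume ingests files dropped into this folder
MyAgent.sources.src1.type = spooldir
MyAgent.sources.src1.spoolDir = C:/logs/spool
MyAgent.sources.src1.channels = ch1

MyAgent.channels.ch1.type = memory

# HDFS sink; hostname and path are placeholders
MyAgent.sinks.sink1.type = hdfs
MyAgent.sinks.sink1.channel = ch1
MyAgent.sinks.sink1.hdfs.path = hdfs://your-namenode:8020/apps/flume/%y-%m-%d
MyAgent.sinks.sink1.hdfs.fileType = DataStream
MyAgent.sinks.sink1.hdfs.useLocalTimeStamp = true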


14 REPLIES

Super Collaborator

I've never tried that scenario, but it should be possible.

All you need is to install Flume on the Windows machine (just extract the zip file) and add the jars needed to connect to Azure (if any).

You can use the hdfs.kerberosPrincipal and hdfs.kerberosKeytab properties if you have a secure HDFS.
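For example, a sketch of the relevant sink properties (the agent and sink names, principal and keytab path below are placeholders, not taken from this thread):

# HDFS sink against a Kerberos-secured cluster; adjust principal and keytab location
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/apps/flume/events
agent.sinks.hdfs-sink.hdfs.kerberosPrincipal = flume/_HOST@EXAMPLE.COM
agent.sinks.hdfs-sink.hdfs.kerberosKeytab = C:/flume/conf/flume.keytab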

Regards

Expert Contributor

Hello everyone,

I am writing to report the steps I followed.

For testing I'm using Windows 10 as the client machine, which runs the 2.4 sandbox in VirtualBox.

I added this entry to the hosts file: 127.0.0.1 sandbox.hortonworks.com

I took inspiration from this guide, because with the Apache binaries I had more errors and other difficulties: http://mapredit.blogspot.it/2012/07/run-flume-13x-on-windows.html

- Installed the JDK and Maven

- Set the environment variables

- Compiled with Maven

- Unpacked the tar into c:\flume

- Created the Flume configuration file:

syslog-agent.sources = Syslog 
syslog-agent.channels = MemoryChannel-1 
syslog-agent.sinks = Console
syslog-agent.sources.Syslog.type = syslogTcp 
syslog-agent.sources.Syslog.port = 5140
syslog-agent.sources.Syslog.channels = MemoryChannel-1 
syslog-agent.sinks.Console.channel = MemoryChannel-1 
syslog-agent.sinks.Console.type = logger
syslog-agent.channels.MemoryChannel-1.type = memory

- Launched the client:

java -Xmx20m -Dlog4j.configuration=file:///%CD%\conf\log4j.properties -cp "c:\flume\lib\*" org.apache.flume.node.Application -f c:\flume\conf\syslog-agent.conf -n syslog-agent

- Tried to send a syslog message, and got this response:

2016-03-15 17:50:42,215 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{host=host, Severity=7, Facility=1, priority=15, timestamp=1458042030000} body: 66 6F 6F 5B 33 34 35 5D 3A 20 61 20 73 79 73 6C foo[345]: a sysl }
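(For anyone reproducing the test: one quick way to push a syslog-formatted line at the agent is the sketch below, run from a Linux box or the sandbox that can reach the Windows machine. This is an assumption about the test method, not necessarily how it was sent here; WINDOWS_HOST is a placeholder, and <15> matches the facility 1 / severity 7 priority seen in the event above.)

echo "<15>Mar 15 17:50:30 host foo[345]: a syslog test" | nc WINDOWS_HOST 5140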

It works (yeah!). Then I tried to write the logs to HDFS (my final goal).

I changed the conf file to:

syslog-agent.sources = Syslog
syslog-agent.channels = MemoryChannel-1
syslog-agent.sinks = HDFS-LAB
syslog-agent.sources.Syslog.type = syslogTcp
syslog-agent.sources.Syslog.port = 5140
syslog-agent.sources.Syslog.channels = MemoryChannel-1
syslog-agent.sinks.HDFS-LAB.channel = MemoryChannel-1
syslog-agent.sinks.HDFS-LAB.type = hdfs
syslog-agent.sinks.HDFS-LAB.hdfs.path = hdfs://sandbox.hortonworks.com:8020/apps/flume/%y-%m-%d/%H%M/%S
syslog-agent.sinks.HDFS-LAB.hdfs.file.Prefix = syslogfiles-
syslog-agent.sinks.HDFS-LAB.hdfs.round = true
syslog-agent.sinks.HDFS-LAB.hdfs.roundValue = 10
syslog-agent.sinks.HDFS-LAB.hdfs.roundUnit = second
syslog-agent.channels.MemoryChannel-1.type = memory

Started the agent; ERROR:

2016-03-15 18:05:50,322 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:145)] Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: org/apache/hadoop/io/SequenceFile$CompressionType
        at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:239)
        at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
        at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:413)
        at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:98)
        at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.SequenceFile$CompressionType
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 12 more

The list of jars in c:\flume\lib:

15/03/2016  12:04           346.729 apache-log4j-extras-1.1.jar
15/03/2016  12:04            18.031 async-1.4.0.jar
15/03/2016  12:04         1.138.911 asynchbase-1.5.0.jar
15/03/2016  11:06           303.139 avro-1.7.4.jar
15/03/2016  11:06           187.840 avro-ipc-1.7.4.jar
15/03/2016  12:02            41.123 commons-cli-1.2.jar
15/03/2016  11:06           263.865 commons-codec-1.8.jar
15/03/2016  11:06           588.337 commons-collections-3.2.2.jar
15/03/2016  11:06           241.367 commons-compress-1.4.1.jar
15/03/2016  12:03           160.519 commons-dbcp-1.4.jar
15/03/2016  12:02           163.151 commons-io-2.1.jar
15/03/2016  12:10           267.634 commons-jexl-2.1.1.jar
15/03/2016  11:06           279.193 commons-lang-2.5.jar
15/03/2016  11:06            60.686 commons-logging-1.1.1.jar
15/03/2016  12:03            96.221 commons-pool-1.5.4.jar
15/03/2016  12:04            68.866 curator-client-2.6.0.jar
15/03/2016  12:04           185.245 curator-framework-2.6.0.jar
15/03/2016  12:04           248.171 curator-recipes-2.6.0.jar
15/03/2016  12:03         3.103.132 derby-10.11.1.1.jar
15/03/2016  12:12            20.623 flume-avro-source-1.7.0-SNAPSHOT.jar
15/03/2016  12:11            38.276 flume-dataset-sink-1.7.0-SNAPSHOT.jar
15/03/2016  12:03           295.717 flume-file-channel-1.7.0-SNAPSHOT.jar
15/03/2016  12:03            67.897 flume-hdfs-sink-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            43.114 flume-hive-sink-1.7.0-SNAPSHOT.jar
15/03/2016  12:03            13.672 flume-irc-sink-1.7.0-SNAPSHOT.jar
15/03/2016  12:03            53.603 flume-jdbc-channel-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            27.199 flume-jms-source-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            21.662 flume-kafka-channel-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            15.948 flume-kafka-source-1.7.0-SNAPSHOT.jar
15/03/2016  12:02            26.369 flume-ng-auth-1.7.0-SNAPSHOT.jar
15/03/2016  12:01            56.785 flume-ng-configuration-1.7.0-SNAPSHOT.jar
15/03/2016  12:02           381.642 flume-ng-core-1.7.0-SNAPSHOT.jar
15/03/2016  12:05            37.966 flume-ng-elasticsearch-sink-1.7.0-SNAPSHOT.jar
15/03/2016  12:04            20.620 flume-ng-embedded-agent-1.7.0-SNAPSHOT.jar
15/03/2016  12:04            53.204 flume-ng-hbase-sink-1.7.0-SNAPSHOT.jar
15/03/2016  12:09            15.267 flume-ng-kafka-sink-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            16.668 flume-ng-log4jappender-1.7.0-SNAPSHOT.jar
15/03/2016  12:08            35.263 flume-ng-morphline-solr-sink-1.7.0-SNAPSHOT.jar
15/03/2016  12:04            37.514 flume-ng-node-1.7.0-SNAPSHOT.jar
15/03/2016  12:01           120.730 flume-ng-sdk-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            44.551 flume-scribe-source-1.7.0-SNAPSHOT.jar
15/03/2016  12:03            22.533 flume-spillable-memory-channel-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            32.807 flume-taildir-source-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            56.552 flume-thrift-source-1.7.0-SNAPSHOT.jar
15/03/2016  12:13            18.917 flume-tools-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            14.944 flume-twitter-source-1.7.0-SNAPSHOT.jar
15/03/2016  12:02           189.285 gson-2.2.2.jar
15/03/2016  12:01         1.648.200 guava-11.0.2.jar
15/03/2016  11:06           427.021 httpclient-4.2.1.jar
15/03/2016  11:06           181.201 httpcore-4.1.3.jar
15/03/2016  12:03           132.202 irclib-1.10.jar
15/03/2016  12:07            35.058 jackson-annotations-2.3.0.jar
15/03/2016  12:07           197.986 jackson-core-2.3.1.jar
15/03/2016  11:06           228.268 jackson-core-asl-1.9.3.jar
15/03/2016  12:07           914.311 jackson-databind-2.3.1.jar
15/03/2016  11:06           773.019 jackson-mapper-asl-1.9.3.jar
15/03/2016  11:06           539.912 jetty-6.1.26.jar
15/03/2016  11:06           177.131 jetty-util-6.1.26.jar
15/03/2016  12:02           570.478 joda-time-2.1.jar
15/03/2016  12:08            53.244 jopt-simple-3.2.jar
15/03/2016  12:01            33.015 jsr305-1.3.9.jar
15/03/2016  12:09         3.514.920 kafka_2.10-0.8.1.1.jar
15/03/2016  12:10         2.141.463 kite-data-core-1.0.0.jar
15/03/2016  12:10         2.020.522 kite-data-hbase-1.0.0.jar
15/03/2016  12:10         1.799.126 kite-data-hive-1.0.0.jar
15/03/2016  12:07         1.764.982 kite-hadoop-compatibility-1.0.0.jar
15/03/2016  11:06           347.531 libthrift-0.9.0.jar
15/03/2016  11:06           489.884 log4j-1.2.17.jar
15/03/2016  12:03           390.675 mapdb-0.9.9.jar
15/03/2016  12:04            82.123 metrics-core-2.2.0.jar
15/03/2016  12:02           644.934 mina-core-2.0.4.jar
15/03/2016  11:06         1.132.988 netty-3.5.12.Final.jar
15/03/2016  12:10            19.827 opencsv-2.3.jar
15/03/2016  11:06            29.555 paranamer-2.3.jar
15/03/2016  12:07            41.943 parquet-avro-1.4.1.jar
15/03/2016  12:07           724.377 parquet-column-1.4.1.jar
15/03/2016  12:07            11.368 parquet-common-1.4.1.jar
15/03/2016  12:07           272.946 parquet-encoding-1.4.1.jar
15/03/2016  12:07           471.470 parquet-format-2.0.0.jar
15/03/2016  12:07            10.385 parquet-generator-1.4.1.jar
15/03/2016  12:07           152.325 parquet-hadoop-1.4.1.jar
15/03/2016  12:10         2.764.448 parquet-hive-bundle-1.4.1.jar
15/03/2016  12:07         1.029.033 parquet-jackson-1.4.1.jar
15/03/2016  12:02           533.455 protobuf-java-2.5.0.jar
15/03/2016  12:09         7.137.903 scala-library-2.10.1.jar
15/03/2016  12:12           276.420 serializer-2.7.2.jar
15/03/2016  12:02           133.240 servlet-api-2.5-20110124.jar
15/03/2016  11:06            25.496 slf4j-api-1.6.1.jar
15/03/2016  11:06             9.753 slf4j-log4j12-1.6.1.jar
15/03/2016  11:06           410.710 snappy-java-1.1.0.jar
15/03/2016  12:12           284.077 twitter4j-core-3.0.3.jar
15/03/2016  12:12            27.698 twitter4j-media-support-3.0.3.jar
15/03/2016  12:12            56.307 twitter4j-stream-3.0.3.jar
15/03/2016  11:06           449.505 velocity-1.7.jar
15/03/2016  12:12         3.154.938 xalan-2.7.2.jar
15/03/2016  12:12         1.229.125 xercesImpl-2.9.1.jar
15/03/2016  12:12           194.354 xml-apis-1.3.04.jar
15/03/2016  11:06            94.672 xz-1.0.jar
15/03/2016  12:08            64.009 zkclient-0.3.jar

I took the missing jar out of the sandbox (hadoop-common-2.7.1.2.4.0.0-169.jar) and the agent then started nicely, with no problem; but when I sent a syslog message, I got this response:

2016-03-15 18:00:19,906 (hdfs-HDFS-LAB-call-runner-0) [ERROR - org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:385)] Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable C:\flume\bin\winutils.exe in the Hadoop binaries.

Solved with this guide:

https://github.com/spring-projects/spring-hadoop/wiki/Using-a-Windows-client-together-with-a-Linux-c...

I downloaded winutils.exe and placed it under the folder c:\flume\bin.

- Launched the agent:

java -Xmx20m -Dlog4j.configuration=file:///%CD%\conf\log4j.properties -Dhadoop.home.dir=C:\flume -cp "c:\flume\lib\*" org.apache.flume.node.Application -f c:\flume\conf\syslog-agent-hdfs.conf -n syslog-agent

No problems at startup:

c:\flume>java -Xmx20m -Dlog4j.configuration=file:///%CD%\conf\log4j.properties -Dhadoop.home.dir=C:\flume -cp "c:\flume\lib\*" org.apache.flume.node.Application -f c:\flume\conf\syslog-agent-hdfs.conf -n syslog-agent
2016-03-15 18:11:19,563 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:61)] Configuration provider starting
2016-03-15 18:11:19,573 (lifecycleSupervisor-1-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:78)] Configuration provider started
2016-03-15 18:11:19,576 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:126)] Checking file:c:\flume\conf\syslog-agent-hdfs.conf for changes
2016-03-15 18:11:19,595 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file:c:\flume\conf\syslog-agent-hdfs.conf
2016-03-15 18:11:19,610 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB
2016-03-15 18:11:19,618 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1021)] Created context for HDFS-LAB: hdfs.round
2016-03-15 18:11:19,625 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB
2016-03-15 18:11:19,631 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB
2016-03-15 18:11:19,636 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB
2016-03-15 18:11:19,644 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB
2016-03-15 18:11:19,652 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:931)] Added sinks: HDFS-LAB Agent: syslog-agent
2016-03-15 18:11:19,659 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB
2016-03-15 18:11:19,664 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB
2016-03-15 18:11:19,673 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:314)] Starting validation of configuration for agent: syslog-agent, initial-configuration: AgentConfiguration[syslog-agent]
SOURCES: {Syslog={ parameters:{channels=MemoryChannel-1, port=5140, type=syslogTcp} }}
CHANNELS: {MemoryChannel-1={ parameters:{type=memory} }}
SINKS: {HDFS-LAB={ parameters:{hdfs.path=hdfs://sandbox.hortonworks.com:8020/apps/flume/%y-%m-%d/%H%M/%S, hdfs.file.Prefix=syslogfiles-, hdfs.round=true, channel=MemoryChannel-1, type=hdfs, hdfs.roundValue=10, hdfs.roundUnit=second} }}


2016-03-15 18:11:19,694 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateChannels(FlumeConfiguration.java:469)] Created channel MemoryChannel-1
2016-03-15 18:11:19,704 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSinks(FlumeConfiguration.java:675)] Creating sink: HDFS-LAB using HDFS
2016-03-15 18:11:19,716 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:372)] Post validation configuration for syslog-agent
AgentConfiguration created without Configuration stubs for which only basic syntactical validation was performed[syslog-agent]
SOURCES: {Syslog={ parameters:{channels=MemoryChannel-1, port=5140, type=syslogTcp} }}
CHANNELS: {MemoryChannel-1={ parameters:{type=memory} }}
SINKS: {HDFS-LAB={ parameters:{hdfs.path=hdfs://sandbox.hortonworks.com:8020/apps/flume/%y-%m-%d/%H%M/%S, hdfs.file.Prefix=syslogfiles-, hdfs.round=true, channel=MemoryChannel-1, type=hdfs, hdfs.roundValue=10, hdfs.roundUnit=second} }}


2016-03-15 18:11:19,735 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:136)] Channels:MemoryChannel-1
2016-03-15 18:11:19,743 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:137)] Sinks HDFS-LAB
2016-03-15 18:11:19,747 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:138)] Sources Syslog
2016-03-15 18:11:19,752 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:141)] Post-validation flume configuration contains configuration for agents: [syslog-agent]
2016-03-15 18:11:19,759 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:145)] Creating channels
2016-03-15 18:11:19,768 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:42)] Creating instance of channel MemoryChannel-1 type memory
2016-03-15 18:11:19,779 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:200)] Created channel MemoryChannel-1
2016-03-15 18:11:19,786 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:41)] Creating instance of source Syslog, type syslogTcp
2016-03-15 18:11:19,802 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:42)] Creating instance of sink: HDFS-LAB, type: hdfs
2016-03-15 18:11:19,815 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:114)] Channel MemoryChannel-1 connected to [Syslog, HDFS-LAB]
2016-03-15 18:11:19,828 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{Syslog=EventDrivenSourceRunner: { source:org.apache.flume.source.SyslogTcpSource{name:Syslog,state:IDLE} }} sinkRunners:{HDFS-LAB=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3f476562 counterGroup:{ name:null counters:{} } }} channels:{MemoryChannel-1=org.apache.flume.channel.MemoryChannel{name: MemoryChannel-1}} }
2016-03-15 18:11:19,855 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:145)] Starting Channel MemoryChannel-1
2016-03-15 18:11:19,908 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:120)] Monitored counter group for type: CHANNEL, name: MemoryChannel-1: Successfully registered new MBean.
2016-03-15 18:11:19,918 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: CHANNEL, name: MemoryChannel-1 started
2016-03-15 18:11:19,927 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:173)] Starting Sink HDFS-LAB
2016-03-15 18:11:19,935 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:184)] Starting Source Syslog
2016-03-15 18:11:19,938 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:120)] Monitored counter group for type: SINK, name: HDFS-LAB: Successfully registered new MBean.
2016-03-15 18:11:19,958 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: SINK, name: HDFS-LAB started
2016-03-15 18:11:19,971 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:143)] Polling sink runner starting
2016-03-15 18:11:20,054 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.SyslogTcpSource.start(SyslogTcpSource.java:123)] Syslog TCP Source starting...
2016-03-15 18:11:49,947 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:126)] Checking file:c:\flume\conf\syslog-agent-hdfs.conf for changes

When I tried to send a syslog message, I got this response:

2016-03-15 18:18:33,672 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:234)] Creating hdfs://sandbox.hortonworks.com:8020/apps/flume/16-03-15/1240/30/FlumeData.1458062313441.tmp
2016-03-15 18:18:33,715 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
        at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:120)
        at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:236)
        at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2812)
        at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2802)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2668)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
        at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
        at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 18 more
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
        at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:120)
        at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:236)
        at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2812)
        at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2802)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2668)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
        at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
        at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 18 more

At this point I think some jar is still missing; do you have any idea which one? Thank you

Master Mentor

@Alessio Ubaldi An Apache Commons jar is missing. Just determine which class is asking for it and which version you need: https://commons.apache.org/
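(A hedged way to do that on a cluster node, assuming unzip is available there; the class path comes from the stack trace above:)

for f in /usr/hdp/current/hadoop/lib/*.jar; do
  unzip -l "$f" 2>/dev/null | grep -q 'org/apache/commons/configuration/Configuration.class' && echo "$f"
done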

Super Collaborator

That should be commons-configuration, commons-io and htrace-core from /usr/hdp/current/hadoop/lib
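A sketch of pulling those jars from the sandbox, assuming an scp client on the Windows box (or pscp/WinSCP) and root SSH access to the sandbox; add -P 2222 if your sandbox VM forwards SSH on that port:

cd C:\flume\lib
scp root@sandbox.hortonworks.com:/usr/hdp/current/hadoop/lib/commons-configuration-*.jar .
scp root@sandbox.hortonworks.com:/usr/hdp/current/hadoop/lib/commons-io-*.jar .
scp root@sandbox.hortonworks.com:/usr/hdp/current/hadoop/lib/htrace-core-*.jar .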

Expert Contributor

Hi,

Thanks for the suggestion.

I copied all the missing jar files (I think), but when the sink starts I get the following warning, and it writes some empty .tmp files to HDFS:

error-sink.txt


Expert Contributor

Thank you, I will try this solution today. For now I have to start the agent with java, because on Windows I couldn't get it to start with the bin\flume-ng command.

Super Collaborator

These steps are actually for Windows, and I tested them locally - it works.
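(For reference: the Flume 1.5+ binary distribution ships Windows wrappers, bin\flume-ng.cmd and bin\flume-ng.ps1, so if they are present in your bin directory the accepted command can also be written out explicitly from cmd.exe; the install path below is an assumption:)

cd C:\apache-flume-1.6.0-bin
bin\flume-ng.cmd agent -name MyAgent -f conf/MyAgent.properties -property "flume.root.logger=INFO,LOGFILE,console;flume.log.file=MyLog.log;hadoop.home.dir=C:/winutils"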