
Flume agent on windows

Rising Star

Hi all,

I need to install the Flume agent (1.5) on a Windows machine to collect logs and ship them to an HDP cluster on Azure.

Do I only need to configure the agent, or do I need a complete Flume installation?

Is there a complete guide covering all the installation and configuration steps? I searched the web but could not find one.

Thank you

1 ACCEPTED SOLUTION

Expert Contributor

I can propose much easier steps:

1. Download the Flume binaries from http://flume.apache.org/download.html and extract them somewhere (that directory becomes FLUME_HOME).

2. Download winutils and put it somewhere (e.g. C:/winutils/bin; in this case C:/winutils becomes HADOOP_HOME).

3. Copy all the missing HDFS libs into FLUME_HOME/lib (you can take them from your Hadoop cluster; it is always preferable to use exactly the same versions as in /usr/hdp/current/hadoop or /usr/hdp/current/hadoop-hdfs).

4. Run the Flume agent with the following command:

bin\flume-ng agent -name MyAgent -f conf/MyAgent.properties -property "flume.root.logger=INFO,LOGFILE,console;flume.log.file=MyLog.log;hadoop.home.dir=C:/winutils"
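For reference, conf/MyAgent.properties could look like this minimal syslog-to-HDFS sketch (the agent name matches -name MyAgent above; the port, channel name, and HDFS path are placeholders, not from the thread):

```
MyAgent.sources = Syslog
MyAgent.channels = MemCh
MyAgent.sinks = HDFS

# syslogtcp source listening for newline-terminated events
MyAgent.sources.Syslog.type = syslogtcp
MyAgent.sources.Syslog.port = 5140
MyAgent.sources.Syslog.channels = MemCh

MyAgent.channels.MemCh.type = memory

# HDFS sink; DataStream writes plain text instead of SequenceFiles
MyAgent.sinks.HDFS.channel = MemCh
MyAgent.sinks.HDFS.type = hdfs
MyAgent.sinks.HDFS.hdfs.path = hdfs://namenode:8020/apps/flume/%y-%m-%d
MyAgent.sinks.HDFS.hdfs.fileType = DataStream
```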


14 REPLIES

Expert Contributor

I've never tried that scenario, but it should be possible.

All you need is to install Flume on the Windows machine (just extract the zip file) and add the jars needed to connect to Azure (if any).

You can use the hdfs.kerberosPrincipal and hdfs.kerberosKeytab properties if your HDFS is secured.
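For a kerberized HDFS those sink properties would be set like this (a sketch; the sink name, principal, and keytab path are placeholders):

```
agent.sinks.HDFS-LAB.hdfs.kerberosPrincipal = flume/_HOST@EXAMPLE.COM
agent.sinks.HDFS-LAB.hdfs.kerberosKeytab = C:/flume/conf/flume.keytab
```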

Regards

Rising Star

Hello everyone,

Here are the steps I followed.

For the test I'm using Windows 10 as the client machine, talking to the sandbox (2.4) running in VirtualBox.

I added this entry to the hosts file: 127.0.0.1 sandbox.hortonworks.com

I followed this guide, since the Apache binaries gave me more errors and other difficulties: http://mapredit.blogspot.it/2012/07/run-flume-13x-on-windows.html

- Installed the JDK and Maven

- Set the environment variables

- Compiled with Maven

- Unpacked the tar into c:\flume

- Created the Flume configuration file:

syslog-agent.sources = Syslog 
syslog-agent.channels = MemoryChannel-1 
syslog-agent.sinks = Console
syslog-agent.sources.Syslog.type = syslogTcp 
syslog-agent.sources.Syslog.port = 5140
syslog-agent.sources.Syslog.channels = MemoryChannel-1 
syslog-agent.sinks.Console.channel = MemoryChannel-1 
syslog-agent.sinks.Console.type = logger
syslog-agent.channels.MemoryChannel-1.type = memory

- Launched the agent:

java -Xmx20m -Dlog4j.configuration=file:///%CD%\conf\log4j.properties -cp "c:\flume\lib\*" org.apache.flume.node.Application -f c:\flume\conf\syslog-agent.conf -n syslog-agent

- Sent a test syslog message, and got this response:

2016-03-15 17:50:42,215 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{host=host, Severity=7, Facility=1, priority=15, timestamp=1458042030000} body: 66 6F 6F 5B 33 34 35 5D 3A 20 61 20 73 79 73 6C foo[345]: a sysl }
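For a quick test without a syslog daemon, an event like the one above can be sent with a short script (a sketch; priority 15 matches Facility=1/Severity=7 from the log, and localhost:5140 matches the source config):

```python
import socket

def syslog_frame(message, facility=1, severity=7):
    """Build an RFC 3164-style syslog line: priority = facility * 8 + severity."""
    pri = facility * 8 + severity
    return "<%d>%s\n" % (pri, message)

def send_syslog(host, port, message):
    # Flume's syslogTcp source reads newline-terminated events over TCP.
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(syslog_frame(message).encode("utf-8"))

# Example (assumes an agent listening locally on port 5140):
# send_syslog("localhost", 5140, "foo[345]: a syslog test message")
```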

It works! Then I tried writing the logs to HDFS (my final goal).

I changed the configuration file to:

syslog-agent.sources = Syslog
syslog-agent.channels = MemoryChannel-1
syslog-agent.sinks = HDFS-LAB
syslog-agent.sources.Syslog.type = syslogTcp
syslog-agent.sources.Syslog.port = 5140
syslog-agent.sources.Syslog.channels = MemoryChannel-1
syslog-agent.sinks.HDFS-LAB.channel = MemoryChannel-1
syslog-agent.sinks.HDFS-LAB.type = hdfs
syslog-agent.sinks.HDFS-LAB.hdfs.path = hdfs://sandbox.hortonworks.com:8020/apps/flume/%y-%m-%d/%H%M/%S
syslog-agent.sinks.HDFS-LAB.hdfs.file.Prefix = syslogfiles-
syslog-agent.sinks.HDFS-LAB.hdfs.round = true
syslog-agent.sinks.HDFS-LAB.hdfs.roundValue = 10
syslog-agent.sinks.HDFS-LAB.hdfs.roundUnit = second
syslog-agent.channels.MemoryChannel-1.type = memory

Started the agent, and got this ERROR:

2016-03-15 18:05:50,322 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:145)] Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: org/apache/hadoop/io/SequenceFile$CompressionType
        at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:239)
        at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
        at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:413)
        at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:98)
        at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.SequenceFile$CompressionType
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 12 more

The list of jars in c:\flume\lib:

15/03/2016  12:04           346.729 apache-log4j-extras-1.1.jar
15/03/2016  12:04            18.031 async-1.4.0.jar
15/03/2016  12:04         1.138.911 asynchbase-1.5.0.jar
15/03/2016  11:06           303.139 avro-1.7.4.jar
15/03/2016  11:06           187.840 avro-ipc-1.7.4.jar
15/03/2016  12:02            41.123 commons-cli-1.2.jar
15/03/2016  11:06           263.865 commons-codec-1.8.jar
15/03/2016  11:06           588.337 commons-collections-3.2.2.jar
15/03/2016  11:06           241.367 commons-compress-1.4.1.jar
15/03/2016  12:03           160.519 commons-dbcp-1.4.jar
15/03/2016  12:02           163.151 commons-io-2.1.jar
15/03/2016  12:10           267.634 commons-jexl-2.1.1.jar
15/03/2016  11:06           279.193 commons-lang-2.5.jar
15/03/2016  11:06            60.686 commons-logging-1.1.1.jar
15/03/2016  12:03            96.221 commons-pool-1.5.4.jar
15/03/2016  12:04            68.866 curator-client-2.6.0.jar
15/03/2016  12:04           185.245 curator-framework-2.6.0.jar
15/03/2016  12:04           248.171 curator-recipes-2.6.0.jar
15/03/2016  12:03         3.103.132 derby-10.11.1.1.jar
15/03/2016  12:12            20.623 flume-avro-source-1.7.0-SNAPSHOT.jar
15/03/2016  12:11            38.276 flume-dataset-sink-1.7.0-SNAPSHOT.jar
15/03/2016  12:03           295.717 flume-file-channel-1.7.0-SNAPSHOT.jar
15/03/2016  12:03            67.897 flume-hdfs-sink-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            43.114 flume-hive-sink-1.7.0-SNAPSHOT.jar
15/03/2016  12:03            13.672 flume-irc-sink-1.7.0-SNAPSHOT.jar
15/03/2016  12:03            53.603 flume-jdbc-channel-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            27.199 flume-jms-source-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            21.662 flume-kafka-channel-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            15.948 flume-kafka-source-1.7.0-SNAPSHOT.jar
15/03/2016  12:02            26.369 flume-ng-auth-1.7.0-SNAPSHOT.jar
15/03/2016  12:01            56.785 flume-ng-configuration-1.7.0-SNAPSHOT.jar
15/03/2016  12:02           381.642 flume-ng-core-1.7.0-SNAPSHOT.jar
15/03/2016  12:05            37.966 flume-ng-elasticsearch-sink-1.7.0-SNAPSHOT.jar
15/03/2016  12:04            20.620 flume-ng-embedded-agent-1.7.0-SNAPSHOT.jar
15/03/2016  12:04            53.204 flume-ng-hbase-sink-1.7.0-SNAPSHOT.jar
15/03/2016  12:09            15.267 flume-ng-kafka-sink-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            16.668 flume-ng-log4jappender-1.7.0-SNAPSHOT.jar
15/03/2016  12:08            35.263 flume-ng-morphline-solr-sink-1.7.0-SNAPSHOT.jar
15/03/2016  12:04            37.514 flume-ng-node-1.7.0-SNAPSHOT.jar
15/03/2016  12:01           120.730 flume-ng-sdk-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            44.551 flume-scribe-source-1.7.0-SNAPSHOT.jar
15/03/2016  12:03            22.533 flume-spillable-memory-channel-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            32.807 flume-taildir-source-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            56.552 flume-thrift-source-1.7.0-SNAPSHOT.jar
15/03/2016  12:13            18.917 flume-tools-1.7.0-SNAPSHOT.jar
15/03/2016  12:12            14.944 flume-twitter-source-1.7.0-SNAPSHOT.jar
15/03/2016  12:02           189.285 gson-2.2.2.jar
15/03/2016  12:01         1.648.200 guava-11.0.2.jar
15/03/2016  11:06           427.021 httpclient-4.2.1.jar
15/03/2016  11:06           181.201 httpcore-4.1.3.jar
15/03/2016  12:03           132.202 irclib-1.10.jar
15/03/2016  12:07            35.058 jackson-annotations-2.3.0.jar
15/03/2016  12:07           197.986 jackson-core-2.3.1.jar
15/03/2016  11:06           228.268 jackson-core-asl-1.9.3.jar
15/03/2016  12:07           914.311 jackson-databind-2.3.1.jar
15/03/2016  11:06           773.019 jackson-mapper-asl-1.9.3.jar
15/03/2016  11:06           539.912 jetty-6.1.26.jar
15/03/2016  11:06           177.131 jetty-util-6.1.26.jar
15/03/2016  12:02           570.478 joda-time-2.1.jar
15/03/2016  12:08            53.244 jopt-simple-3.2.jar
15/03/2016  12:01            33.015 jsr305-1.3.9.jar
15/03/2016  12:09         3.514.920 kafka_2.10-0.8.1.1.jar
15/03/2016  12:10         2.141.463 kite-data-core-1.0.0.jar
15/03/2016  12:10         2.020.522 kite-data-hbase-1.0.0.jar
15/03/2016  12:10         1.799.126 kite-data-hive-1.0.0.jar
15/03/2016  12:07         1.764.982 kite-hadoop-compatibility-1.0.0.jar
15/03/2016  11:06           347.531 libthrift-0.9.0.jar
15/03/2016  11:06           489.884 log4j-1.2.17.jar
15/03/2016  12:03           390.675 mapdb-0.9.9.jar
15/03/2016  12:04            82.123 metrics-core-2.2.0.jar
15/03/2016  12:02           644.934 mina-core-2.0.4.jar
15/03/2016  11:06         1.132.988 netty-3.5.12.Final.jar
15/03/2016  12:10            19.827 opencsv-2.3.jar
15/03/2016  11:06            29.555 paranamer-2.3.jar
15/03/2016  12:07            41.943 parquet-avro-1.4.1.jar
15/03/2016  12:07           724.377 parquet-column-1.4.1.jar
15/03/2016  12:07            11.368 parquet-common-1.4.1.jar
15/03/2016  12:07           272.946 parquet-encoding-1.4.1.jar
15/03/2016  12:07           471.470 parquet-format-2.0.0.jar
15/03/2016  12:07            10.385 parquet-generator-1.4.1.jar
15/03/2016  12:07           152.325 parquet-hadoop-1.4.1.jar
15/03/2016  12:10         2.764.448 parquet-hive-bundle-1.4.1.jar
15/03/2016  12:07         1.029.033 parquet-jackson-1.4.1.jar
15/03/2016  12:02           533.455 protobuf-java-2.5.0.jar
15/03/2016  12:09         7.137.903 scala-library-2.10.1.jar
15/03/2016  12:12           276.420 serializer-2.7.2.jar
15/03/2016  12:02           133.240 servlet-api-2.5-20110124.jar
15/03/2016  11:06            25.496 slf4j-api-1.6.1.jar
15/03/2016  11:06             9.753 slf4j-log4j12-1.6.1.jar
15/03/2016  11:06           410.710 snappy-java-1.1.0.jar
15/03/2016  12:12           284.077 twitter4j-core-3.0.3.jar
15/03/2016  12:12            27.698 twitter4j-media-support-3.0.3.jar
15/03/2016  12:12            56.307 twitter4j-stream-3.0.3.jar
15/03/2016  11:06           449.505 velocity-1.7.jar
15/03/2016  12:12         3.154.938 xalan-2.7.2.jar
15/03/2016  12:12         1.229.125 xercesImpl-2.9.1.jar
15/03/2016  12:12           194.354 xml-apis-1.3.04.jar
15/03/2016  11:06            94.672 xz-1.0.jar
15/03/2016  12:08            64.009 zkclient-0.3.jar

I copied the jar from the sandbox (hadoop-common-2.7.1.2.4.0.0-169.jar) and now the agent runs nicely with no problem. But when I send a syslog message, I get this response:

2016-03-15 18:00:19,906 (hdfs-HDFS-LAB-call-runner-0) [ERROR - org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:385)] Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable C:\flume\bin\winutils.exe in the Hadoop binaries.

Solved with this guide:

https://github.com/spring-projects/spring-hadoop/wiki/Using-a-Windows-client-together-with-a-Linux-c...

I downloaded winutils.exe and placed it under the folder c:\flume\bin.
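The error above comes from Hadoop's Shell class, which resolves winutils at %hadoop.home.dir%\bin\winutils.exe. With -Dhadoop.home.dir=C:\flume, the expected layout is (a sketch):

```
C:\flume\
    bin\winutils.exe              <- found via -Dhadoop.home.dir=C:\flume
    conf\syslog-agent-hdfs.conf
    lib\*.jar
```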

- Launched the agent:

java -Xmx20m -Dlog4j.configuration=file:///%CD%\conf\log4j.properties -Dhadoop.home.dir=C:\flume -cp "c:\flume\lib\*" org.apache.flume.node.Application -f c:\flume\conf\syslog-agent-hdfs.conf -n syslog-agent

The agent starts with no problems:

c:\flume>java -Xmx20m -Dlog4j.configuration=file:///%CD%\conf\log4j.properties -Dhadoop.home.dir=C:\flume -cp "c:\flume\lib\*" org.apache.flume.node.Application -f c:\flume\conf\syslog-agent-hdfs.conf -n syslog-agent
2016-03-15 18:11:19,563 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:61)] Configuration provider starting
2016-03-15 18:11:19,573 (lifecycleSupervisor-1-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:78)] Configuration provider started
2016-03-15 18:11:19,576 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:126)] Checking file:c:\flume\conf\syslog-agent-hdfs.conf for changes
2016-03-15 18:11:19,595 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file:c:\flume\conf\syslog-agent-hdfs.conf
2016-03-15 18:11:19,610 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB
2016-03-15 18:11:19,618 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1021)] Created context for HDFS-LAB: hdfs.round
2016-03-15 18:11:19,625 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB
2016-03-15 18:11:19,631 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB
2016-03-15 18:11:19,636 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB
2016-03-15 18:11:19,644 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB
2016-03-15 18:11:19,652 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:931)] Added sinks: HDFS-LAB Agent: syslog-agent
2016-03-15 18:11:19,659 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB
2016-03-15 18:11:19,664 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB
2016-03-15 18:11:19,673 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:314)] Starting validation of configuration for agent: syslog-agent, initial-configuration: AgentConfiguration[syslog-agent]
SOURCES: {Syslog={ parameters:{channels=MemoryChannel-1, port=5140, type=syslogTcp} }}
CHANNELS: {MemoryChannel-1={ parameters:{type=memory} }}
SINKS: {HDFS-LAB={ parameters:{hdfs.path=hdfs://sandbox.hortonworks.com:8020/apps/flume/%y-%m-%d/%H%M/%S, hdfs.file.Prefix=syslogfiles-, hdfs.round=true, channel=MemoryChannel-1, type=hdfs, hdfs.roundValue=10, hdfs.roundUnit=second} }}


2016-03-15 18:11:19,694 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateChannels(FlumeConfiguration.java:469)] Created channel MemoryChannel-1
2016-03-15 18:11:19,704 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSinks(FlumeConfiguration.java:675)] Creating sink: HDFS-LAB using HDFS
2016-03-15 18:11:19,716 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:372)] Post validation configuration for syslog-agent
AgentConfiguration created without Configuration stubs for which only basic syntactical validation was performed[syslog-agent]
SOURCES: {Syslog={ parameters:{channels=MemoryChannel-1, port=5140, type=syslogTcp} }}
CHANNELS: {MemoryChannel-1={ parameters:{type=memory} }}
SINKS: {HDFS-LAB={ parameters:{hdfs.path=hdfs://sandbox.hortonworks.com:8020/apps/flume/%y-%m-%d/%H%M/%S, hdfs.file.Prefix=syslogfiles-, hdfs.round=true, channel=MemoryChannel-1, type=hdfs, hdfs.roundValue=10, hdfs.roundUnit=second} }}


2016-03-15 18:11:19,735 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:136)] Channels:MemoryChannel-1
2016-03-15 18:11:19,743 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:137)] Sinks HDFS-LAB
2016-03-15 18:11:19,747 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:138)] Sources Syslog
2016-03-15 18:11:19,752 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:141)] Post-validation flume configuration contains configuration for agents: [syslog-agent]
2016-03-15 18:11:19,759 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:145)] Creating channels
2016-03-15 18:11:19,768 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:42)] Creating instance of channel MemoryChannel-1 type memory
2016-03-15 18:11:19,779 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:200)] Created channel MemoryChannel-1
2016-03-15 18:11:19,786 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:41)] Creating instance of source Syslog, type syslogTcp
2016-03-15 18:11:19,802 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:42)] Creating instance of sink: HDFS-LAB, type: hdfs
2016-03-15 18:11:19,815 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:114)] Channel MemoryChannel-1 connected to [Syslog, HDFS-LAB]
2016-03-15 18:11:19,828 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{Syslog=EventDrivenSourceRunner: { source:org.apache.flume.source.SyslogTcpSource{name:Syslog,state:IDLE} }} sinkRunners:{HDFS-LAB=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3f476562 counterGroup:{ name:null counters:{} } }} channels:{MemoryChannel-1=org.apache.flume.channel.MemoryChannel{name: MemoryChannel-1}} }
2016-03-15 18:11:19,855 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:145)] Starting Channel MemoryChannel-1
2016-03-15 18:11:19,908 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:120)] Monitored counter group for type: CHANNEL, name: MemoryChannel-1: Successfully registered new MBean.
2016-03-15 18:11:19,918 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: CHANNEL, name: MemoryChannel-1 started
2016-03-15 18:11:19,927 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:173)] Starting Sink HDFS-LAB
2016-03-15 18:11:19,935 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:184)] Starting Source Syslog
2016-03-15 18:11:19,938 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:120)] Monitored counter group for type: SINK, name: HDFS-LAB: Successfully registered new MBean.
2016-03-15 18:11:19,958 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: SINK, name: HDFS-LAB started
2016-03-15 18:11:19,971 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:143)] Polling sink runner starting
2016-03-15 18:11:20,054 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.SyslogTcpSource.start(SyslogTcpSource.java:123)] Syslog TCP Source starting...
2016-03-15 18:11:49,947 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:126)] Checking file:c:\flume\conf\syslog-agent-hdfs.conf for changes

When I try to send a syslog message, I get this response:

2016-03-15 18:18:33,672 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:234)] Creating hdfs://sandbox.hortonworks.com:8020/apps/flume/16-03-15/1240/30/FlumeData.1458062313441.tmp
2016-03-15 18:18:33,715 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
        at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:120)
        at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:236)
        at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2812)
        at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2802)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2668)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
        at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
        at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 18 more
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
        at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:120)
        at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:236)
        at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2812)
        at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2802)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2668)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
        at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
        at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 18 more

At this point I think some jar is missing. Do you have any idea which one? Thank you

Mentor

@Alessio Ubaldi An Apache Commons jar is missing. Just determine which class is asking for it and which version you need: https://commons.apache.org/
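To check which jar (if any) in Flume's lib dir actually provides a class, the jars can be scanned directly; here is a sketch in Python (the lib path in the example comment is an assumption):

```python
import glob
import os
import zipfile

def find_class(lib_dir, class_name):
    """Return the jars in lib_dir containing class_name (dots -> slashes, .class suffix)."""
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(glob.glob(os.path.join(lib_dir, "*.jar"))):
        try:
            with zipfile.ZipFile(jar) as z:
                if entry in z.namelist():
                    hits.append(os.path.basename(jar))
        except zipfile.BadZipFile:
            pass  # skip corrupt downloads
    return hits

# Example: find_class(r"C:\flume\lib", "org.apache.commons.configuration.Configuration")
# An empty result means the class is not on the sink's classpath.
```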

Expert Contributor

Those should be commons-configuration, commons-io, and htrace-core, from /usr/hdp/current/hadoop/lib.

Rising Star

Hi,

thanks for the suggestion.

I copied all the missing jar files (I think), but when the sink starts I get the warning in the attached file, and on HDFS it writes some empty .tmp files.

error-sink.txt


Rising Star

Thank you, I'll try this solution today. I had to start the agent with java directly; on Windows I couldn't start it with the bin\flume-ng command.

Expert Contributor

These are actually the steps for Windows, and I tested them locally: it works.

Rising Star

I tried it and the agent starts correctly.

Now I'm using the cluster hosted on Azure.

I downloaded the jar files and the HDFS client configuration from the cluster (and put them in Flume's /conf dir).

When the sink starts, I get this error:

(SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:455)] HDFS IO error
java.io.IOException: No FileSystem for scheme: hdfs
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
        at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
        at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I tried adding this section to core-site.xml (it resolved this error on the sandbox):

<property>
  <name>fs.file.impl</name>
  <value>org.apache.hadoop.fs.LocalFileSystem</value>
  <description>The FileSystem for file: uris.</description>
</property>

<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  <description>The FileSystem for hdfs: uris.</description>
</property>

But then I get this error:

2016-03-17 16:25:29,380 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2638)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
        at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
        at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
        ... 16 more

2016-03-17 16:25:29,429 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:463)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2638)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
        at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
        at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
        at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
        at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        ... 1 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
        ... 16 more

Thanks for the help

avatar
Expert Contributor

Use the jar files from your Azure cluster, not the sandbox. You need exactly the same versions of the libraries as used on the Azure cluster.

Also copy core-site.xml to the Flume classpath (FLUME_HOME/conf should be fine).
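A minimal sketch of that copy step, assuming the cluster jars and core-site.xml have already been downloaded from the Azure cluster into a local folder (both paths below are placeholders, not the exact HDP layout):

```python
import glob
import os
import shutil

# Assumptions: cluster jars were already fetched from the Azure cluster
# (e.g. from /usr/hdp/current/hadoop-hdfs) into CLUSTER_JARS, and FLUME_HOME
# points at the extracted Flume directory. Adjust both paths to your setup.
CLUSTER_JARS = r"C:\downloads\hadoop-jars"
FLUME_HOME = r"C:\apache-flume-1.6.0-bin"

def sync_jars(src_dir, flume_home):
    """Copy every *.jar from src_dir into FLUME_HOME/lib, and core-site.xml
    (if present next to the jars) into FLUME_HOME/conf."""
    lib_dir = os.path.join(flume_home, "lib")
    for jar in glob.glob(os.path.join(src_dir, "*.jar")):
        shutil.copy2(jar, lib_dir)
    core_site = os.path.join(src_dir, "core-site.xml")
    if os.path.exists(core_site):
        shutil.copy2(core_site, os.path.join(flume_home, "conf"))

sync_jars(CLUSTER_JARS, FLUME_HOME)
```

The same thing can of course be done by hand with Explorer or `copy`; the point is only that the jars land in FLUME_HOME/lib and core-site.xml in FLUME_HOME/conf.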

Regards

avatar
Rising Star

I did copy the jars from Azure :) Anyway, I am now wiping everything and redoing the configuration, hoping to find the problem.

avatar
Rising Star

No luck. I reinstalled everything, but I still get the same error (No FileSystem for scheme: hdfs).

List of jars in the Flume lib directory on Windows:

apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
asm-3.2.jar
avro-1.7.4.jar
aws-java-sdk-1.7.4.jar
azure-storage-2.2.0.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2.jar
commons-codec-1.4.jar
commons-collections-3.2.2.jar
commons-compress-1.4.1.jar
commons-configuration-1.6.jar
commons-digester-1.8.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-logging-1.1.3.jar
commons-math3-3.1.1.jar
commons-net-3.1.jar
curator-client-2.7.1.jar
curator-framework-2.7.1.jar
curator-recipes-2.7.1.jar
flume-avro-source-1.6.0.jar
flume-dataset-sink-1.6.0.jar
flume-file-channel-1.6.0.jar
flume-hdfs-sink-1.6.0.jar
flume-hive-sink-1.6.0.jar
flume-irc-sink-1.6.0.jar
flume-jdbc-channel-1.6.0.jar
flume-jms-source-1.6.0.jar
flume-kafka-channel-1.6.0.jar
flume-kafka-source-1.6.0.jar
flume-ng-auth-1.6.0.jar
flume-ng-configuration-1.6.0.jar
flume-ng-core-1.6.0.jar
flume-ng-elasticsearch-sink-1.6.0.jar
flume-ng-embedded-agent-1.6.0.jar
flume-ng-hbase-sink-1.6.0.jar
flume-ng-kafka-sink-1.6.0.jar
flume-ng-log4jappender-1.6.0.jar
flume-ng-morphline-solr-sink-1.6.0.jar
flume-ng-node-1.6.0.jar
flume-ng-sdk-1.6.0.jar
flume-scribe-source-1.6.0.jar
flume-spillable-memory-channel-1.6.0.jar
flume-thrift-source-1.6.0.jar
flume-tools-1.6.0.jar
flume-twitter-source-1.6.0.jar
gson-2.2.4.jar
guava-11.0.2.jar
hadoop-annotations-2.7.1.2.3.4.0-3485.jar
hadoop-auth-2.7.1.2.3.4.0-3485.jar
hadoop-aws-2.7.1.2.3.4.0-3485.jar
hadoop-azure-2.7.1.2.3.4.0-3485.jar
hadoop-common-2.7.1.2.3.4.0-3485-tests.jar
hadoop-common-2.7.1.2.3.4.0-3485.jar
hadoop-nfs-2.7.1.2.3.4.0-3485.jar
hamcrest-core-1.3.jar
htrace-core-3.1.0-incubating.jar
httpclient-4.2.5.jar
httpcore-4.2.5.jar
jackson-annotations-2.2.3.jar
jackson-core-2.2.3.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.2.3.jar
jackson-jaxrs-1.9.13.jar
jackson-mapper-asl-1.9.13.jar
jackson-xc-1.9.13.jar
java-xmlbuilder-0.4.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jersey-core-1.9.jar
jersey-json-1.9.jar
jersey-server-1.9.jar
jets3t-0.9.0.jar
jettison-1.1.jar
jetty-6.1.26.hwx.jar
jetty-util-6.1.26.hwx.jar
jsch-0.1.42.jar
jsp-api-2.1.jar
jsr305-3.0.0.jar
junit-4.11.jar
log4j-1.2.17.jar
microsoft-windowsazure-storage-sdk-0.6.0.jar
mockito-all-1.8.5.jar
netty-3.6.2.Final.jar
paranamer-2.3.jar
protobuf-java-2.5.0.jar
ranger-hdfs-plugin-shim-0.5.0.2.3.4.0-3485.jar
ranger-plugin-classloader-0.5.0.2.3.4.0-3485.jar
ranger-yarn-plugin-shim-0.5.0.2.3.4.0-3485.jar
servlet-api-2.5.jar
slf4j-api-1.7.10.jar
slf4j-log4j12-1.7.10.jar
snappy-java-1.0.4.1.jar
stax-api-1.0-2.jar
xmlenc-0.52.jar
xz-1.0.jar
zookeeper-3.4.6.2.3.4.0-3485.jar

List of jars on the Azure cluster:

activation-1.1.jar
apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
asm-3.2.jar
avro-1.7.4.jar
aws-java-sdk-1.7.4.jar
azure-storage-2.2.0.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2.jar
commons-codec-1.4.jar
commons-collections-3.2.2.jar
commons-compress-1.4.1.jar
commons-configuration-1.6.jar
commons-digester-1.8.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-logging-1.1.3.jar
commons-math3-3.1.1.jar
commons-net-3.1.jar
curator-client-2.7.1.jar
curator-framework-2.7.1.jar
curator-recipes-2.7.1.jar
gson-2.2.4.jar
guava-11.0.2.jar
hamcrest-core-1.3.jar
htrace-core-3.1.0-incubating.jar
httpclient-4.2.5.jar
httpcore-4.2.5.jar
jackson-annotations-2.2.3.jar
jackson-core-2.2.3.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.2.3.jar
jackson-jaxrs-1.9.13.jar
jackson-mapper-asl-1.9.13.jar
jackson-xc-1.9.13.jar
java-xmlbuilder-0.4.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jersey-core-1.9.jar
jersey-json-1.9.jar
jersey-server-1.9.jar
jets3t-0.9.0.jar
jettison-1.1.jar
jetty-6.1.26.hwx.jar
jetty-util-6.1.26.hwx.jar
jsch-0.1.42.jar
jsp-api-2.1.jar
jsr305-3.0.0.jar
junit-4.11.jar
log4j-1.2.17.jar
microsoft-windowsazure-storage-sdk-0.6.0.jar
mockito-all-1.8.5.jar
native
netty-3.6.2.Final.jar
ojdbc6.jar
paranamer-2.3.jar
protobuf-java-2.5.0.jar
ranger-hdfs-plugin-impl
ranger-hdfs-plugin-shim-0.5.0.2.3.4.0-3485.jar
ranger-plugin-classloader-0.5.0.2.3.4.0-3485.jar
ranger-yarn-plugin-impl
ranger-yarn-plugin-shim-0.5.0.2.3.4.0-3485.jar
servlet-api-2.5.jar
slf4j-api-1.7.10.jar
slf4j-log4j12-1.7.10.jar
snappy-java-1.0.4.1.jar
stax-api-1.0-2.jar
xmlenc-0.52.jar
xz-1.0.jar
zookeeper-3.4.6.2.3.4.0-3485.jar

As a last attempt I put all the jars from the Hadoop cluster into the flume/lib directory. I have no idea what the problem is. Thanks for your patience and help.
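One quick way to compare the two installations is a set difference over the jar names. A sketch, reduced to the hadoop-related entries from the two lists above (note that neither list contains a hadoop-hdfs jar, which is the jar that provides org.apache.hadoop.hdfs.DistributedFileSystem):

```python
# Hadoop-related jars from the Windows Flume lib listing above.
flume_windows = {
    "hadoop-annotations-2.7.1.2.3.4.0-3485.jar",
    "hadoop-auth-2.7.1.2.3.4.0-3485.jar",
    "hadoop-aws-2.7.1.2.3.4.0-3485.jar",
    "hadoop-azure-2.7.1.2.3.4.0-3485.jar",
    "hadoop-common-2.7.1.2.3.4.0-3485.jar",
    "hadoop-nfs-2.7.1.2.3.4.0-3485.jar",
}

# Jar name prefixes the HDFS sink needs; hadoop-hdfs supplies
# the DistributedFileSystem class the stack trace complains about.
needed_prefixes = {"hadoop-common", "hadoop-hdfs"}

missing = {p for p in needed_prefixes
           if not any(jar.startswith(p) for jar in flume_windows)}
print(missing)  # → {'hadoop-hdfs'}
```

This matches the error: every hadoop jar copied so far comes from the `hadoop` client directory, while hadoop-hdfs lives in a separate directory (`hadoop-hdfs`) on the cluster.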

avatar
Expert Contributor
hadoop-annotations-2.7.1.2.3.4.0-3485.jar
hadoop-auth-2.7.1.2.3.4.0-3485.jar
hadoop-aws-2.7.1.2.3.4.0-3485.jar
hadoop-azure-2.7.1.2.3.4.0-3485.jar
hadoop-common-2.7.1.2.3.4.0-3485-tests.jar
hadoop-common-2.7.1.2.3.4.0-3485.jar
hadoop-nfs-2.7.1.2.3.4.0-3485.jar

Double-check that those are the classes from Azure. Also, you need to add hadoop-hdfs.jar and core-site.xml.
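To verify that the class really is absent from the classpath, you can scan the jars in FLUME_HOME/lib for it. A small sketch (the lib path is a placeholder for your install):

```python
import glob
import os
import zipfile

FLUME_LIB = r"C:\apache-flume-1.6.0-bin\lib"  # placeholder: your FLUME_HOME/lib

def jars_containing(class_entry, lib_dir):
    """Return the names of jars under lib_dir that contain the given
    class file entry, e.g. 'org/apache/hadoop/hdfs/DistributedFileSystem.class'."""
    hits = []
    for jar in glob.glob(os.path.join(lib_dir, "*.jar")):
        with zipfile.ZipFile(jar) as zf:
            if class_entry in zf.namelist():
                hits.append(os.path.basename(jar))
    return hits

# An empty result means no jar on the classpath provides DistributedFileSystem,
# which is consistent with the ClassNotFoundException in the log above; after
# adding hadoop-hdfs.jar the scan should report it.
print(jars_containing("org/apache/hadoop/hdfs/DistributedFileSystem.class", FLUME_LIB))
```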
