Created 03-14-2016 11:19 AM
Hi to all,
I need to install the Flume agent (1.5) in a Windows environment, to collect logs and send them to an HDP cluster on Azure.
Can I just configure the agent, or do I need a complete installation of Flume?
Is there a complete guide with all the installation/configuration steps?
I searched the web but could not find a complete guide.
Thank you
Created 03-15-2016 08:38 PM
I can propose much simpler steps:
1. Download the Flume binaries - http://flume.apache.org/download.html - and extract them somewhere (this directory will be your FLUME_HOME)
2. Download winutils and put it somewhere (e.g. C:/winutils/bin; in this case C:/winutils will be your HADOOP_HOME)
3. Copy all the missing HDFS libraries into FLUME_HOME/lib (you can find them on your Hadoop cluster; it is always preferable to use exactly the same versions as in /usr/hdp/current/hadoop or /usr/hdp/current/hadoop-hdfs)
4. Run the Flume agent with the following command:
bin\flume-ng agent -name MyAgent -f conf/MyAgent.properties -property "flume.root.logger=INFO,LOGFILE,console;flume.log.file=MyLog.log;hadoop.home.dir=C:/winutils"
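For completeness, conf/MyAgent.properties is just a normal Flume agent configuration file; a minimal sketch with a syslog TCP source and an HDFS sink could look like this (the component names, port and HDFS path are only example placeholders):
MyAgent.sources = src1
MyAgent.channels = ch1
MyAgent.sinks = sink1
MyAgent.sources.src1.type = syslogtcp
MyAgent.sources.src1.port = 5140
MyAgent.sources.src1.channels = ch1
MyAgent.channels.ch1.type = memory
MyAgent.sinks.sink1.type = hdfs
MyAgent.sinks.sink1.channel = ch1
MyAgent.sinks.sink1.hdfs.path = hdfs://<namenode-host>:8020/apps/flume/%y-%m-%d
MyAgent.sinks.sink1.hdfs.fileType = DataStream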
Created 03-14-2016 03:36 PM
I've never tried that scenario, but it should be possible.
All you need is to install Flume on the Windows machine (just extract the zip file) and add the jars needed to connect to Azure (if any).
You can use the hdfs.kerberosPrincipal and hdfs.kerberosKeytab properties if you have a secured HDFS.
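For example, on an HDFS sink those properties would look something like this (the agent/sink names, principal and keytab path below are just placeholders):
agent.sinks.hdfs-sink.hdfs.kerberosPrincipal = flume/fully.qualified.domain.name@EXAMPLE.COM
agent.sinks.hdfs-sink.hdfs.kerberosKeytab = C:/flume/conf/flume.keytab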
Regards
Created 03-15-2016 05:32 PM
Hello everyone,
I am writing to report the steps I followed.
For testing I'm using Windows 10 as the client machine, which connects to the sandbox (2.4) running in VirtualBox.
I added this entry to the hosts file: 127.0.0.1 sandbox.hortonworks.com
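For reference, on Windows this means editing C:\Windows\System32\drivers\etc\hosts (as Administrator) and adding the line:
127.0.0.1    sandbox.hortonworks.com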
I took inspiration from this guide, because with the Apache binaries I had more errors and other difficulties: http://mapredit.blogspot.it/2012/07/run-flume-13x-on-windows.html
- Installed the JDK and Maven
- Set the environment variables
- Compiled Flume with Maven
- Unpacked the tar into c:\flume
- Created the Flume configuration file:
syslog-agent.sources = Syslog
syslog-agent.channels = MemoryChannel-1
syslog-agent.sinks = Console
syslog-agent.sources.Syslog.type = syslogTcp
syslog-agent.sources.Syslog.port = 5140
syslog-agent.sources.Syslog.channels = MemoryChannel-1
syslog-agent.sinks.Console.channel = MemoryChannel-1
syslog-agent.sinks.Console.type = logger
syslog-agent.channels.MemoryChannel-1.type = memory
- Launched the agent:
java -Xmx20m -Dlog4j.configuration=file:///%CD%\conf\log4j.properties -cp "c:\flume\lib\*" org.apache.flume.node.Application -f c:\flume\conf\syslog-agent.conf -n syslog-agent
- Tried to send a syslog message, and got this response:
2016-03-15 17:50:42,215 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{host=host, Severity=7, Facility=1, priority=15, timestamp=1458042030000} body: 66 6F 6F 5B 33 34 35 5D 3A 20 61 20 73 79 73 6C foo[345]: a sysl }
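For anyone wanting to reproduce the test, a syslog message can be sent to the agent from the same Windows machine with a small PowerShell snippet like the following (the priority value and message text are only an example, not exactly what I sent):
# send one syslog-style line over TCP to the Flume source listening on port 5140
$client = New-Object System.Net.Sockets.TcpClient -ArgumentList "localhost", 5140
$stream = $client.GetStream()
# <15> = facility 1 (user), severity 7 (debug), matching the headers in the event above
$bytes = [System.Text.Encoding]::ASCII.GetBytes("<15>Mar 15 17:50:30 host foo[345]: a syslog test message`n")
$stream.Write($bytes, 0, $bytes.Length)
$client.Close()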
It works (yeah!), so then I tried to write the logs to HDFS (my final goal).
I changed the conf file to:
syslog-agent.sources = Syslog
syslog-agent.channels = MemoryChannel-1
syslog-agent.sinks = HDFS-LAB
syslog-agent.sources.Syslog.type = syslogTcp
syslog-agent.sources.Syslog.port = 5140
syslog-agent.sources.Syslog.channels = MemoryChannel-1
syslog-agent.sinks.HDFS-LAB.channel = MemoryChannel-1
syslog-agent.sinks.HDFS-LAB.type = hdfs
syslog-agent.sinks.HDFS-LAB.hdfs.path = hdfs://sandbox.hortonworks.com:8020/apps/flume/%y-%m-%d/%H%M/%S
syslog-agent.sinks.HDFS-LAB.hdfs.file.Prefix = syslogfiles-
syslog-agent.sinks.HDFS-LAB.hdfs.round = true
syslog-agent.sinks.HDFS-LAB.hdfs.roundValue = 10
syslog-agent.sinks.HDFS-LAB.hdfs.roundUnit = second
syslog-agent.channels.MemoryChannel-1.type = memory
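(For reference, the HDFS sink also has roll and file-format settings that I did not set here, so the defaults apply; with example values they would look like this:)
syslog-agent.sinks.HDFS-LAB.hdfs.fileType = DataStream
syslog-agent.sinks.HDFS-LAB.hdfs.rollInterval = 30
syslog-agent.sinks.HDFS-LAB.hdfs.rollSize = 0
syslog-agent.sinks.HDFS-LAB.hdfs.rollCount = 0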
Started the agent, and got this ERROR:
2016-03-15 18:05:50,322 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:145)] Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: org/apache/hadoop/io/SequenceFile$CompressionType
at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:239)
at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:413)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:98)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.SequenceFile$CompressionType
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 12 more
The list of jars in c:\flume\lib:
15/03/2016 12:04 346.729 apache-log4j-extras-1.1.jar 15/03/2016 12:04 18.031 async-1.4.0.jar 15/03/2016 12:04 1.138.911 asynchbase-1.5.0.jar 15/03/2016 11:06 303.139 avro-1.7.4.jar 15/03/2016 11:06 187.840 avro-ipc-1.7.4.jar 15/03/2016 12:02 41.123 commons-cli-1.2.jar 15/03/2016 11:06 263.865 commons-codec-1.8.jar 15/03/2016 11:06 588.337 commons-collections-3.2.2.jar 15/03/2016 11:06 241.367 commons-compress-1.4.1.jar 15/03/2016 12:03 160.519 commons-dbcp-1.4.jar 15/03/2016 12:02 163.151 commons-io-2.1.jar 15/03/2016 12:10 267.634 commons-jexl-2.1.1.jar 15/03/2016 11:06 279.193 commons-lang-2.5.jar 15/03/2016 11:06 60.686 commons-logging-1.1.1.jar 15/03/2016 12:03 96.221 commons-pool-1.5.4.jar 15/03/2016 12:04 68.866 curator-client-2.6.0.jar 15/03/2016 12:04 185.245 curator-framework-2.6.0.jar 15/03/2016 12:04 248.171 curator-recipes-2.6.0.jar 15/03/2016 12:03 3.103.132 derby-10.11.1.1.jar 15/03/2016 12:12 20.623 flume-avro-source-1.7.0-SNAPSHOT.jar 15/03/2016 12:11 38.276 flume-dataset-sink-1.7.0-SNAPSHOT.jar 15/03/2016 12:03 295.717 flume-file-channel-1.7.0-SNAPSHOT.jar 15/03/2016 12:03 67.897 flume-hdfs-sink-1.7.0-SNAPSHOT.jar 15/03/2016 12:12 43.114 flume-hive-sink-1.7.0-SNAPSHOT.jar 15/03/2016 12:03 13.672 flume-irc-sink-1.7.0-SNAPSHOT.jar 15/03/2016 12:03 53.603 flume-jdbc-channel-1.7.0-SNAPSHOT.jar 15/03/2016 12:12 27.199 flume-jms-source-1.7.0-SNAPSHOT.jar 15/03/2016 12:12 21.662 flume-kafka-channel-1.7.0-SNAPSHOT.jar 15/03/2016 12:12 15.948 flume-kafka-source-1.7.0-SNAPSHOT.jar 15/03/2016 12:02 26.369 flume-ng-auth-1.7.0-SNAPSHOT.jar 15/03/2016 12:01 56.785 flume-ng-configuration-1.7.0-SNAPSHOT.jar 15/03/2016 12:02 381.642 flume-ng-core-1.7.0-SNAPSHOT.jar 15/03/2016 12:05 37.966 flume-ng-elasticsearch-sink-1.7.0-SNAPSHOT.jar 15/03/2016 12:04 20.620 flume-ng-embedded-agent-1.7.0-SNAPSHOT.jar 15/03/2016 12:04 53.204 flume-ng-hbase-sink-1.7.0-SNAPSHOT.jar 15/03/2016 12:09 15.267 flume-ng-kafka-sink-1.7.0-SNAPSHOT.jar 15/03/2016 12:12 16.668 flume-ng-log4jappender-1.7.0-SNAPSHOT.jar 15/03/2016 12:08 35.263 flume-ng-morphline-solr-sink-1.7.0-SNAPSHOT.jar 15/03/2016 12:04 37.514 flume-ng-node-1.7.0-SNAPSHOT.jar 15/03/2016 12:01 120.730 flume-ng-sdk-1.7.0-SNAPSHOT.jar 15/03/2016 12:12 44.551 flume-scribe-source-1.7.0-SNAPSHOT.jar 15/03/2016 12:03 22.533 flume-spillable-memory-channel-1.7.0-SNAPSHOT.jar 15/03/2016 12:12 32.807 flume-taildir-source-1.7.0-SNAPSHOT.jar 15/03/2016 12:12 56.552 flume-thrift-source-1.7.0-SNAPSHOT.jar 15/03/2016 12:13 18.917 flume-tools-1.7.0-SNAPSHOT.jar 15/03/2016 12:12 14.944 flume-twitter-source-1.7.0-SNAPSHOT.jar 15/03/2016 12:02 189.285 gson-2.2.2.jar 15/03/2016 12:01 1.648.200 guava-11.0.2.jar 15/03/2016 11:06 427.021 httpclient-4.2.1.jar 15/03/2016 11:06 181.201 httpcore-4.1.3.jar 15/03/2016 12:03 132.202 irclib-1.10.jar 15/03/2016 12:07 35.058 jackson-annotations-2.3.0.jar 15/03/2016 12:07 197.986 jackson-core-2.3.1.jar 15/03/2016 11:06 228.268 jackson-core-asl-1.9.3.jar 15/03/2016 12:07 914.311 jackson-databind-2.3.1.jar 15/03/2016 11:06 773.019 jackson-mapper-asl-1.9.3.jar 15/03/2016 11:06 539.912 jetty-6.1.26.jar 15/03/2016 11:06 177.131 jetty-util-6.1.26.jar 15/03/2016 12:02 570.478 joda-time-2.1.jar 15/03/2016 12:08 53.244 jopt-simple-3.2.jar 15/03/2016 12:01 33.015 jsr305-1.3.9.jar 15/03/2016 12:09 3.514.920 kafka_2.10-0.8.1.1.jar 15/03/2016 12:10 2.141.463 kite-data-core-1.0.0.jar 15/03/2016 12:10 2.020.522 kite-data-hbase-1.0.0.jar 15/03/2016 12:10 1.799.126 kite-data-hive-1.0.0.jar 15/03/2016 12:07 1.764.982 kite-hadoop-compatibility-1.0.0.jar 
15/03/2016 11:06 347.531 libthrift-0.9.0.jar 15/03/2016 11:06 489.884 log4j-1.2.17.jar 15/03/2016 12:03 390.675 mapdb-0.9.9.jar 15/03/2016 12:04 82.123 metrics-core-2.2.0.jar 15/03/2016 12:02 644.934 mina-core-2.0.4.jar 15/03/2016 11:06 1.132.988 netty-3.5.12.Final.jar 15/03/2016 12:10 19.827 opencsv-2.3.jar 15/03/2016 11:06 29.555 paranamer-2.3.jar 15/03/2016 12:07 41.943 parquet-avro-1.4.1.jar 15/03/2016 12:07 724.377 parquet-column-1.4.1.jar 15/03/2016 12:07 11.368 parquet-common-1.4.1.jar 15/03/2016 12:07 272.946 parquet-encoding-1.4.1.jar 15/03/2016 12:07 471.470 parquet-format-2.0.0.jar 15/03/2016 12:07 10.385 parquet-generator-1.4.1.jar 15/03/2016 12:07 152.325 parquet-hadoop-1.4.1.jar 15/03/2016 12:10 2.764.448 parquet-hive-bundle-1.4.1.jar 15/03/2016 12:07 1.029.033 parquet-jackson-1.4.1.jar 15/03/2016 12:02 533.455 protobuf-java-2.5.0.jar 15/03/2016 12:09 7.137.903 scala-library-2.10.1.jar 15/03/2016 12:12 276.420 serializer-2.7.2.jar 15/03/2016 12:02 133.240 servlet-api-2.5-20110124.jar 15/03/2016 11:06 25.496 slf4j-api-1.6.1.jar 15/03/2016 11:06 9.753 slf4j-log4j12-1.6.1.jar 15/03/2016 11:06 410.710 snappy-java-1.1.0.jar 15/03/2016 12:12 284.077 twitter4j-core-3.0.3.jar 15/03/2016 12:12 27.698 twitter4j-media-support-3.0.3.jar 15/03/2016 12:12 56.307 twitter4j-stream-3.0.3.jar 15/03/2016 11:06 449.505 velocity-1.7.jar 15/03/2016 12:12 3.154.938 xalan-2.7.2.jar 15/03/2016 12:12 1.229.125 xercesImpl-2.9.1.jar 15/03/2016 12:12 194.354 xml-apis-1.3.04.jar 15/03/2016 11:06 94.672 xz-1.0.jar 15/03/2016 12:08 64.009 zkclient-0.3.jar
I took the jar from the sandbox (hadoop-common-2.7.1.2.4.0.0-169.jar) and the agent now starts nicely with no problem, but when I send a syslog message I get this response:
2016-03-15 18:00:19,906 (hdfs-HDFS-LAB-call-runner-0) [ERROR - org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:385)] Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable C:\flume\bin\winutils.exe in the Hadoop binaries.
Solved with this guide:
I downloaded winutils.exe and placed it under the folder c:\flume\bin
- Launched the agent:
java -Xmx20m -Dlog4j.configuration=file:///%CD%\conf\log4j.properties -Dhadoop.home.dir=C:\flume -cp "c:\flume\lib\*" org.apache.flume.node.Application -f c:\flume\conf\syslog-agent-hdfs.conf -n syslog-agent
No problems at startup.
c:\flume>java -Xmx20m -Dlog4j.configuration=file:///%CD%\conf\log4j.properties -Dhadoop.home.dir=C:\flume -cp "c:\flume\lib\*" org.apache.flume.node.Application -f c:\flume\conf\syslog-agent-hdfs.conf -n syslog-agent 2016-03-15 18:11:19,563 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:61)] Configuration provider starting 2016-03-15 18:11:19,573 (lifecycleSupervisor-1-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:78)] Configuration provider started 2016-03-15 18:11:19,576 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:126)] Checking file:c:\flume\conf\syslog-agent-hdfs.conf for changes 2016-03-15 18:11:19,595 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file:c:\flume\conf\syslog-agent-hdfs.conf 2016-03-15 18:11:19,610 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB 2016-03-15 18:11:19,618 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1021)] Created context for HDFS-LAB: hdfs.round 2016-03-15 18:11:19,625 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB 2016-03-15 18:11:19,631 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB 2016-03-15 18:11:19,636 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB 2016-03-15 18:11:19,644 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB 2016-03-15 18:11:19,652 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:931)] Added sinks: HDFS-LAB Agent: syslog-agent 2016-03-15 18:11:19,659 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB 2016-03-15 18:11:19,664 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1017)] Processing:HDFS-LAB 2016-03-15 18:11:19,673 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:314)] Starting validation of configuration for agent: syslog-agent, initial-configuration: AgentConfiguration[syslog-agent] SOURCES: {Syslog={ parameters:{channels=MemoryChannel-1, port=5140, type=syslogTcp} }} CHANNELS: {MemoryChannel-1={ parameters:{type=memory} }} SINKS: {HDFS-LAB={ parameters:{hdfs.path=hdfs://sandbox.hortonworks.com:8020/apps/flume/%y-%m-%d/%H%M/%S, hdfs.file.Prefix=syslogfiles-, hdfs.round=true, channel=MemoryChannel-1, type=hdfs, hdfs.roundValue=10, hdfs.roundUnit=second} }} 2016-03-15 18:11:19,694 (conf-file-poller-0) [DEBUG - 
org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateChannels(FlumeConfiguration.java:469)] Created channel MemoryChannel-1 2016-03-15 18:11:19,704 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSinks(FlumeConfiguration.java:675)] Creating sink: HDFS-LAB using HDFS 2016-03-15 18:11:19,716 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:372)] Post validation configuration for syslog-agent AgentConfiguration created without Configuration stubs for which only basic syntactical validation was performed[syslog-agent] SOURCES: {Syslog={ parameters:{channels=MemoryChannel-1, port=5140, type=syslogTcp} }} CHANNELS: {MemoryChannel-1={ parameters:{type=memory} }} SINKS: {HDFS-LAB={ parameters:{hdfs.path=hdfs://sandbox.hortonworks.com:8020/apps/flume/%y-%m-%d/%H%M/%S, hdfs.file.Prefix=syslogfiles-, hdfs.round=true, channel=MemoryChannel-1, type=hdfs, hdfs.roundValue=10, hdfs.roundUnit=second} }} 2016-03-15 18:11:19,735 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:136)] Channels:MemoryChannel-1 2016-03-15 18:11:19,743 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:137)] Sinks HDFS-LAB 2016-03-15 18:11:19,747 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:138)] Sources Syslog 2016-03-15 18:11:19,752 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:141)] Post-validation flume configuration contains configuration for agents: [syslog-agent] 2016-03-15 18:11:19,759 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:145)] Creating channels 2016-03-15 18:11:19,768 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:42)] Creating instance of channel MemoryChannel-1 type memory 2016-03-15 18:11:19,779 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:200)] Created channel MemoryChannel-1 2016-03-15 18:11:19,786 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:41)] Creating instance of source Syslog, type syslogTcp 2016-03-15 18:11:19,802 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:42)] Creating instance of sink: HDFS-LAB, type: hdfs 2016-03-15 18:11:19,815 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:114)] Channel MemoryChannel-1 connected to [Syslog, HDFS-LAB] 2016-03-15 18:11:19,828 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{Syslog=EventDrivenSourceRunner: { source:org.apache.flume.source.SyslogTcpSource{name:Syslog,state:IDLE} }} sinkRunners:{HDFS-LAB=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3f476562 counterGroup:{ name:null counters:{} } }} channels:{MemoryChannel-1=org.apache.flume.channel.MemoryChannel{name: MemoryChannel-1}} } 2016-03-15 18:11:19,855 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:145)] 
Starting Channel MemoryChannel-1 2016-03-15 18:11:19,908 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:120)] Monitored counter group for type: CHANNEL, name: MemoryChannel-1: Successfully registered new MBean. 2016-03-15 18:11:19,918 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: CHANNEL, name: MemoryChannel-1 started 2016-03-15 18:11:19,927 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:173)] Starting Sink HDFS-LAB 2016-03-15 18:11:19,935 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:184)] Starting Source Syslog 2016-03-15 18:11:19,938 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:120)] Monitored counter group for type: SINK, name: HDFS-LAB: Successfully registered new MBean. 2016-03-15 18:11:19,958 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:96)] Component type: SINK, name: HDFS-LAB started 2016-03-15 18:11:19,971 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:143)] Polling sink runner starting 2016-03-15 18:11:20,054 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.SyslogTcpSource.start(SyslogTcpSource.java:123)] Syslog TCP Source starting... 2016-03-15 18:11:49,947 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:126)] Checking file:c:\flume\conf\syslog-agent-hdfs.conf for changes
When I try to send a syslog message, I get this response:
2016-03-15 18:18:33,672 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:234)] Creating hdfs://sandbox.hortonworks.com:8020/apps/flume/16-03-15/1240/30/FlumeData.1458062313441.tmp 2016-03-15 18:18:33,715 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36) at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:120) at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:236) at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2812) at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2802) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2668) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243) at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235) at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679) at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50) at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) ... 
18 more Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38) at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36) at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:120) at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:236) at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2812) at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2802) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2668) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243) at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235) at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679) at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50) at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) ... 18 more
At this point I think some jar is missing; do you have any idea which one it is? Thank you
Created 03-15-2016 05:35 PM
@Alessio Ubaldi An Apache Commons jar is missing. Just determine which class is asking for it and which version: https://commons.apache.org/
Created 03-15-2016 08:30 PM
That should be commons-configuration, commons-io and htrace-core from /usr/hdp/current/hadoop/lib
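One way to pull them from the sandbox to the Windows machine, assuming an scp client is available there (e.g. from Git for Windows or PuTTY), would be something like:
scp root@sandbox.hortonworks.com:/usr/hdp/current/hadoop/lib/commons-configuration-*.jar C:\flume\lib\
scp root@sandbox.hortonworks.com:/usr/hdp/current/hadoop/lib/commons-io-*.jar C:\flume\lib\
scp root@sandbox.hortonworks.com:/usr/hdp/current/hadoop/lib/htrace-core-*.jar C:\flume\lib\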
Created 03-16-2016 10:32 AM
Hi,
thanks for the suggestion.
I copied all the missing jar files (I think), but when the sink starts I get this warning, and it writes some empty tmp files on HDFS.
Created 03-16-2016 10:40 AM
Thank you, I will try this solution today. I just have to start the agent with java, because on Windows I can't start it with the bin\flume-ng command.
Created 03-16-2016 11:45 AM
These are actually the steps for Windows, and I tested them locally - it works.