Flume agent on Windows

Expert Contributor

Hi all,

I need to install the Flume agent (1.5) on Windows to collect logs and ship them to an HDP cluster on Azure.

Can I configure only the agent, or do I need a complete Flume installation?

Is there a complete guide that covers every installation and configuration step?

I searched the web but could not find one.

Thank you

1 ACCEPTED SOLUTION

Super Collaborator

I can suggest a much simpler set of steps:

1. Download the Flume binaries from http://flume.apache.org/download.html and extract them somewhere (this directory becomes FLUME_HOME).

2. Download winutils and put it somewhere (e.g. C:/winutils/bin; in that case C:/winutils becomes HADOOP_HOME).

3. Copy all the missing HDFS libraries to FLUME_HOME/lib. You can find them on your Hadoop cluster under /usr/hdp/current/hadoop or /usr/hdp/current/hadoop-hdfs; it is always preferable to use exactly the same versions as the cluster.

4. Run the Flume agent with the following command:

bin\flume-ng agent -name MyAgent -f conf/MyAgent.properties -property "flume.root.logger=INFO,LOGFILE,console;flume.log.file=MyLog.log;hadoop.home.dir=C:/winutils"
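
Before launching, it can help to confirm that steps 1 to 3 left the files where the agent expects them. The following is only a rough Python sketch, not part of Flume: the function name check_flume_setup is made up for illustration, and it assumes the winutils binary is named winutils.exe and that the cluster jars follow the hadoop-common-*.jar / hadoop-hdfs-*.jar naming.

```python
from pathlib import Path


def check_flume_setup(flume_home, hadoop_home):
    """Return a list of problems; an empty list means the layout looks right."""
    flume_home, hadoop_home = Path(flume_home), Path(hadoop_home)
    problems = []

    lib_dir = flume_home / "lib"
    if not lib_dir.is_dir():
        problems.append(f"missing directory: {lib_dir}")
    else:
        # Step 3: the HDFS sink needs the Hadoop client jars next to Flume's own jars.
        # The jar name prefixes are an assumption based on the usual Hadoop naming.
        for prefix in ("hadoop-common", "hadoop-hdfs"):
            if not any(lib_dir.glob(f"{prefix}-*.jar")):
                problems.append(f"no {prefix}-*.jar in {lib_dir}")

    # Step 2: winutils is expected in HADOOP_HOME/bin (file name assumed: winutils.exe).
    winutils = hadoop_home / "bin" / "winutils.exe"
    if not winutils.is_file():
        problems.append(f"missing file: {winutils}")

    return problems
```

For example, check_flume_setup("C:/flume", "C:/winutils") returns an empty list when nothing is missing; C:/flume is just a placeholder for your own FLUME_HOME.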


14 REPLIES

Expert Contributor

I tried it and the agent starts correctly.

Now I am using a cluster hosted on Azure.

I downloaded the jar files and the HDFS client configuration from the cluster (placed in Flume's /conf directory).

When the sink starts, I get this error:

(SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:455)] HDFS IO error
java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I tried adding this to core-site.xml (it resolved this error on the sandbox):

<property>
  <name>fs.file.impl</name>
  <value>org.apache.hadoop.fs.LocalFileSystem</value>
  <description>The FileSystem for file: uris.</description>
</property>
<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  <description>The FileSystem for hdfs: uris.</description>
</property>

But now I get this error:

2016-03-17 16:25:29,380 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2638)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
    ... 16 more

2016-03-17 16:25:29,429 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:463)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2638)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
    ... 16 more

Thanks for your help.

Super Collaborator

Use the jar files from your Azure cluster, not from the sandbox. You need exactly the same library versions as the Azure cluster uses.

Also copy core-site.xml to the Flume classpath (FLUME_HOME/conf should be fine).
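
To confirm that the core-site.xml you copied really points at the cluster, you could read the default filesystem out of it. This is a minimal Python sketch, assuming the file uses the usual Hadoop configuration/property layout and sets fs.defaultFS (the standard Hadoop key for the default filesystem URI). The function names are illustrative only.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            # A missing or empty <value> element is stored as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def default_filesystem(xml_text):
    """Return the value of fs.defaultFS, or None if the file does not set it."""
    return read_hadoop_properties(xml_text).get("fs.defaultFS")
```

You would pass in the text of FLUME_HOME/conf/core-site.xml. If the result is None, or names the sandbox rather than the Azure cluster, the wrong file is probably on the classpath.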

Regards

Expert Contributor

I did copy the jars from Azure :) Anyway, I am now wiping everything and redoing the configuration; hopefully I will find the problem.

Expert Contributor

No luck. I reinstalled everything but still get the same error (No FileSystem for scheme: hdfs).

List of jars in Flume on Windows:

apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
asm-3.2.jar
avro-1.7.4.jar
aws-java-sdk-1.7.4.jar
azure-storage-2.2.0.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2.jar
commons-codec-1.4.jar
commons-collections-3.2.2.jar
commons-compress-1.4.1.jar
commons-configuration-1.6.jar
commons-digester-1.8.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-logging-1.1.3.jar
commons-math3-3.1.1.jar
commons-net-3.1.jar
curator-client-2.7.1.jar
curator-framework-2.7.1.jar
curator-recipes-2.7.1.jar
flume-avro-source-1.6.0.jar
flume-dataset-sink-1.6.0.jar
flume-file-channel-1.6.0.jar
flume-hdfs-sink-1.6.0.jar
flume-hive-sink-1.6.0.jar
flume-irc-sink-1.6.0.jar
flume-jdbc-channel-1.6.0.jar
flume-jms-source-1.6.0.jar
flume-kafka-channel-1.6.0.jar
flume-kafka-source-1.6.0.jar
flume-ng-auth-1.6.0.jar
flume-ng-configuration-1.6.0.jar
flume-ng-core-1.6.0.jar
flume-ng-elasticsearch-sink-1.6.0.jar
flume-ng-embedded-agent-1.6.0.jar
flume-ng-hbase-sink-1.6.0.jar
flume-ng-kafka-sink-1.6.0.jar
flume-ng-log4jappender-1.6.0.jar
flume-ng-morphline-solr-sink-1.6.0.jar
flume-ng-node-1.6.0.jar
flume-ng-sdk-1.6.0.jar
flume-scribe-source-1.6.0.jar
flume-spillable-memory-channel-1.6.0.jar
flume-thrift-source-1.6.0.jar
flume-tools-1.6.0.jar
flume-twitter-source-1.6.0.jar
gson-2.2.4.jar
guava-11.0.2.jar
hadoop-annotations-2.7.1.2.3.4.0-3485.jar
hadoop-auth-2.7.1.2.3.4.0-3485.jar
hadoop-aws-2.7.1.2.3.4.0-3485.jar
hadoop-azure-2.7.1.2.3.4.0-3485.jar
hadoop-common-2.7.1.2.3.4.0-3485-tests.jar
hadoop-common-2.7.1.2.3.4.0-3485.jar
hadoop-nfs-2.7.1.2.3.4.0-3485.jar
hamcrest-core-1.3.jar
htrace-core-3.1.0-incubating.jar
httpclient-4.2.5.jar
httpcore-4.2.5.jar
jackson-annotations-2.2.3.jar
jackson-core-2.2.3.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.2.3.jar
jackson-jaxrs-1.9.13.jar
jackson-mapper-asl-1.9.13.jar
jackson-xc-1.9.13.jar
java-xmlbuilder-0.4.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jersey-core-1.9.jar
jersey-json-1.9.jar
jersey-server-1.9.jar
jets3t-0.9.0.jar
jettison-1.1.jar
jetty-6.1.26.hwx.jar
jetty-util-6.1.26.hwx.jar
jsch-0.1.42.jar
jsp-api-2.1.jar
jsr305-3.0.0.jar
junit-4.11.jar
log4j-1.2.17.jar
microsoft-windowsazure-storage-sdk-0.6.0.jar
mockito-all-1.8.5.jar
netty-3.6.2.Final.jar
paranamer-2.3.jar
protobuf-java-2.5.0.jar
ranger-hdfs-plugin-shim-0.5.0.2.3.4.0-3485.jar
ranger-plugin-classloader-0.5.0.2.3.4.0-3485.jar
ranger-yarn-plugin-shim-0.5.0.2.3.4.0-3485.jar
servlet-api-2.5.jar
slf4j-api-1.7.10.jar
slf4j-log4j12-1.7.10.jar
snappy-java-1.0.4.1.jar
stax-api-1.0-2.jar
xmlenc-0.52.jar
xz-1.0.jar
zookeeper-3.4.6.2.3.4.0-3485.jar

List of jars on the Azure cluster:

activation-1.1.jar
apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
asm-3.2.jar
avro-1.7.4.jar
aws-java-sdk-1.7.4.jar
azure-storage-2.2.0.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2.jar
commons-codec-1.4.jar
commons-collections-3.2.2.jar
commons-compress-1.4.1.jar
commons-configuration-1.6.jar
commons-digester-1.8.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-logging-1.1.3.jar
commons-math3-3.1.1.jar
commons-net-3.1.jar
curator-client-2.7.1.jar
curator-framework-2.7.1.jar
curator-recipes-2.7.1.jar
gson-2.2.4.jar
guava-11.0.2.jar
hamcrest-core-1.3.jar
htrace-core-3.1.0-incubating.jar
httpclient-4.2.5.jar
httpcore-4.2.5.jar
jackson-annotations-2.2.3.jar
jackson-core-2.2.3.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.2.3.jar
jackson-jaxrs-1.9.13.jar
jackson-mapper-asl-1.9.13.jar
jackson-xc-1.9.13.jar
java-xmlbuilder-0.4.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jersey-core-1.9.jar
jersey-json-1.9.jar
jersey-server-1.9.jar
jets3t-0.9.0.jar
jettison-1.1.jar
jetty-6.1.26.hwx.jar
jetty-util-6.1.26.hwx.jar
jsch-0.1.42.jar
jsp-api-2.1.jar
jsr305-3.0.0.jar
junit-4.11.jar
log4j-1.2.17.jar
microsoft-windowsazure-storage-sdk-0.6.0.jar
mockito-all-1.8.5.jar
native
netty-3.6.2.Final.jar
ojdbc6.jar
paranamer-2.3.jar
protobuf-java-2.5.0.jar
ranger-hdfs-plugin-impl
ranger-hdfs-plugin-shim-0.5.0.2.3.4.0-3485.jar
ranger-plugin-classloader-0.5.0.2.3.4.0-3485.jar
ranger-yarn-plugin-impl
ranger-yarn-plugin-shim-0.5.0.2.3.4.0-3485.jar
servlet-api-2.5.jar
slf4j-api-1.7.10.jar
slf4j-log4j12-1.7.10.jar
snappy-java-1.0.4.1.jar
stax-api-1.0-2.jar
xmlenc-0.52.jar
xz-1.0.jar
zookeeper-3.4.6.2.3.4.0-3485.jar

For my last attempt I put all the jars from the Hadoop cluster into the flume/lib directory. I have no idea what the problem is. Thanks for your patience and help.

Super Collaborator

hadoop-annotations-2.7.1.2.3.4.0-3485.jar
hadoop-auth-2.7.1.2.3.4.0-3485.jar
hadoop-aws-2.7.1.2.3.4.0-3485.jar
hadoop-azure-2.7.1.2.3.4.0-3485.jar
hadoop-common-2.7.1.2.3.4.0-3485-tests.jar
hadoop-common-2.7.1.2.3.4.0-3485.jar
hadoop-nfs-2.7.1.2.3.4.0-3485.jar

Double-check that these jars come from the Azure cluster. You also need to add hadoop-hdfs.jar and core-site.xml.
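
One way to see whether any jar in FLUME_HOME/lib actually provides the class named in the ClassNotFoundException is to look inside the jars, which are ordinary zip archives. Here is a small Python sketch under that assumption (jars_containing is a hypothetical helper, not a Flume or Hadoop tool):

```python
import zipfile
from pathlib import Path

# Path of the class inside a jar, derived from the name in the stack trace.
CLASS_ENTRY = "org/apache/hadoop/hdfs/DistributedFileSystem.class"


def jars_containing(lib_dir, class_entry=CLASS_ENTRY):
    """Return the names of jars in lib_dir that contain class_entry, sorted."""
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid archives (for example a truncated copy).
            continue
    return hits
```

If jars_containing("C:/flume/lib") returns an empty list (the path is a placeholder for your FLUME_HOME/lib), no jar in that directory ships org.apache.hadoop.hdfs.DistributedFileSystem. In Hadoop 2.7 that class lives in the hadoop-hdfs jar, which does not appear in either list posted above.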