Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Flume agent on Windows

Expert Contributor

Hi all,

I need to install a Flume agent (1.5) on a Windows machine to collect logs and ship them to an HDP cluster on Azure.

Can I just configure the agent, or do I need a complete Flume installation?

Is there a complete guide covering all the installation and configuration steps?

I searched the web but could not find one.

Thank you

1 ACCEPTED SOLUTION

Super Collaborator

I can propose much easier steps:

1. Download the Flume binaries from http://flume.apache.org/download.html and extract them somewhere (this directory becomes FLUME_HOME).

2. Download winutils and put it somewhere (e.g. C:/winutils/bin; in this case C:/winutils becomes HADOOP_HOME).

3. Copy all the missing HDFS libs to FLUME_HOME/lib. You can find them on your Hadoop cluster under /usr/hdp/current/hadoop or /usr/hdp/current/hadoop-hdfs; it is always preferable to use exactly the same versions as the cluster.

4. Run the Flume agent with the following command:

bin\flume-ng agent -name MyAgent -f conf/MyAgent.properties -property "flume.root.logger=INFO,LOGFILE,console;flume.log.file=MyLog.log;hadoop.home.dir=C:/winutils"
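Before launching, it may help to confirm that the layout from steps 1 to 3 is actually in place. Below is a minimal Python sketch of such a check. The paths C:\flume and C:\winutils and the function name preflight are assumptions for illustration only; the file name winutils.exe and the conf/MyAgent.properties location follow the steps and the command above.

```python
import os


def preflight(flume_home, hadoop_home, conf_file):
    """Return a list of missing items; an empty list means the layout looks right."""
    expected = {
        "Flume lib directory": os.path.join(flume_home, "lib"),
        "agent configuration": os.path.join(flume_home, "conf", conf_file),
        "winutils.exe": os.path.join(hadoop_home, "bin", "winutils.exe"),
    }
    return [
        f"missing {label}: {path}"
        for label, path in expected.items()
        if not os.path.exists(path)
    ]


if __name__ == "__main__":
    # Hypothetical locations; replace with your own FLUME_HOME and HADOOP_HOME.
    for problem in preflight(r"C:\flume", r"C:\winutils", "MyAgent.properties"):
        print(problem)
```

If nothing is printed, the directories and files the command relies on exist; this says nothing about whether the jars inside lib are the right ones.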


14 REPLIES

Expert Contributor

I tried it and the agent starts correctly.

Now I am using the cluster hosted on Azure.

I downloaded the jar files and the HDFS client configuration from the cluster (placed in Flume's /conf directory).

When the sink starts I get this error:

(SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:455)] HDFS IO error
java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
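A note on this error: as far as I know, Hadoop 2.x resolves a URI scheme either from an fs.<scheme>.impl setting or from FileSystem implementations that jars register in a META-INF/services/org.apache.hadoop.fs.FileSystem file. "No FileSystem for scheme: hdfs" would then mean that no jar on the classpath registers an HDFS implementation. Here is a rough Python sketch that lists what each jar in a lib directory registers; the function name and the C:\flume\lib path are made up for the example.

```python
import glob
import os
import zipfile

# Service file that java.util.ServiceLoader reads for FileSystem implementations.
SERVICE_ENTRY = "META-INF/services/org.apache.hadoop.fs.FileSystem"


def registered_filesystems(lib_dir):
    """Map each jar name to the FileSystem classes listed in its service file."""
    found = {}
    for jar_path in sorted(glob.glob(os.path.join(lib_dir, "*.jar"))):
        try:
            with zipfile.ZipFile(jar_path) as jar:
                if SERVICE_ENTRY not in jar.namelist():
                    continue
                text = jar.read(SERVICE_ENTRY).decode("utf-8")
        except zipfile.BadZipFile:
            continue  # not a readable archive, skip it
        lines = (line.strip() for line in text.splitlines())
        found[os.path.basename(jar_path)] = [
            line for line in lines if line and not line.startswith("#")
        ]
    return found


if __name__ == "__main__":
    # Hypothetical FLUME_HOME\lib location.
    for jar_name, classes in registered_filesystems(r"C:\flume\lib").items():
        print(jar_name, classes)
```

If no line of the output mentions org.apache.hadoop.hdfs.DistributedFileSystem, the jar that provides HDFS support is probably not in the directory.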

I tried adding this to core-site.xml (it resolved the same error on the sandbox):

<property>
  <name>fs.file.impl</name>
  <value>org.apache.hadoop.fs.LocalFileSystem</value>
  <description>The FileSystem for file: uris.</description>
</property>

<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  <description>The FileSystem for hdfs: uris.</description>
</property>

But now I get this error:

2016-03-17 16:25:29,380 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2638)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
    ... 16 more

2016-03-17 16:25:29,429 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:463)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2638)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
    ... 16 more
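The ClassNotFoundException above suggests that the fs.hdfs.impl setting is being read, but that no jar on the classpath actually contains the class it names. One way to check is to search the jars directly. A small Python sketch follows; the helper name jars_containing and the C:\flume\lib path are assumptions, not anything from Flume or Hadoop.

```python
import glob
import os
import zipfile


def jars_containing(lib_dir, class_name):
    """Return the names of jars in lib_dir that contain the given Java class."""
    # A class a.b.C is stored inside a jar as a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar_path in sorted(glob.glob(os.path.join(lib_dir, "*.jar"))):
        try:
            with zipfile.ZipFile(jar_path) as jar:
                if entry in jar.namelist():
                    hits.append(os.path.basename(jar_path))
        except zipfile.BadZipFile:
            pass  # skip files that are not valid archives
    return hits


if __name__ == "__main__":
    # Hypothetical FLUME_HOME\lib location.
    print(jars_containing(r"C:\flume\lib",
                          "org.apache.hadoop.hdfs.DistributedFileSystem"))
```

An empty list means none of the jars in that directory ships the class, which would be consistent with the error.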

Thanks for the help

Super Collaborator

Use the jar files from your Azure cluster, not the sandbox. You need exactly the same versions of the libs as the Azure cluster uses.

Also copy core-site.xml to the Flume classpath (FLUME_HOME/conf should be fine).
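To confirm that the core-site.xml sitting in FLUME_HOME/conf is the one from the Azure cluster and not from the sandbox, you could print the default file system it declares. A minimal Python sketch, assuming the standard Hadoop layout of <property> elements with <name> and <value> children; the function name and the file path are hypothetical.

```python
import xml.etree.ElementTree as ET


def read_hadoop_conf(path):
    """Return the name -> value pairs from a Hadoop *-site.xml file."""
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


if __name__ == "__main__":
    import os
    conf_path = r"C:\flume\conf\core-site.xml"  # hypothetical location
    if os.path.exists(conf_path):
        conf = read_hadoop_conf(conf_path)
        # fs.defaultFS should point at the Azure cluster, not at the sandbox.
        print(conf.get("fs.defaultFS"))
```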

Regards

Expert Contributor

I copied the jars from Azure :) Anyway, I am now wiping everything and redoing the configuration; I hope to find the problem.

Expert Contributor

Nothing. I reinstalled everything but get the same error (No FileSystem for scheme: hdfs).

List of jars in the Flume lib directory on Windows:

apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
asm-3.2.jar
avro-1.7.4.jar
aws-java-sdk-1.7.4.jar
azure-storage-2.2.0.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2.jar
commons-codec-1.4.jar
commons-collections-3.2.2.jar
commons-compress-1.4.1.jar
commons-configuration-1.6.jar
commons-digester-1.8.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-logging-1.1.3.jar
commons-math3-3.1.1.jar
commons-net-3.1.jar
curator-client-2.7.1.jar
curator-framework-2.7.1.jar
curator-recipes-2.7.1.jar
flume-avro-source-1.6.0.jar
flume-dataset-sink-1.6.0.jar
flume-file-channel-1.6.0.jar
flume-hdfs-sink-1.6.0.jar
flume-hive-sink-1.6.0.jar
flume-irc-sink-1.6.0.jar
flume-jdbc-channel-1.6.0.jar
flume-jms-source-1.6.0.jar
flume-kafka-channel-1.6.0.jar
flume-kafka-source-1.6.0.jar
flume-ng-auth-1.6.0.jar
flume-ng-configuration-1.6.0.jar
flume-ng-core-1.6.0.jar
flume-ng-elasticsearch-sink-1.6.0.jar
flume-ng-embedded-agent-1.6.0.jar
flume-ng-hbase-sink-1.6.0.jar
flume-ng-kafka-sink-1.6.0.jar
flume-ng-log4jappender-1.6.0.jar
flume-ng-morphline-solr-sink-1.6.0.jar
flume-ng-node-1.6.0.jar
flume-ng-sdk-1.6.0.jar
flume-scribe-source-1.6.0.jar
flume-spillable-memory-channel-1.6.0.jar
flume-thrift-source-1.6.0.jar
flume-tools-1.6.0.jar
flume-twitter-source-1.6.0.jar
gson-2.2.4.jar
guava-11.0.2.jar
hadoop-annotations-2.7.1.2.3.4.0-3485.jar
hadoop-auth-2.7.1.2.3.4.0-3485.jar
hadoop-aws-2.7.1.2.3.4.0-3485.jar
hadoop-azure-2.7.1.2.3.4.0-3485.jar
hadoop-common-2.7.1.2.3.4.0-3485-tests.jar
hadoop-common-2.7.1.2.3.4.0-3485.jar
hadoop-nfs-2.7.1.2.3.4.0-3485.jar
hamcrest-core-1.3.jar
htrace-core-3.1.0-incubating.jar
httpclient-4.2.5.jar
httpcore-4.2.5.jar
jackson-annotations-2.2.3.jar
jackson-core-2.2.3.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.2.3.jar
jackson-jaxrs-1.9.13.jar
jackson-mapper-asl-1.9.13.jar
jackson-xc-1.9.13.jar
java-xmlbuilder-0.4.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jersey-core-1.9.jar
jersey-json-1.9.jar
jersey-server-1.9.jar
jets3t-0.9.0.jar
jettison-1.1.jar
jetty-6.1.26.hwx.jar
jetty-util-6.1.26.hwx.jar
jsch-0.1.42.jar
jsp-api-2.1.jar
jsr305-3.0.0.jar
junit-4.11.jar
log4j-1.2.17.jar
microsoft-windowsazure-storage-sdk-0.6.0.jar
mockito-all-1.8.5.jar
netty-3.6.2.Final.jar
paranamer-2.3.jar
protobuf-java-2.5.0.jar
ranger-hdfs-plugin-shim-0.5.0.2.3.4.0-3485.jar
ranger-plugin-classloader-0.5.0.2.3.4.0-3485.jar
ranger-yarn-plugin-shim-0.5.0.2.3.4.0-3485.jar
servlet-api-2.5.jar
slf4j-api-1.7.10.jar
slf4j-log4j12-1.7.10.jar
snappy-java-1.0.4.1.jar
stax-api-1.0-2.jar
xmlenc-0.52.jar
xz-1.0.jar
zookeeper-3.4.6.2.3.4.0-3485.jar

List of jars on the Azure cluster:

activation-1.1.jar
apacheds-i18n-2.0.0-M15.jar
apacheds-kerberos-codec-2.0.0-M15.jar
api-asn1-api-1.0.0-M20.jar
api-util-1.0.0-M20.jar
asm-3.2.jar
avro-1.7.4.jar
aws-java-sdk-1.7.4.jar
azure-storage-2.2.0.jar
commons-beanutils-1.7.0.jar
commons-beanutils-core-1.8.0.jar
commons-cli-1.2.jar
commons-codec-1.4.jar
commons-collections-3.2.2.jar
commons-compress-1.4.1.jar
commons-configuration-1.6.jar
commons-digester-1.8.jar
commons-httpclient-3.1.jar
commons-io-2.4.jar
commons-lang-2.6.jar
commons-logging-1.1.3.jar
commons-math3-3.1.1.jar
commons-net-3.1.jar
curator-client-2.7.1.jar
curator-framework-2.7.1.jar
curator-recipes-2.7.1.jar
gson-2.2.4.jar
guava-11.0.2.jar
hamcrest-core-1.3.jar
htrace-core-3.1.0-incubating.jar
httpclient-4.2.5.jar
httpcore-4.2.5.jar
jackson-annotations-2.2.3.jar
jackson-core-2.2.3.jar
jackson-core-asl-1.9.13.jar
jackson-databind-2.2.3.jar
jackson-jaxrs-1.9.13.jar
jackson-mapper-asl-1.9.13.jar
jackson-xc-1.9.13.jar
java-xmlbuilder-0.4.jar
jaxb-api-2.2.2.jar
jaxb-impl-2.2.3-1.jar
jersey-core-1.9.jar
jersey-json-1.9.jar
jersey-server-1.9.jar
jets3t-0.9.0.jar
jettison-1.1.jar
jetty-6.1.26.hwx.jar
jetty-util-6.1.26.hwx.jar
jsch-0.1.42.jar
jsp-api-2.1.jar
jsr305-3.0.0.jar
junit-4.11.jar
log4j-1.2.17.jar
microsoft-windowsazure-storage-sdk-0.6.0.jar
mockito-all-1.8.5.jar
native
netty-3.6.2.Final.jar
ojdbc6.jar
paranamer-2.3.jar
protobuf-java-2.5.0.jar
ranger-hdfs-plugin-impl
ranger-hdfs-plugin-shim-0.5.0.2.3.4.0-3485.jar
ranger-plugin-classloader-0.5.0.2.3.4.0-3485.jar
ranger-yarn-plugin-impl
ranger-yarn-plugin-shim-0.5.0.2.3.4.0-3485.jar
servlet-api-2.5.jar
slf4j-api-1.7.10.jar
slf4j-log4j12-1.7.10.jar
snappy-java-1.0.4.1.jar
stax-api-1.0-2.jar
xmlenc-0.52.jar
xz-1.0.jar
zookeeper-3.4.6.2.3.4.0-3485.jar

For the last attempt I put all the jars from the Hadoop cluster into the flume/lib directory. I have no idea what the problem is. Thanks for your patience and help.
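Long listings like these are hard to compare by eye, so a short script can pull out just the Hadoop modules and report which ones are absent. The Python sketch below uses the hadoop-* names from the Flume lib listing above. The REQUIRED tuple is my own assumption about the minimum the HDFS sink needs, not an official list.

```python
import re

# Assumed minimum set of Hadoop modules for the HDFS sink (not an official list).
REQUIRED = ("hadoop-common", "hadoop-hdfs")


def hadoop_modules(jar_names):
    """Return the Hadoop module names (e.g. 'hadoop-common') found in a jar listing."""
    modules = set()
    for name in jar_names:
        # Module name is everything before the first '-<digit>' version part.
        match = re.match(r"(hadoop-[a-z-]+?)-\d", name)
        if match:
            modules.add(match.group(1))
    return modules


def missing_modules(jar_names, required=REQUIRED):
    """Return the required modules that do not appear in the listing."""
    present = hadoop_modules(jar_names)
    return [module for module in required if module not in present]


# hadoop-* entries copied from the Flume lib listing in this post.
flume_lib = [
    "hadoop-annotations-2.7.1.2.3.4.0-3485.jar",
    "hadoop-auth-2.7.1.2.3.4.0-3485.jar",
    "hadoop-aws-2.7.1.2.3.4.0-3485.jar",
    "hadoop-azure-2.7.1.2.3.4.0-3485.jar",
    "hadoop-common-2.7.1.2.3.4.0-3485-tests.jar",
    "hadoop-common-2.7.1.2.3.4.0-3485.jar",
    "hadoop-nfs-2.7.1.2.3.4.0-3485.jar",
]

print(missing_modules(flume_lib))  # prints ['hadoop-hdfs']
```

The listing contains hadoop-common but no hadoop-hdfs jar, which would fit with the DistributedFileSystem class not being found.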

Super Collaborator
hadoop-annotations-2.7.1.2.3.4.0-3485.jar
hadoop-auth-2.7.1.2.3.4.0-3485.jar
hadoop-aws-2.7.1.2.3.4.0-3485.jar
hadoop-azure-2.7.1.2.3.4.0-3485.jar
hadoop-common-2.7.1.2.3.4.0-3485-tests.jar
hadoop-common-2.7.1.2.3.4.0-3485.jar
hadoop-nfs-2.7.1.2.3.4.0-3485.jar

Double-check that these are the classes from Azure. You also need to add hadoop-hdfs.jar and core-site.xml.