Created 03-14-2016 11:19 AM
Hi all,
I need to install the Flume agent (1.5) in a Windows environment to collect logs and ship them to an HDP cluster on Azure.
Is it enough to configure just the agent, or do I need a complete Flume installation?
Is there a complete guide covering all the installation / configuration steps?
I searched the web but could not find a complete guide.
Thank you
Created 03-15-2016 08:38 PM
I can propose some much simpler steps:
1. Download the Flume binaries from http://flume.apache.org/download.html and extract them somewhere (that directory will be your FLUME_HOME).
2. Download winutils and put it somewhere (e.g. C:/winutils/bin; in that case C:/winutils will be your HADOOP_HOME).
3. Copy all the missing HDFS libs into FLUME_HOME/lib (you can find them on your Hadoop cluster; it is always preferable to use exactly the same versions as in /usr/hdp/current/hadoop or /usr/hdp/current/hadoop-hdfs).
4. Run the Flume agent with the following command (a minimal example of conf/MyAgent.properties is sketched after the command):
bin\flume-ng agent -name MyAgent -f conf/MyAgent.properties -property "flume.root.logger=INFO,LOGFILE,console;flume.log.file=MyLog.log;hadoop.home.dir=C:/winutils"
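For reference, here is a minimal sketch of what conf/MyAgent.properties could look like. The source type, the local spool directory, and the NameNode host/port are assumptions for illustration, not values from this thread; the agent name must match the -name argument above.

# MyAgent.properties - minimal sketch (paths and hosts are placeholders)
MyAgent.sources = src1
MyAgent.channels = ch1
MyAgent.sinks = sink1

# Pick up files dropped into a local Windows directory
MyAgent.sources.src1.type = spooldir
MyAgent.sources.src1.spoolDir = C:/logs/incoming
MyAgent.sources.src1.channels = ch1

# Simple in-memory channel
MyAgent.channels.ch1.type = memory
MyAgent.channels.ch1.capacity = 10000
MyAgent.channels.ch1.transactionCapacity = 1000

# Write events to HDFS on the remote cluster (replace host/port with your NameNode)
MyAgent.sinks.sink1.type = hdfs
MyAgent.sinks.sink1.channel = ch1
MyAgent.sinks.sink1.hdfs.path = hdfs://your-namenode-host:8020/flume/logs/%Y-%m-%d
MyAgent.sinks.sink1.hdfs.fileType = DataStream
MyAgent.sinks.sink1.hdfs.writeFormat = Text
MyAgent.sinks.sink1.hdfs.useLocalTimeStamp = true
MyAgent.sinks.sink1.hdfs.rollInterval = 300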
Created 03-17-2016 03:35 PM
I tried this and the agent starts correctly.
Now I am using a cluster hosted on Azure.
I downloaded the jar files and the HDFS client configuration from the cluster (and put them in Flume's /conf directory).
When the sink starts I get this error:
(SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:455)] HDFS IO error
java.io.IOException: No FileSystem for scheme: hdfs
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
	at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
	at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
	at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
I tried adding this section to core-site.xml (it resolved this error on the sandbox):
<property>
  <name>fs.file.impl</name>
  <value>org.apache.hadoop.fs.LocalFileSystem</value>
  <description>The FileSystem for file: uris.</description>
</property>
<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  <description>The FileSystem for hdfs: uris.</description>
</property>
But then I get this error:
2016-03-17 16:25:29,380 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2638)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
	at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
	at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
	at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
	... 16 more
2016-03-17 16:25:29,429 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:463)
	at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
	at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2638)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
	at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
	at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
	at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
	at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	... 1 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
	... 16 more
Thanks for the help
Created 03-17-2016 05:27 PM
Use the jar files from your Azure cluster, not the sandbox. You need exactly the same lib versions as those used on the Azure cluster.
Also copy core-site.xml to the Flume classpath (FLUME_HOME/conf should be fine); a sketch of the key property it provides is below.
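For reference, the property the agent mainly needs from that copied core-site.xml is the default filesystem. A minimal sketch follows; the NameNode host and port are placeholders, so take the real values from the cluster's own core-site.xml rather than typing them by hand.

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- placeholder: replace with the NameNode address from your Azure cluster -->
    <value>hdfs://your-namenode-host:8020</value>
  </property>
</configuration>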
Regards
Created 03-17-2016 05:34 PM
I did copy the jars from Azure :) Anyway, I am now wiping everything and redoing the configuration from scratch; I hope to find the problem.
Created 03-18-2016 08:40 AM
No luck. I reinstalled everything, but I get the same error (No FileSystem for scheme: hdfs).
List of jars on the Flume Windows machine:
apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar asm-3.2.jar avro-1.7.4.jar aws-java-sdk-1.7.4.jar azure-storage-2.2.0.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar curator-client-2.7.1.jar curator-framework-2.7.1.jar curator-recipes-2.7.1.jar flume-avro-source-1.6.0.jar flume-dataset-sink-1.6.0.jar flume-file-channel-1.6.0.jar flume-hdfs-sink-1.6.0.jar flume-hive-sink-1.6.0.jar flume-irc-sink-1.6.0.jar flume-jdbc-channel-1.6.0.jar flume-jms-source-1.6.0.jar flume-kafka-channel-1.6.0.jar flume-kafka-source-1.6.0.jar flume-ng-auth-1.6.0.jar flume-ng-configuration-1.6.0.jar flume-ng-core-1.6.0.jar flume-ng-elasticsearch-sink-1.6.0.jar flume-ng-embedded-agent-1.6.0.jar flume-ng-hbase-sink-1.6.0.jar flume-ng-kafka-sink-1.6.0.jar flume-ng-log4jappender-1.6.0.jar flume-ng-morphline-solr-sink-1.6.0.jar flume-ng-node-1.6.0.jar flume-ng-sdk-1.6.0.jar flume-scribe-source-1.6.0.jar flume-spillable-memory-channel-1.6.0.jar flume-thrift-source-1.6.0.jar flume-tools-1.6.0.jar flume-twitter-source-1.6.0.jar gson-2.2.4.jar guava-11.0.2.jar hadoop-annotations-2.7.1.2.3.4.0-3485.jar hadoop-auth-2.7.1.2.3.4.0-3485.jar hadoop-aws-2.7.1.2.3.4.0-3485.jar hadoop-azure-2.7.1.2.3.4.0-3485.jar hadoop-common-2.7.1.2.3.4.0-3485-tests.jar hadoop-common-2.7.1.2.3.4.0-3485.jar hadoop-nfs-2.7.1.2.3.4.0-3485.jar hamcrest-core-1.3.jar htrace-core-3.1.0-incubating.jar httpclient-4.2.5.jar httpcore-4.2.5.jar jackson-annotations-2.2.3.jar jackson-core-2.2.3.jar jackson-core-asl-1.9.13.jar jackson-databind-2.2.3.jar jackson-jaxrs-1.9.13.jar jackson-mapper-asl-1.9.13.jar jackson-xc-1.9.13.jar java-xmlbuilder-0.4.jar jaxb-api-2.2.2.jar jaxb-impl-2.2.3-1.jar jersey-core-1.9.jar jersey-json-1.9.jar jersey-server-1.9.jar jets3t-0.9.0.jar jettison-1.1.jar jetty-6.1.26.hwx.jar jetty-util-6.1.26.hwx.jar jsch-0.1.42.jar jsp-api-2.1.jar jsr305-3.0.0.jar junit-4.11.jar log4j-1.2.17.jar microsoft-windowsazure-storage-sdk-0.6.0.jar mockito-all-1.8.5.jar netty-3.6.2.Final.jar paranamer-2.3.jar protobuf-java-2.5.0.jar ranger-hdfs-plugin-shim-0.5.0.2.3.4.0-3485.jar ranger-plugin-classloader-0.5.0.2.3.4.0-3485.jar ranger-yarn-plugin-shim-0.5.0.2.3.4.0-3485.jar servlet-api-2.5.jar slf4j-api-1.7.10.jar slf4j-log4j12-1.7.10.jar snappy-java-1.0.4.1.jar stax-api-1.0-2.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.6.2.3.4.0-3485.jar
List of jars on the Azure cluster:
activation-1.1.jar apacheds-i18n-2.0.0-M15.jar apacheds-kerberos-codec-2.0.0-M15.jar api-asn1-api-1.0.0-M20.jar api-util-1.0.0-M20.jar asm-3.2.jar avro-1.7.4.jar aws-java-sdk-1.7.4.jar azure-storage-2.2.0.jar commons-beanutils-1.7.0.jar commons-beanutils-core-1.8.0.jar commons-cli-1.2.jar commons-codec-1.4.jar commons-collections-3.2.2.jar commons-compress-1.4.1.jar commons-configuration-1.6.jar commons-digester-1.8.jar commons-httpclient-3.1.jar commons-io-2.4.jar commons-lang-2.6.jar commons-logging-1.1.3.jar commons-math3-3.1.1.jar commons-net-3.1.jar curator-client-2.7.1.jar curator-framework-2.7.1.jar curator-recipes-2.7.1.jar gson-2.2.4.jar guava-11.0.2.jar hamcrest-core-1.3.jar htrace-core-3.1.0-incubating.jar httpclient-4.2.5.jar httpcore-4.2.5.jar jackson-annotations-2.2.3.jar jackson-core-2.2.3.jar jackson-core-asl-1.9.13.jar jackson-databind-2.2.3.jar jackson-jaxrs-1.9.13.jar jackson-mapper-asl-1.9.13.jar jackson-xc-1.9.13.jar java-xmlbuilder-0.4.jar jaxb-api-2.2.2.jar jaxb-impl-2.2.3-1.jar jersey-core-1.9.jar jersey-json-1.9.jar jersey-server-1.9.jar jets3t-0.9.0.jar jettison-1.1.jar jetty-6.1.26.hwx.jar jetty-util-6.1.26.hwx.jar jsch-0.1.42.jar jsp-api-2.1.jar jsr305-3.0.0.jar junit-4.11.jar log4j-1.2.17.jar microsoft-windowsazure-storage-sdk-0.6.0.jar mockito-all-1.8.5.jar native netty-3.6.2.Final.jar ojdbc6.jar paranamer-2.3.jar protobuf-java-2.5.0.jar ranger-hdfs-plugin-impl ranger-hdfs-plugin-shim-0.5.0.2.3.4.0-3485.jar ranger-plugin-classloader-0.5.0.2.3.4.0-3485.jar ranger-yarn-plugin-impl ranger-yarn-plugin-shim-0.5.0.2.3.4.0-3485.jar servlet-api-2.5.jar slf4j-api-1.7.10.jar slf4j-log4j12-1.7.10.jar snappy-java-1.0.4.1.jar stax-api-1.0-2.jar xmlenc-0.52.jar xz-1.0.jar zookeeper-3.4.6.2.3.4.0-3485.jar
As a last attempt I put all the jars from the Hadoop cluster into the flume/lib directory. I have no idea what the problem is. Thanks for your patience and help.
Created 03-18-2016 10:25 AM
hadoop-annotations-2.7.1.2.3.4.0-3485.jar hadoop-auth-2.7.1.2.3.4.0-3485.jar hadoop-aws-2.7.1.2.3.4.0-3485.jar hadoop-azure-2.7.1.2.3.4.0-3485.jar hadoop-common-2.7.1.2.3.4.0-3485-tests.jar hadoop-common-2.7.1.2.3.4.0-3485.jar hadoop-nfs-2.7.1.2.3.4.0-3485.jar
Double check that these classes really come from Azure. Also, you need to add hadoop-hdfs.jar and core-site.xml (a sketch of pulling them from the cluster is below).
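For reference, a rough sketch of fetching the missing pieces with pscp from a Windows command prompt. The hostname, user, Flume directory (C:\flume here) and the exact jar version are placeholders; match whatever sits under /usr/hdp/current/hadoop-hdfs-client on your Azure cluster nodes.

rem Run on the Windows machine; host, user, paths and jar version are placeholders.
rem On HDP nodes the HDFS client jar lives under /usr/hdp/current/hadoop-hdfs-client.
pscp user@azure-edge-node:/usr/hdp/current/hadoop-hdfs-client/hadoop-hdfs-2.7.1.2.3.4.0-3485.jar C:\flume\lib\
rem The agent reads core-site.xml (and, if your cluster needs it, hdfs-site.xml) from the Flume classpath.
pscp user@azure-edge-node:/etc/hadoop/conf/core-site.xml C:\flume\conf\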