Created 05-20-2016 04:02 AM
How can I configure NiFi to connect to HDInsight?
I’m getting the error “No FileSystem for scheme: wasb” when running the NiFi PutHDFS processor on a server that is attempting to connect to an HDInsight cluster.
I tried adding hadoop-azure.jar to NiFi’s classpath, but that caused a NoClassDefFoundError for org/apache/hadoop/fs/FileSystem.
Created 06-07-2016 04:53 AM
Ran into this issue on a recent project. The dependencies have to be incorporated into the NAR file. I've created a version incorporating the dependencies and submitted a pull request on the associated JIRA issue at https://issues.apache.org/jira/browse/NIFI-1922. I performed some basic testing on a mix of HDInsight clusters and it appears to work OK. Note, though, that you will need to deploy NiFi on the cluster (due to the HDInsight/blob store security model) and will need to install Java 8.
Created 05-20-2016 09:57 AM
Could you provide a snippet from your nifi-app.log with the stack trace for this error? I suspect the problem is that your hadoop-azure.jar is built against the wrong version of Hadoop. What is the source of this file?
Created 05-20-2016 05:05 PM
I'm not totally sure if this is the problem, but given that NiFi has NARs with isolated class loading, adding something to the classpath usually isn't as simple as dropping it in the lib directory.
The hadoop libraries NAR would be unpacked to this location:
work/nar/extensions/nifi-hadoop-libraries-nar-<VERSION>.nar-unpacked/META-INF/bundled-dependencies/
You could try putting the hadoop-azure.jar there, keeping in mind that if the work directory is removed, NiFi will unpack the original NAR again without your added JAR.
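As a rough illustration of that manual step, here is a minimal Python sketch that copies a JAR into the unpacked hadoop libraries NAR. The function name and the `nifi_home` argument are hypothetical, and the directory layout is assumed to match the path quoted above, with a wildcard standing in for `<VERSION>`. It does not restart NiFi, and the copy is lost if the work directory is regenerated.

```python
import glob
import os
import shutil


def copy_jar_into_unpacked_nar(nifi_home, jar_path):
    """Copy jar_path into the unpacked hadoop libraries NAR under nifi_home.

    Returns the list of destination paths written (one per matching NAR version).
    """
    # Pattern follows the path quoted above; <VERSION> is matched with a wildcard.
    pattern = os.path.join(
        nifi_home, "work", "nar", "extensions",
        "nifi-hadoop-libraries-nar-*.nar-unpacked",
        "META-INF", "bundled-dependencies",
    )
    written = []
    for dep_dir in sorted(glob.glob(pattern)):
        if os.path.isdir(dep_dir):
            # shutil.copy2 returns the path of the newly created file.
            written.append(shutil.copy2(jar_path, dep_dir))
    return written
```

An empty return value means no unpacked hadoop libraries NAR was found under the given directory, for example because NiFi has not been started yet.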
Some have had success creating a custom version of the hadoop libraries NAR to switch to other libraries:
https://github.com/bbukacek/nifi-hadoop-libraries-bundle
Right now Apache NiFi is based on Apache Hadoop 2.6.2.
Created 05-20-2016 06:53 PM
Would dropping the JARs into the bootstrap directory perhaps ensure they are on the system classpath?
Created 05-25-2016 12:05 AM
It looks like hadoop-azure.jar is getting picked up, but apparently other dependencies are missing.
Created 05-25-2016 12:04 AM
Hi Bryan, it turns out that after dropping hadoop-azure.jar in the NAR directory, I get a new error:
Caused by: java.lang.NoClassDefFoundError: com/microsoft/azure/storage/blob/BlobListingDetails
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.createDefaultStore(NativeAzureFileSystem.java:1064) ~[na:na]
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1035) ~[na:na]
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596) ~[na:na]
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) ~[na:na]
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630) ~[na:na]
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612) ~[na:na]
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) ~[na:na]
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169) ~[na:na]
    at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor$1.run(AbstractHadoopProcessor.java:305) ~[na:na]
    at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor$1.run(AbstractHadoopProcessor.java:302) ~[na:na]
    at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_91]
    at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_91]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) ~[na:na]
    at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.getFileSystemAsUser(AbstractHadoopProcessor.java:302) ~[na:na]
    at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.resetHDFSResources(AbstractHadoopProcessor.java:274) ~[na:na]
    at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.abstractOnScheduled(AbstractHadoopProcessor.java:196) ~[na:na]
    at org.apache.nifi.processors.hadoop.PutHDFS.onScheduled(PutHDFS.java:177) ~[na:na]
Created 05-25-2016 12:52 PM
This most likely means there is another JAR you need to add...
If you look at the pom file for the hadoop-azure JAR:
http://central.maven.org/maven2/org/apache/hadoop/hadoop-azure/2.7.0/hadoop-azure-2.7.0.pom
You can see all the dependencies it needs:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-core</artifactId>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>com.microsoft.azure</groupId>
  <artifactId>azure-storage</artifactId>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <scope>compile</scope>
</dependency>
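If you would rather pull that list out of a POM programmatically than read it by eye, something along these lines might do. It is only a sketch using Python's standard XML parser; the function name `list_dependencies` is made up for illustration, and it expects the full text of a POM file rather than a bare fragment.

```python
import xml.etree.ElementTree as ET


def list_dependencies(pom_xml):
    """Return (groupId, artifactId, scope) tuples for each <dependency> in a POM.

    Note: this walks the whole document, so entries under dependencyManagement
    or plugin sections are included as well.
    """
    root = ET.fromstring(pom_xml)
    deps = []
    for elem in root.iter():
        # Published POMs use an XML namespace, so compare local tag names only.
        if elem.tag.rsplit("}", 1)[-1] != "dependency":
            continue
        fields = {
            child.tag.rsplit("}", 1)[-1]: (child.text or "").strip()
            for child in elem
        }
        deps.append((
            fields.get("groupId", ""),
            fields.get("artifactId", ""),
            fields.get("scope", ""),
        ))
    return deps
```

Each tuple names one artifact that would have to be present alongside hadoop-azure; missing fields come back as empty strings.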
My guess would be the azure-storage JAR is missing.
This becomes a slippery slope though, because then azure-storage might have transitive dependencies as well.
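One way to check a guess like this is to scan the JARs in a directory for the class named in the NoClassDefFoundError. The following is a minimal Python sketch, not part of NiFi or Hadoop; `jars_containing_class` is a hypothetical helper, and the directory you pass in (for example the bundled-dependencies folder of the unpacked NAR) is up to you.

```python
import os
import zipfile


def jars_containing_class(jar_dir, class_name):
    """Return sorted names of JARs in jar_dir that contain class_name.

    class_name may use dots or slashes, e.g. the name from a NoClassDefFoundError.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for name in sorted(os.listdir(jar_dir)):
        if not name.endswith(".jar"):
            continue
        path = os.path.join(jar_dir, name)
        try:
            with zipfile.ZipFile(path) as jar:
                if entry in jar.namelist():
                    hits.append(name)
        except zipfile.BadZipFile:
            # Skip files that are not valid archives.
            continue
    return hits
```

An empty result for `com/microsoft/azure/storage/blob/BlobListingDetails` would suggest that no JAR in that directory provides the class, and the same check can be repeated for each new missing class as transitive dependencies surface.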
Created 05-24-2016 11:58 PM
2016-05-24 23:53:52,326 WARN [StandardProcessScheduler Thread-1] org.apache.hadoop.util.NativeCodeLoader Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-05-24 23:53:52,498 INFO [Flow Service Tasks Thread-1] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@5c6cae55 // Another save pending = false
2016-05-24 23:53:52,767 ERROR [StandardProcessScheduler Thread-1] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=0a8eeb51-4937-4dfc-a7f4-9c2bce921d0c] PutHDFS[id=0a8eeb51-4937-4dfc-a7f4-9c2bce921d0c] failed to invoke @OnScheduled method due to java.lang.RuntimeException: Failed while executing one of processor's OnScheduled task.; processor will not be scheduled to run for 30000 milliseconds: java.lang.RuntimeException: Failed while executing one of processor's OnScheduled task.
2016-05-24 23:53:52,785 ERROR [StandardProcessScheduler Thread-1] o.apache.nifi.processors.hadoop.PutHDFS
java.lang.RuntimeException: Failed while executing one of processor's OnScheduled task.
    at org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1405) ~[na:na]
    at org.apache.nifi.controller.StandardProcessorNode.access$100(StandardProcessorNode.java:89) ~[na:na]
    at org.apache.nifi.controller.StandardProcessorNode$1.run(StandardProcessorNode.java:1243) ~[na:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_91]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_91]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_91]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
Caused by: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.8.0_91]
    at java.util.concurrent.FutureTask.get(FutureTask.java:206) [na:1.8.0_91]
    at org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1388) ~[na:na]
    ... 9 common frames omitted
Caused by: java.lang.reflect.InvocationTargetException: null
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_91]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_91]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_91]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_91]
    at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:137) ~[na:na]
    at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:125) ~[na:na]
    at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:70) ~[na:na]
    at org.apache.nifi.controller.StandardProcessorNode$1$1.call(StandardProcessorNode.java:1247) ~[na:na]
    at org.apache.nifi.controller.StandardProcessorNode$1$1.call(StandardProcessorNode.java:1243) ~[na:na]
    ... 6 common frames omitted
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FileSystem ...
Created 06-09-2016 06:11 PM
Hi Alex, do you have detailed instructions on how to build and deploy NiFi on the cluster?