I am totally new to Nifi, bear that in mind :)
I would like to put a local file into Azure Data Lake. Is that at all possible using Nifi / Dataflow?
In the same vein (I guess), does Nifi always have to run on the same Hadoop cluster that you want to run PutHDFS against, or can you connect to a remote Hadoop cluster from the machine running Nifi?
Thanks in advance,
You can use Nifi to move data from a local machine to Azure without any problem. You can find a simple workflow describing how to do it at the following URL
You don't need to run Nifi on a Hadoop cluster. Most of the time it runs as a standalone solution (clustered for HA purposes), and it will connect to any sources / destinations.
Hi @Erik Flateby,
Regarding sending data to Azure Data Lake, it is a work in progress (https://issues.apache.org/jira/browse/NIFI-1833). There is a PR waiting for some refactoring and reviews, so it will probably be available in one of the next versions.
Regarding PutHDFS, there is no need to be co-located on the Hadoop cluster. You just need to have the cluster's configuration files available on the Nifi machine (core-site and hdfs-site if I remember correctly).
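To make the configuration-file point concrete, here is a minimal sketch (not something Nifi itself requires) for checking which filesystem a copied core-site.xml points at before referencing it from PutHDFS's "Hadoop Configuration Resources" property. The sample XML, the host name namenode.example.com, and both helper function names are assumptions made up for illustration; only the `fs.defaultFS` property name and the `<configuration>/<property>` layout come from Hadoop.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a dict of name -> value."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def default_filesystem(xml_text):
    """Return the fs.defaultFS value, or None if the property is absent."""
    return read_hadoop_properties(xml_text).get("fs.defaultFS")


# Hypothetical core-site.xml content, as it might look when copied
# from the remote cluster to the machine running Nifi.
SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    # Shows the remote filesystem URI the copied file refers to.
    print(default_filesystem(SAMPLE_CORE_SITE))
```

If the value printed is the remote cluster's NameNode address, the file is at least pointing at the right cluster; whether the Nifi machine can actually reach it over the network is a separate check.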
Hope this helps.
Just a point to clarify since I think the intent of @Erik Flateby is probably to use NiFi to push data to an HDInsight instance (I'm assuming since he asked about Azure Data Lake previously).
For that use case Nifi WILL need to be a node in the same cluster (you will need to put Nifi on an edge node ideally, or another node). This is due to the implementation of the custom Hadoop Filesystem that stores to Azure Blob Storage (wasb) or Azure Data Lake - it references Python scripts that must exist on your Nifi node. These scripts get installed as part of any node in the HDInsight cluster.
Thank you for the quick replies!
The Jira issue referred to is for Azure Blob Storage, which will be good to have also.
Azure Data Lake, https://azure.microsoft.com/en-us/solutions/data-lake/, can be thought of as HDFS-as-a-service, among other things. The question is whether Nifi can be pointed to write to this directly, or whether it requires a regular cluster.
Also, do you know if Dataflow/Nifi is included with Hortonworks HDInsight clusters in Azure (I found no answer to this on Google, Hortonworks.com, the chat on Hortonworks.com, or the Azure portal)?