Nifi/Dataflow: PutHDFS to Azure Data Lake?

Hello,

I am totally new to NiFi, so bear that in mind :)

I would like to put a local file into Azure Data Lake. Is that at all possible using NiFi / Dataflow?

In the same vein (I guess), does NiFi always have to run on the same Hadoop cluster that you want to run PutHDFS against, or can you connect to a remote Hadoop cluster from the machine running NiFi?

Thanks in advance,

Erik

Re: Nifi/Dataflow: PutHDFS to Azure Data Lake?

Erik,

You can use NiFi to move data from a local machine to Azure without any problem. A simple workflow showing how to do it is described at the following URL:

https://community.hortonworks.com/articles/7999/apache-nifi-part-1-introduction.html

You don't need to run NiFi on a Hadoop cluster. Most of the time it runs as a standalone solution (clustered for HA purposes) and connects to whatever sources and destinations you need.

Re: Nifi/Dataflow: PutHDFS to Azure Data Lake?

Hi @Erik Flateby,

Regarding sending data to Azure Data Lake, it is a work in progress (https://issues.apache.org/jira/browse/NIFI-1833). There is a PR waiting for some refactoring and reviews; it will probably be available in an upcoming version.

Regarding PutHDFS, NiFi does not need to be co-located with the Hadoop cluster; you just need the cluster's configuration files to be available on the NiFi machine (core-site and hdfs-site, if I remember correctly).
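As a rough illustration of the point about configuration files, here is a minimal Python sketch (not part of NiFi) that sanity-checks copies of core-site.xml and hdfs-site.xml and builds the comma-separated list that PutHDFS expects in its "Hadoop Configuration Resources" property. The helper names and any file paths you pass in are hypothetical; the only assumption about the files is the standard Hadoop layout of <property><name>/<value> entries, with fs.defaultFS naming the remote cluster.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(path):
    """Return {name: value} for every <property> in a Hadoop *-site.xml file."""
    root = ET.parse(path).getroot()
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def puthdfs_config_resources(paths):
    """Validate the files and return (resources_value, default_fs).

    resources_value is the comma-separated path list to paste into the
    PutHDFS "Hadoop Configuration Resources" property.
    """
    merged = {}
    for path in paths:
        # Later files override earlier ones, as a simple merge rule (assumption).
        merged.update(read_hadoop_properties(path))
    if "fs.defaultFS" not in merged:
        raise ValueError("fs.defaultFS is not set in any of the given files")
    return ",".join(str(p) for p in paths), merged["fs.defaultFS"]
```

Calling puthdfs_config_resources with the two local paths returns the string to paste into the processor together with the filesystem URI the files point at, which is a quick way to confirm that the copies really describe the remote cluster.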

Hope this helps.

Re: Nifi/Dataflow: PutHDFS to Azure Data Lake?

Just a point to clarify, since I think the intent of @Erik Flateby is probably to use NiFi to push data to an HDInsight instance (I'm assuming so, since he asked about Azure Data Lake previously).

For that use case NiFi WILL need to be a node in the same cluster (ideally an edge node, otherwise another node). This is due to the implementation of the custom Hadoop FileSystem that stores to Azure Blob Storage (wasb) or Azure Data Lake: it references Python scripts that must exist on your NiFi node. These scripts are installed on every node of the HDInsight cluster.

Re: Nifi/Dataflow: PutHDFS to Azure Data Lake?

Hi @Pierre Villard and @Olivier Renault,

Thank you for the quick replies!

The Jira issue referred to is for Azure Blob Storage, which will also be good to have.

Azure Data Lake (https://azure.microsoft.com/en-us/solutions/data-lake/) can be thought of as HDFS-as-a-service, among other things. The question is whether NiFi can be pointed at it and write to it directly, or whether it requires a regular cluster.

Also, do you know whether Dataflow/NiFi is included with Hortonworks HDInsight clusters in Azure? I found no answer to this on Google, Hortonworks.com, the chat on Hortonworks.com, or the Azure portal.

BR,

Erik

Re: Nifi/Dataflow: PutHDFS to Azure Data Lake?

@Erik Flateby, please see this GitHub project regarding NiFi and ADL integration.

Please see this HCC post regarding using HDF on HDInsight (it's not included out of the box).
