
How to read my HDFS data into a NiFi cluster


I created two different clusters:

1) a two-node NiFi cluster
2) a three-node Hadoop (HDP) cluster

I want to load my HDFS data into the NiFi cluster, so please explain what configurations need to change and how to connect the NiFi and Hadoop clusters.

1 ACCEPTED SOLUTION


Hi Kishore,

You can drop the processors shown below onto your NiFi canvas, as your requirements dictate, and configure them to pull the data from HDFS.

(screenshot: 5229-screen-shot-2016-06-25-at-112358-am.png)

To configure them, point each processor's Hadoop Configuration Resources property at the core-site.xml and hdfs-site.xml files saved on the local file system of the NiFi node(s); you can use Ambari to download these from the HDP cluster. Please find a sample configuration screenshot below:

(screenshot: 5230-screen-shot-2016-06-25-at-112633-am.png)
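As a rough example, a GetHDFS processor set up this way might look like the following (the paths and values are illustrative, not taken from the screenshots):

    Hadoop Configuration Resources : /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
    Directory                      : /user/nifi/input
    Recurse Subdirectories         : false
    Keep Source File               : true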

Thanks!


9 REPLIES


Hi Jobin,

1) I installed NiFi outside the Hadoop cluster. Should I put the core-site.xml and hdfs-site.xml files in the NiFi bin folder or some other folder, and which path should I enter in the processor configuration (e.g. /nifi/bin/core-site.xml or /etc/hadoop/conf/core-site.xml)?

2) How does it work when you have data to be streamed? Should it go to the first NiFi node? If I have 100 MB/s of data, how do we share the load between these two systems? Do we need a load balancer?

Please explain how to resolve these issues.


Split your ingest across different NiFi nodes depending on their location. From there, do some initial cleanup and send the data to Kafka, which can feed your remote Hadoop cluster for landing in HDFS.
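As a sketch of that pattern (the processor names here are typical choices, not the only option; older NiFi releases use PutKafka/GetKafka instead of PublishKafka/ConsumeKafka):

    edge NiFi node(s):  ListFile -> FetchFile -> (cleanup) -> PublishKafka
    cluster-side flow:  ConsumeKafka -> PutHDFS

This decouples ingest from the Hadoop cluster, so each side can be scaled independently.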


Hi Timothy,

My NiFi cluster and my Hadoop cluster are separate, so how do I connect the two clusters, and where should I save the XML files in NiFi?


Hi @kishore sanchina,

You can save the config files in any directory on the NiFi node and provide that path in the processor configuration.
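For example, one minimal way to stage those files (the host name hdp-master and the directory /opt/nifi/hadoop-conf are made up for illustration):

    # on the NiFi node: copy the Hadoop client configs over from an HDP node
    mkdir -p /opt/nifi/hadoop-conf
    scp hdp-master:/etc/hadoop/conf/core-site.xml /opt/nifi/hadoop-conf/
    scp hdp-master:/etc/hadoop/conf/hdfs-site.xml /opt/nifi/hadoop-conf/

Then set the Hadoop Configuration Resources property to /opt/nifi/hadoop-conf/core-site.xml,/opt/nifi/hadoop-conf/hdfs-site.xml.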


Thanks Jobin 🙂


How does it work when you have data to be streamed? Should it go to the first NiFi node? If I have 100 MB/s of data, how do we share the load between these two systems? Do we need a load balancer?


Hi Kishore,

If it's a cluster, you will be creating your flows in the NCM (NiFi Cluster Manager) UI, and the flow runs on all the nodes in the cluster. Since you have only two nodes in the cluster (possibly just one worker node and the NCM), you may not have much to load balance there. Still, you can simulate a load balancer with the NiFi site-to-site protocol.

You can get more info on site-to-site protocol load balancing here:

https://community.hortonworks.com/questions/509/site-to-site-protocol-load-balancing.html
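As a rough sketch of what that involves, site-to-site is enabled in nifi.properties on each node (the port number below is illustrative):

    # nifi.properties, on every node that should receive site-to-site traffic
    nifi.remote.input.socket.port=10000
    nifi.remote.input.secure=false

The sending side then uses a Remote Process Group pointing at the receiving NiFi's URL, and NiFi distributes the flow files across the available nodes.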

Thanks!