I have a 3-node NiFi cluster and a 3-node Hadoop cluster. If I want to interact with HDFS, do I have to copy the configuration files (hdfs-site.xml, core-site.xml) to all 3 NiFi nodes, or only to the NiFi master node?
Even though you create a flow in the NCM UI, it runs on each NiFi cluster node. So the configuration files should be placed on all the cluster nodes, under the same directory, for this to work (it is not mandatory to have them on the NCM).
- You may see similar errors if the files are not on all nodes, and the processor will be in an invalid state.
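As a rough illustration of the point above, here is a minimal Python sketch that could be run on each NiFi node to confirm that both files are present in the same directory, and to build the comma-separated value that the HDFS processors accept in their Hadoop Configuration Resources property. The helper names and the example directory `/etc/hadoop/conf` are assumptions made for illustration, not something NiFi provides; use whatever directory you copied the files into.

```python
from pathlib import Path

# The two files named in the question; every NiFi node needs its own copy.
REQUIRED_FILES = ("core-site.xml", "hdfs-site.xml")


def missing_config_files(conf_dir, required=REQUIRED_FILES):
    """Return the names of required files that are absent from conf_dir."""
    base = Path(conf_dir)
    return [name for name in required if not (base / name).is_file()]


def hadoop_config_resources(conf_dir, required=REQUIRED_FILES):
    """Build the comma-separated path list for the processor property.

    Raises FileNotFoundError if any required file is missing on this node.
    """
    missing = missing_config_files(conf_dir, required)
    if missing:
        raise FileNotFoundError(
            f"missing in {conf_dir}: {', '.join(missing)}"
        )
    base = Path(conf_dir)
    return ",".join(str(base / name) for name in required)
```

For example, calling `hadoop_config_resources("/etc/hadoop/conf")` (hypothetical path) on every node should return the same string everywhere; a `FileNotFoundError` on any node indicates that node is missing a copy.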
Hi @Jobin George, I have a query regarding your answer. I have a 3 node NiFi cluster setup and a 3 node HDP setup. Though I faced the same issue accessing the UI from NCM, I did not get any error when I accessed it from a browser in the Hadoop Namenode.
I referenced the config files from inside the Namenode and data was transferred from NiFi to HDFS directory successfully.
It may not be a good approach to access NiFi from the Namenode in production, but for experimentation and learning purposes, could you please try the above and let me know whether it utilizes all the NiFi nodes or runs on a single node (which defeats the purpose of the cluster)?
Also, if the above method does work, any suggestions to suit the production environment?
I do not have any issues, @Jobin George. I am able to transfer data from NiFi into HDFS from a browser on the Namenode by referencing the path of the configuration files inside the Hadoop directory on the Namenode.
I want to know whether, with this method, NiFi is able to run in full clustered mode (since the config files are not copied to the other NiFi nodes) or whether it internally runs as a single-node setup.
Thank you for the suggestion @Jobin George, you are right. I have 4 machines, one Namenode, one NCM, and two Datanodes/NiFi nodes. I guess that is why I did not face the error. Apologies for the comments.
Is this setup of 4 machines with shared components a good approach, or do you suggest having separate machines for the NiFi nodes? I will not be dealing with much overhead on the datanodes, except when there is a need for nightly model re-training and during model predictions (please also have a look at my query in your "NiFi + Spark : Feeding Data to Spark Streaming" thread).
Thanks for your time and patience :-)