I'm new to HDInsight and trying to explore it. I have a scenario that needs to ingest real-time sensor data and analyze it.
I would like to use NiFi for data ingestion, pass the data to Kafka, and then to Spark on HDInsight.
Can you please advise how I can integrate NiFi with HDInsight?
Thanks a lot,
@Sanaz Janbakhsh if you have a NiFi instance/cluster outside of Azure that's acquiring the data from the edge devices, you can stand up another NiFi instance/cluster within an Azure VNet and expose its HTTPS port. Your on-premises NiFi then connects to the Azure NiFi instance/cluster using NiFi's site-to-site protocol (which is tunneled over HTTPS in HDF 2.0). Once a flowfile has been transmitted via a Remote Process Group to the NiFi instance/cluster living in Azure, you can use a PutKafka processor to push the data to the Kafka topic from which your Spark job is reading.
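As a sketch of the Kafka leg: on the Azure NiFi instance, the PutKafka processor needs at minimum the broker list and the topic. The broker hostnames and topic name below are illustrative placeholders for your environment, not values from this thread:

```properties
# PutKafka processor settings (NiFi UI), HDF 2.x era -- illustrative values only
Known Brokers      = wn0-kafka.example.net:9092,wn1-kafka.example.net:9092
Topic Name         = sensor-readings
Delivery Guarantee = Best Effort
Message Delimiter  = \n
```

Here `sensor-readings` would be the same topic your Spark job subscribes to; raising Delivery Guarantee (at the cost of throughput) gives you stronger durability, and the delimiter splits a multi-line flowfile into one Kafka message per line.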
Alternatively, you could use NiFi's PutAzureEventHub processor to push a flowfile from NiFi to the Azure Event Hub, where further orchestration can occur (using Azure Data Factory, NiFi, etc.).
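For the Event Hubs route, the PutAzureEventHub processor is configured with the namespace, hub name, and a shared-access policy. The names below are illustrative placeholders, not values from this thread:

```properties
# PutAzureEventHub processor settings (NiFi UI) -- illustrative values only
Event Hub Name                   = sensor-events
Event Hub Namespace              = my-eh-namespace
Shared Access Policy Name        = send-policy
Shared Access Policy Primary Key = <key from the Azure portal>
```

The shared access policy needs Send claims on the Event Hub; keep the primary key out of version control.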
It wasn't clear to me whether you were thinking about using Kafka for HDI (now in technical preview) or whether Kafka would be part of your infrastructure outside of Azure. I'd be happy to add further color and recommendations given more details about your use case and requirements.
I do have NiFi on-premises, but now I would like to set it up inside an Azure cluster. Is there any document that shows me the steps? Should I copy and extract HDF on the new instance? Should I use the same instance, or launch one separate from HDInsight?
@Sanaz Janbakhsh you would install NiFi on Azure VMs, just as you installed it on-premises. This will be a separate NiFi instance/cluster, which you can communicate with over the site-to-site protocol using a Remote Process Group. Please see Vasilis's article https://community.hortonworks.com/content/kbentry/51517/hdf-on-hdi-nifi.html for more information.
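On the Azure VM, after extracting HDF/NiFi, the site-to-site receiver is enabled in conf/nifi.properties. A minimal sketch for HDF 2.0 / NiFi 1.x, assuming HTTP(S)-tunneled site-to-site as described above (the hostname is a placeholder):

```properties
# conf/nifi.properties on the Azure NiFi instance -- site-to-site over HTTP(S), NiFi 1.x
nifi.remote.input.host=nifi-azure.example.net
nifi.remote.input.secure=true
nifi.remote.input.http.enabled=true
nifi.remote.input.socket.port=
```

`nifi.remote.input.host` is the hostname the on-premises Remote Process Group will target, so it must resolve from your on-premises network; with `secure=true` you also need a keystore/truststore configured, and the raw socket port can stay blank when only the HTTP transport is used.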