What are the various ways to integrate Apache Pig, NiFi and Spark?
I know I can connect some of them through Kafka or via files.
There are multiple ways to integrate these three services. As a starting point, NiFi will probably be your ingestion flow. From that flow you could:
- publish your data to Kafka and have Spark consume it from there
- push your NiFi data to Spark, as described here: https://blogs.apache.org/nifi/entry/stream_processing_nifi_and_spark
- use an ExecuteScript processor to start a Pig job
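For the Kafka route, here is a minimal PySpark sketch of the consuming side. It assumes NiFi writes to the topic with a PublishKafka processor and that the spark-sql-kafka-0-10 connector is on Spark's classpath; the broker address and topic name are hypothetical placeholders.

```python
def kafka_source_options(bootstrap_servers, topic, starting_offsets="latest"):
    """Options for Spark's Structured Streaming Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,  # e.g. "broker1:9092" (hypothetical)
        "subscribe": topic,                            # topic NiFi publishes to (hypothetical)
        "startingOffsets": starting_offsets,           # "latest" or "earliest"
    }


def read_nifi_topic(spark, bootstrap_servers, topic):
    """Return a streaming DataFrame of the records NiFi published to `topic`.

    `spark` is an existing pyspark.sql.SparkSession.
    """
    raw = (
        spark.readStream
        .format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
    # Kafka keys and values arrive as binary columns; cast them to strings.
    return raw.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
```

To try it, start a console sink, e.g. `read_nifi_topic(spark, "broker1:9092", "nifi-events").writeStream.format("console").start()`. Because Kafka sits between the two, NiFi and Spark can be restarted or scaled independently.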
In summary, you can push the data and forget about it, push it to an intermediate service (such as Kafka) and pick it up in the next flow, or, as a corner case, execute the job from within a NiFi processor.
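For the execute-in-processor case, the script mostly needs to build and run a `pig` command line. NiFi's ExecuteScript runs scripts in embedded engines such as Groovy or Jython (Python 2), so treat this Python 3 code as a standalone sketch of that logic rather than a drop-in ExecuteScript body. The script path and parameter names are hypothetical.

```python
import subprocess


def pig_command(script_path, params=None, exec_type="mapreduce"):
    """Build argv for: pig -x <exec_type> [-param key=value ...] -f <script_path>."""
    cmd = ["pig", "-x", exec_type]  # exec_type: "local", "mapreduce" or "tez"
    # Sort the parameters so the command line is deterministic.
    for key, value in sorted((params or {}).items()):
        cmd += ["-param", f"{key}={value}"]
    cmd += ["-f", script_path]
    return cmd


def run_pig(script_path, params=None, exec_type="mapreduce"):
    # Blocks until Pig exits; raises CalledProcessError on a non-zero exit code.
    return subprocess.run(pig_command(script_path, params, exec_type), check=True)
```

For example, `run_pig("/jobs/clean.pig", {"input": "/data/in"})` runs the hypothetical script with `$input` bound to `/data/in`. Since the processor waits for Pig to finish, this only suits short or infrequent jobs, which is why it is a corner case.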
Hope this gives you some insight.