Replacing GetFile with GetKafka.

In continuation to a question I asked last week…

Right now my (junior) team and I have a NiFi processor that is meant to filter various file types based on their endings.
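To make the filtering goal concrete, here is a minimal sketch of routing by file ending, written in plain Python rather than as NiFi configuration. The `ROUTES` table, the route names, and the `route_for` helper are all hypothetical and only illustrate the decision being made; inside NiFi the same decision is usually expressed on the `filename` attribute (for example with RouteOnAttribute) rather than in custom code.

```python
from pathlib import PurePath

# Hypothetical mapping from file ending to a downstream route name.
ROUTES = {
    ".csv": "tabular",
    ".json": "structured",
    ".xml": "structured",
    ".txt": "text",
}


def route_for(filename: str, default: str = "unmatched") -> str:
    """Return the route for a filename based on its case-insensitive ending."""
    suffix = PurePath(filename).suffix.lower()
    return ROUTES.get(suffix, default)


if __name__ == "__main__":
    for name in ["report.CSV", "events.json", "notes.txt", "image.png"]:
        print(name, "->", route_for(name))
```

Only the last ending is considered, so a name such as `archive.tar.gz` is matched on `.gz`, and names without an ending fall through to the default route.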

At Present...

  • We understand how to get/set information from a GetFile/PutFile Processor (as placeholders for a REST Service).
  • We understand (or do we?) basic producer and consumer topic configuration.
  • I.e., we can push random strings into our Kafka producer (named 'test') and watch them pop out of our consumer (also named 'test') through NiFi.

  • Core to the current issue: we do not know how to configure Kafka to ingest a variety of file types into our architecture.
  • Some of us believe we may need to go into NiFi's GetKafka (API | Git) or ConsumeKafka (API | Git) backend and modify it (with code akin to GetFile) into a custom Kafka processor.
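One point that may help frame the issue: a Kafka topic carries byte messages, not files, so "ingesting a file" generally means that something on the producer side reads the file and publishes its content. The sketch below illustrates that idea under stated assumptions. `InMemoryProducer` is a hypothetical stand-in that records messages instead of talking to a broker, and `publish_file` is a hypothetical helper; a real producer client would take its place, and one message per file is only reasonable for files that fit within the broker's message size limit.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


class InMemoryProducer:
    """Hypothetical stand-in for a Kafka producer; records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, bytes, bytes]] = []

    def send(self, topic: str, key: bytes, value: bytes) -> None:
        # A real client would serialize and transmit here; this only stores the message.
        self.sent.append((topic, key, value))


def publish_file(producer, topic: str, path: Path) -> None:
    """Publish one file as one message: key = file name, value = raw file bytes."""
    producer.send(topic, key=path.name.encode("utf-8"), value=path.read_bytes())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.csv"
        sample.write_bytes(b"id,name\n1,alpha\n")

        producer = InMemoryProducer()
        publish_file(producer, "test", sample)

        topic, key, value = producer.sent[0]
        print(topic, key.decode("utf-8"), len(value), "bytes")
```

Carrying the file name as the message key is one assumed design choice; it lets a consumer recover the original ending and decide how to handle the payload without changing how messages are consumed.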

    To the Questions:

    • Would this truly be necessary and recommended? Or are we missing something?
    • Knowing that, in the long run, we'll be pulling from a REST service into Kafka for a reason…my instincts question whether modding GetKafka to mirror GetFile is what we're really after.
    • I may be wrong. Said instincts are still junior.

  • If customization via Java is recommended, where should we start, given the APIs linked above?
    • What about our Kafka Configuration?
    • I'm not an SME on Kafka, but I'd also suspect that configuring the producer topic(s) may be involved in defining file ingestion locations/listeners. Then we'd somehow use the existing GetKafka processor to listen in to something…somewhere?
    • I may be wrong.

    I seek solution consultation. Thank you. Below is the NiFi Scenario.


    Re: Replacing GetFile with GetKafka.

    Taking NiFi out of the picture for a second, can you clarify what you want to achieve?

    You want to receive data over a REST service, make some decisions/transformations, and then send that data to a Kafka topic?
