We have a data flow in which standardized transaction logs are generated as XML files in a particular directory on a VM. We want to publish these logs to a Kafka topic. This is the NiFi/MiNiFi flow we are using:

ListFile ---> FetchFile (completion strategy - Move File) ---> PublishKafka

Each transaction translates to a single XML file. At peak hours, over a million files are generated per hour, i.e. approximately 300 files per second. Our goal is to achieve the above flow using MiNiFi.

Among ListFile's listing strategies are Tracking Timestamps and Tracking Entities. Initially the data flow was created with the Tracking Timestamps option. With it we observed that several files were never picked up by NiFi / MiNiFi. The files that were picked up were eventually moved to a different directory, but 2% - 5% of the files were not picked up and thus not published to the Kafka topic. The behavior was observed both in NiFi and with the corresponding YAML file in MiNiFi.

We then tried the Tracking Entities option in ListFile's Listing Strategy, set a DistributedMapCacheClientService in the Entity Tracking State Cache property, and configured a DistributedMapCacheServer with the default port. This configuration worked in the NiFi flow: we tested it by generating a million files in the span of one hour, and all file contents were published to the Kafka topic.

We then attempted the same by converting the NiFi flow to MiNiFi YAML, and there it failed with errors saying that DistributedMapCacheClientService is unable to connect to localhost:4557 (the default hostname and port for DistributedMapCacheServer). We tried to create the controller service using the REST API, but that seems to work in NiFi and not in MiNiFi.

So my questions are:

1) Is there a way to configure and start the DistributedMapCacheServer controller service in a MiNiFi instance?
2) Is there a way to host DistributedMapCacheServer separately (by running some command on its NAR file)?
3) If there is a different approach to transferring file contents to Kafka without losing any transaction files, kindly suggest it.
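
For context, one possible explanation of the missed files is that a timestamp-based listing skips any file whose modification time is not newer than the latest timestamp already listed, which becomes likely at around 300 files per second. This is an assumption, not something confirmed for our setup. The Python sketch below is a simplified model of the two strategies, not NiFi's actual implementation; the class names, file names, and millisecond timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    name: str
    mtime_ms: int  # hypothetical modification time in milliseconds


class TimestampTracker:
    """Simplified model: remembers only the newest timestamp listed so far."""

    def __init__(self):
        self.latest_ms = -1

    def list_new(self, directory):
        # Only files strictly newer than the tracked timestamp are listed.
        new = [f for f in directory if f.mtime_ms > self.latest_ms]
        if new:
            self.latest_ms = max(f.mtime_ms for f in new)
        return sorted(f.name for f in new)


class EntityTracker:
    """Simplified model: remembers every listed file by name and timestamp."""

    def __init__(self):
        self.seen = {}

    def list_new(self, directory):
        # A file is new if it was never seen or its timestamp changed.
        new = [f for f in directory if self.seen.get(f.name) != f.mtime_ms]
        for f in new:
            self.seen[f.name] = f.mtime_ms
        return sorted(f.name for f in new)


def simulate():
    directory = [FileEntry("tx-1.xml", 1000), FileEntry("tx-2.xml", 1001)]
    by_timestamp, by_entity = TimestampTracker(), EntityTracker()
    first = (by_timestamp.list_new(directory), by_entity.list_new(directory))

    # tx-3.xml becomes visible after the first listing but carries the same
    # timestamp as tx-2.xml; tx-4.xml is strictly newer.
    directory.append(FileEntry("tx-3.xml", 1001))
    directory.append(FileEntry("tx-4.xml", 1002))
    second = (by_timestamp.list_new(directory), by_entity.list_new(directory))
    return first, second


if __name__ == "__main__":
    first, second = simulate()
    print("first listing :", first)
    print("second listing:", second)
```

In this model the second timestamp-based listing returns only tx-4.xml, while the entity-based listing returns both tx-3.xml and tx-4.xml, which matches the pattern we saw where a small share of files was never picked up.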