Created on 07-18-2021 06:29 AM - edited 07-18-2021 06:49 AM
Hi.
I have three .log files.
The first one is general and has the form:
2223411 Subject Comment Address
2243561 Subject Comment Address
The other two have the form:
2223411 Some string
2243561 Some string
I need to take the remaining fields from the first file and append them to each line of the other two files, matching on the first field, which is the id. The result should look like this:
2223411 Some string Subject Comment Address
2243561 Some string Subject Comment Address
In other words, everything except the id from the first file should be appended to each line of the other files that has the same id.
Please tell me whether I need to use scripting processors and write scripts for this, or whether I can do it with the existing processors.
Created 07-22-2021 08:02 AM
I believe that what you are looking for is called a 'streaming join'.
I won't say it is impossible, but this is not something NiFi is made for, and there is no good way to do it.
Perhaps look into other solutions available in the Cloudera offering, such as Flink or Spark Streaming.
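That said, if the general file is small enough to hold in memory, the join itself reduces to a lookup by id. Here is a minimal sketch in plain Python, outside of NiFi, just to show the logic. It assumes the fields are separated by whitespace and that the id is always the first field; the function name join_by_id is made up for this example and is not part of any NiFi API.

```python
def join_by_id(general_lines, other_lines):
    """Append the general file's fields to each line of another file, matched by id."""
    # Build a lookup table: id -> everything after the id in the general file.
    lookup = {}
    for line in general_lines:
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        lookup[parts[0]] = parts[1].strip() if len(parts) > 1 else ""

    joined = []
    for line in other_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record_id = line.split(maxsplit=1)[0]
        extra = lookup.get(record_id)
        # Assumption: a line whose id is missing from the general file
        # is passed through unchanged rather than dropped.
        joined.append(f"{line} {extra}" if extra else line)
    return joined


# Sample data taken from the question.
general = ["2223411 Subject Comment Address", "2243561 Subject Comment Address"]
other = ["2223411 Some string", "2243561 Some string"]

for row in join_by_id(general, other):
    print(row)
```

The same lookup logic could probably be carried over into a scripted processor, but it only works while the whole general file fits in memory. For an unbounded stream, the tools mentioned above are a better fit.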
Created 07-22-2021 08:39 AM
Thank you. In the end I did it using a Groovy script.
But I still don't understand why, when I make a regular request with the InvokeHTTP processor, I see a lot of duplicate files. We had to use a processor to remove the duplicates. So I still don't quite understand what NiFi is for.