I have two streams that NiFi publishes to two Kafka topics, and I am trying to time-synchronize them so I can do some simple analytics on them.
Both streams are time series data, but their messages are generated at different times. For example:
Stream 1 will look like this:
Time Value_1
0:00:01 10
0:00:05 20
0:00:10 30 .....
Stream 2 will look like this:
Time Value_2
0:00:01 100
0:00:02 200
0:00:03 300
0:00:04 400
0:00:05 500
0:00:06 600
0:00:07 700
0:00:08 800
0:00:09 900
0:00:10 1000
and so on
When I join these two streams I want something like this:
Time Value_1 Value_2
0:00:02 10 200
0:00:03 10 300
0:00:04 10 400
0:00:05 20 500
0:00:06 20 600
0:00:07 20 700
0:00:08 20 800
0:00:09 20 900
0:00:10 30 1000
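In other words, each Stream 2 row should carry the most recent Stream 1 value seen at or before its timestamp (sometimes called an "as-of" join). A minimal plain-Python sketch of that logic, only to illustrate the expected result: it is not SAM or Kafka code, and the helper names `to_seconds` and `asof_join` and the in-memory lists are made up for this illustration.

```python
from bisect import bisect_right

def to_seconds(ts):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    h, m, s = (int(part) for part in ts.split(":"))
    return h * 3600 + m * 60 + s

def asof_join(left, right):
    """For each right-hand record, attach the latest left-hand value
    whose timestamp is at or before the right-hand timestamp.
    Assumes `left` is already sorted by time."""
    left_times = [to_seconds(ts) for ts, _ in left]
    joined = []
    for ts, value_2 in right:
        # Index of the last left-hand record with time <= ts (or -1 if none).
        idx = bisect_right(left_times, to_seconds(ts)) - 1
        if idx >= 0:
            joined.append((ts, left[idx][1], value_2))
    return joined

# Sample data taken from the two streams above.
stream_1 = [("0:00:01", 10), ("0:00:05", 20), ("0:00:10", 30)]
stream_2 = [(f"0:00:{s:02d}", s * 100) for s in range(1, 11)]

for row in asof_join(stream_1, stream_2):
    print(*row)
```

Unlike the table above, this sketch also emits a row for 0:00:01 (10, 100), since both streams have a message at that time.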
I tried an inner join in SAM with window_interval 2 and sliding_interval 0, which gets close, but this is what I get instead:
Time Value_1 Value_2
0:00:01 10 200
0:00:05 20 400
0:00:10 30 1000
As you can see, I am missing the rows in between, which I need for my analysis. Swapping the order of the streams in the join makes no difference.