Comparing Data - which way to go - your opinion?

Hello everyone,

I need your help. At the moment I have 3 folders: ip, url and domain. Each folder contains several hundred CSV files in the following format (the timestamps are actually numeric; I changed them just for this example):

CSV 1 - badip
ip, timestamp, vendor
192.158.1.38,today,badip
127.0.0.1,today,badip
178.12.2.27,yesterday,badip

CSV 2 - cyberteam
ip, timestamp, vendor
192.158.1.38,yesterday,cyberteam
168.11.2.26,yesterday,cyberteam
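
A minimal sketch of reading one folder into memory, assuming Python's standard `csv` module, numeric (e.g. Unix epoch) timestamps, and that the first column is the value to match on (ip, url or domain depending on the folder). The names `Sighting`, `parse_rows` and `load_folder` are hypothetical, and the numbers 200 and 100 are made up to stand in for "today" and "yesterday".

```python
import csv
import io
from pathlib import Path
from typing import Iterable, List, NamedTuple


class Sighting(NamedTuple):
    """One CSV row: the indicator (ip, url or domain), a numeric timestamp, the vendor."""
    indicator: str
    timestamp: int
    vendor: str


def parse_rows(lines: Iterable[str]) -> List[Sighting]:
    """Parse one vendor file whose header is 'ip, timestamp, vendor'."""
    reader = csv.reader(lines, skipinitialspace=True)  # tolerate spaces after commas
    next(reader, None)  # skip the header row
    rows = []
    for record in reader:
        if len(record) < 3:
            continue  # skip blank or malformed lines
        indicator, ts, vendor = (field.strip() for field in record[:3])
        rows.append(Sighting(indicator, int(ts), vendor))
    return rows


def load_folder(folder: str) -> List[Sighting]:
    """Read every *.csv file in one folder (e.g. the ip folder) into one list."""
    rows: List[Sighting] = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as f:
            rows.extend(parse_rows(f))
    return rows


# Usage with the badip sample, assuming today = 200 and yesterday = 100.
badip = io.StringIO(
    "ip, timestamp, vendor\n"
    "192.158.1.38,200,badip\n"
    "127.0.0.1,200,badip\n"
    "178.12.2.27,100,badip\n"
)
print(parse_rows(badip))
```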

What I need is to compare all the CSV files within one folder. For example, I need to read CSV 1, take the first row and compare it against every other file. If the row matches, I need to write that row and all matching rows to a separate file and compare their timestamps: the earliest gets a 1, the second a 2, and so on. At the end I need to check how many rows with a 1 each vendor has.
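
The matching and numbering steps described above can be sketched in plain Python. This assumes a "match" means the same value in the first column reported by at least two different vendors, that timestamps are numeric, and that equal timestamps are ordered by vendor name (an arbitrary tie-break; the question does not say). The function names and the sample timestamps are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int, str]  # (indicator, numeric timestamp, vendor)
Ranked = Dict[str, List[Tuple[int, Row]]]


def rank_matches(rows: List[Row]) -> Ranked:
    """Group rows by indicator and number each group by timestamp (earliest = 1).

    Only indicators reported by at least two different vendors count as a match.
    Equal timestamps are ordered by vendor name, an arbitrary tie-break.
    """
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    ranked: Ranked = {}
    for indicator, group in groups.items():
        if len({vendor for _, _, vendor in group}) < 2:
            continue  # not found in any other vendor's file
        ordered = sorted(group, key=lambda r: (r[1], r[2]))
        ranked[indicator] = list(enumerate(ordered, start=1))
    return ranked


def write_matches(ranked: Ranked, out) -> None:
    """Write every matched row plus its rank to a CSV file-like object."""
    writer = csv.writer(out)
    writer.writerow(["ip", "timestamp", "vendor", "rank"])
    for indicator in sorted(ranked):
        for rank, (ip, ts, vendor) in ranked[indicator]:
            writer.writerow([ip, ts, vendor, rank])


def first_reporter_counts(ranked: Ranked) -> Counter:
    """Count, per vendor, how many matched indicators it reported first (rank 1)."""
    return Counter(group[0][1][2] for group in ranked.values())


# The two sample files, assuming today = 200 and yesterday = 100.
SAMPLE: List[Row] = [
    ("192.158.1.38", 200, "badip"),
    ("127.0.0.1", 200, "badip"),
    ("178.12.2.27", 100, "badip"),
    ("192.158.1.38", 100, "cyberteam"),
    ("168.11.2.26", 100, "cyberteam"),
]

ranked = rank_matches(SAMPLE)
buffer = io.StringIO()
write_matches(ranked, buffer)
print(buffer.getvalue())
print(first_reporter_counts(ranked))
```

This keeps everything in memory, which is only practical while the rows of one folder fit in RAM; the same group, sort and count steps are what a SQL window function or a Spark job would perform at larger scale.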

Someone on this forum kindly suggested using Spark or Flink instead of NiFi, and I wanted to check whether that is the best solution for this process (and which of the two is the better match). Is there another tool you would recommend? Ideally it would be easy to set up.

Should I load everything into a SQL database and do the comparison in SQL?
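
If you go the SQL route, the numbering maps naturally onto a window function. A minimal sketch using Python's built-in `sqlite3` module and an in-memory database, assuming SQLite 3.25 or newer (needed for `ROW_NUMBER() OVER`). The table name `sightings`, the column `ts`, the function `first_reporters` and the timestamps are all hypothetical; a match is again taken to mean an IP reported by at least two different vendors.

```python
import sqlite3
from typing import List, Tuple

# The two sample files, assuming today = 200 and yesterday = 100.
ROWS: List[Tuple[str, int, str]] = [
    ("192.158.1.38", 200, "badip"),
    ("127.0.0.1", 200, "badip"),
    ("178.12.2.27", 100, "badip"),
    ("192.158.1.38", 100, "cyberteam"),
    ("168.11.2.26", 100, "cyberteam"),
]

QUERY = """
WITH matched AS (
    -- IPs reported by at least two different vendors
    SELECT ip FROM sightings GROUP BY ip HAVING COUNT(DISTINCT vendor) >= 2
),
ranked AS (
    -- number each matched IP's rows by timestamp: earliest = 1
    SELECT s.ip, s.ts, s.vendor,
           ROW_NUMBER() OVER (PARTITION BY s.ip ORDER BY s.ts, s.vendor) AS rnk
    FROM sightings AS s
    JOIN matched AS m ON s.ip = m.ip
)
SELECT vendor, COUNT(*) AS first_count
FROM ranked
WHERE rnk = 1
GROUP BY vendor
ORDER BY first_count DESC, vendor
"""


def first_reporters(rows: List[Tuple[str, int, str]]) -> List[Tuple[str, int]]:
    """Load rows into an in-memory SQLite table and count rank-1 rows per vendor."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sightings (ip TEXT, ts INTEGER, vendor TEXT)")
        conn.executemany("INSERT INTO sightings VALUES (?, ?, ?)", rows)
        conn.execute("CREATE INDEX idx_sightings_ip ON sightings (ip)")
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(first_reporters(ROWS))
```

To write the matching rows themselves to a separate file, you could select from `ranked` instead of aggregating it.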

I am really worried about choosing the wrong tool and regretting it afterwards, so I am grateful for any tips you have.