I need your help because I do not know how to proceed.
Right now I have a PostgreSQL database with the following table:
domain, source, timestamp
domainA, yourdomains, 128989372
domainB, yourdomains, 128923892
domainA, cyberclub, 13934829
domainD, cyberclub, 184994420
domainA, securityTeam, 118382938
My goal is to run some comparisons and alter the table. The most important one: for every value that appears more than once in the "domain" column (like domainA in the first, third and last rows), compare the timestamps of those rows. The row with the lowest timestamp gets the number 1 in a new column, the next lowest gets 2, and so on. At the end I want to see how many 1s each source has.
Which tool should I use? Apache Flink and Spark were recommended to me. Or should I use another SQL tool, or plain SQL with scripts? I am grateful for any tips!
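For reference, the plain-SQL route could look roughly like the sketch below. It uses the standard window function ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...), which PostgreSQL supports, and runs here against an in-memory SQLite database (version 3.25 or newer is assumed) so it works without a server. The table name "sightings", the column name "ts" and the alias "seen_rank" are made up for illustration. The sample rows are the ones from the table above, assuming the source in the second row is meant to be "yourdomains". A domain that appears only once also gets rank 1 here, which is an assumption about the intended behaviour.

```python
import sqlite3

# Sample rows from the table above. The table name "sightings" and the
# column name "ts" are assumptions made for this sketch.
ROWS = [
    ("domainA", "yourdomains", 128989372),
    ("domainB", "yourdomains", 128923892),
    ("domainA", "cyberclub", 13934829),
    ("domainD", "cyberclub", 184994420),
    ("domainA", "securityTeam", 118382938),
]

# Number the rows of each domain by ascending timestamp: 1 = earliest.
RANK_SQL = """
SELECT domain, source, ts,
       ROW_NUMBER() OVER (PARTITION BY domain ORDER BY ts) AS seen_rank
FROM sightings
"""

# Count, per source, how many domains it reported first (rank 1).
COUNT_SQL = f"""
SELECT source, COUNT(*) AS first_sightings
FROM ({RANK_SQL}) AS ranked
WHERE seen_rank = 1
GROUP BY source
ORDER BY first_sightings DESC, source
"""


def _run(rows, sql):
    """Load the rows into a throwaway in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sightings (domain TEXT, source TEXT, ts INTEGER)")
        conn.executemany("INSERT INTO sightings VALUES (?, ?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def ranked_rows(rows):
    """Return (domain, source, ts, seen_rank) sorted by domain, then rank."""
    return _run(rows, RANK_SQL + " ORDER BY domain, seen_rank")


def first_sighting_counts(rows):
    """Return (source, number of rank-1 rows), highest count first."""
    return _run(rows, COUNT_SQL)


if __name__ == "__main__":
    for row in ranked_rows(ROWS):
        print(row)
    for source, count in first_sighting_counts(ROWS):
        print(source, count)
```

With the five sample rows, cyberclub ends up with two 1s (domainA and domainD) and yourdomains with one (domainB). Sources that never have rank 1, such as securityTeam here, do not appear in the count at all. For a table of this shape a single query like this may already be enough, and Flink or Spark would probably only become interesting at much larger data volumes or for streaming input.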