04-01-2024
06:35 AM
1 Kudo
Hi Vidya, thanks a lot. I will try to explain my problem in a bit more detail.

I'm importing data from an event-based system. I'm using ListS3 and FetchS3Object to download Parquet files from an AWS S3 bucket. In this bucket every entity has a separate directory, which is further split up by the date on which the entity was updated. I then use RouteOnAttribute to route the data into the corresponding table of a MySQL database.

The Parquet files contain updated records of the entities. Each record is not just the changes but the latest full state of the entity, so I can ignore older updates if I happen to process a newer update before them. The files in the bucket have random names. ListS3 seems to list them in alphabetical order, and I didn't see any way to order the files by their last-modified time in the S3 bucket.

Every record contains a unique id of the entity and a timestamp that indicates the last update of the entity. Some of the entities also include a version number that I could use in addition to, or instead of, the timestamp.

So far I'm using PutDatabaseRecord with statement type UPSERT to put the data into the MySQL database. My plan was to check the latest update timestamp that is stored in the MySQL database for the entity:

- If no entry is found, perform an insert.
- If an entry is found and its timestamp is older, or its version number is lower, than that of the record currently being processed, perform an update.
- If the entry in the database is already newer, skip the record.

Best regards,
Stefan
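To make the rule I have in mind concrete, here is a minimal sketch in plain Python. It is not NiFi configuration and not what PutDatabaseRecord does. The names `Record`, `decide_action`, `updated_at` and `version` are made up for illustration, and I'm assuming that a record with the same timestamp and no higher version can be skipped.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical shape of one entity record (field names are assumptions)."""
    id: str
    updated_at: datetime
    version: Optional[int] = None  # only some entities carry a version


def decide_action(existing: Optional[Record], incoming: Record) -> str:
    """Return 'insert', 'update' or 'skip' for an incoming record.

    existing is the row currently stored for incoming.id, or None if
    there is no such row yet.
    """
    if existing is None:
        return "insert"
    # Stored timestamp is older than the incoming one.
    if existing.updated_at < incoming.updated_at:
        return "update"
    # Stored version is lower (only comparable when both sides have one).
    if (
        existing.version is not None
        and incoming.version is not None
        and existing.version < incoming.version
    ):
        return "update"
    # Stored row is the same age or newer: ignore the incoming record.
    return "skip"
```

With this rule the order in which the files arrive should not matter, because an older file can never overwrite a newer state that is already stored.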