I have a huge dataset (1 TB) with 9 billion distinct IDs, and the script I developed never finishes.
A sample of my dataset for one ID:
ID  day    location1  location2
a   05/01  Rome       Paris
a   08/01  Zurich     Amsterdam
a   09/01  None       Rome
What I want:
ID  day    location1  location2
a   05/01  Rome       Paris
a   06/01  Paris      Paris
a   07/01  Paris      Paris
a   08/01  Zurich     Amsterdam
a   09/01  Amsterdam  Rome
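As shown in the example, I need to add every missing day for each user and assume that the user did not move on days for which I have no records: a missing day takes the previous record's location2 for both columns, and a missing location1 is filled with the previous record's location2.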
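The filling rule for a single ID can be sketched as a generator. This is a minimal sketch, not the author's script: it assumes Python (the post doesn't name a language), that the records of one ID are already sorted by day, that `day` values are DD/MM dates stored as `datetime.date` (the year 2023 below is an arbitrary placeholder), and that a missing location is `None`. The name `fill_missing_days` is hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, Iterator, Optional, Tuple

# (ID, day, location1, location2); a missing location is None
Row = Tuple[str, date, Optional[str], Optional[str]]


def fill_missing_days(rows: Iterable[Row]) -> Iterator[Row]:
    """Expand one ID's records (assumed sorted by day) into one row per day.

    Gap days repeat the previous record's location2 in both columns,
    and a missing location1 is taken from the previous location2.
    """
    prev: Optional[Row] = None
    for id_, day, loc1, loc2 in rows:
        if prev is not None:
            last_loc = prev[3]
            # Emit a "did not move" row for every day strictly between two records.
            d = prev[1] + timedelta(days=1)
            while d < day:
                yield (id_, d, last_loc, last_loc)
                d += timedelta(days=1)
            if loc1 is None:
                loc1 = last_loc
        prev = (id_, day, loc1, loc2)
        yield prev


# The sample from the question (year 2023 is a placeholder assumption).
sample = [
    ("a", date(2023, 1, 5), "Rome", "Paris"),
    ("a", date(2023, 1, 8), "Zurich", "Amsterdam"),
    ("a", date(2023, 1, 9), None, "Rome"),
]
for id_, day, loc1, loc2 in fill_missing_days(sample):
    print(id_, day.strftime("%d/%m"), loc1, loc2)
# a 05/01 Rome Paris
# a 06/01 Paris Paris
# a 07/01 Paris Paris
# a 08/01 Zurich Amsterdam
# a 09/01 Amsterdam Rome
```

Because it is a generator, it only holds the previous record in memory, so it can be applied to each ID's records without building the full expanded table first.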
Does anyone have an idea how to approach this problem efficiently?
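One common way to keep this tractable at this scale is to avoid per-ID lookups or joins against a calendar and instead make a single sequential pass over data sorted by ID and then chronologically by day. Note that the raw DD/MM strings do not sort chronologically across months, so the sort must use real dates. The sketch below is a hedged illustration of that idea, not a tuned solution: it assumes headerless CSV lines `ID,day,location1,location2`, a missing location written as the literal `None`, and a hypothetical `ASSUMED_YEAR` because the sample only shows DD/MM. `stream_fill` and `parse_day` are illustrative names.

```python
import csv
import io
import sys
from datetime import date, datetime, timedelta
from typing import Iterable, Iterator, List, Optional

ASSUMED_YEAR = 2024  # hypothetical: the data only shows DD/MM (a leap year so 29/02 parses)


def parse_day(s: str) -> date:
    """Parse a DD/MM string, attaching the assumed year."""
    return datetime.strptime(f"{ASSUMED_YEAR}/{s}", "%Y/%d/%m").date()


def stream_fill(lines: Iterable[str]) -> Iterator[List[str]]:
    """One pass over CSV rows already sorted by ID, then chronologically by day.

    Only the previous row is kept, so memory use does not grow with the
    number of IDs or days; gaps are filled only within the same ID.
    """
    prev_id: Optional[str] = None
    prev_day: Optional[date] = None
    prev_loc2: Optional[str] = None
    for id_, day_s, loc1, loc2 in csv.reader(lines):
        day = parse_day(day_s)
        if id_ == prev_id:
            # Same user: emit "did not move" rows for each missing day.
            d = prev_day + timedelta(days=1)
            while d < day:
                yield [id_, d.strftime("%d/%m"), prev_loc2, prev_loc2]
                d += timedelta(days=1)
            if loc1 == "None":
                loc1 = prev_loc2
        yield [id_, day_s, loc1, loc2]
        prev_id, prev_day, prev_loc2 = id_, day, loc2


# Usage with the question's sample; a real run would pass an open file instead.
data = io.StringIO("a,05/01,Rome,Paris\na,08/01,Zurich,Amsterdam\na,09/01,None,Rome\n")
csv.writer(sys.stdout).writerows(stream_fill(data))
```

The per-row work is constant, so the total cost is dominated by reading the data and sorting it once. The same stateful logic can also run independently on partitions that each hold complete IDs, which is how a distributed engine would usually parallelize it.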