@kgautam Thank you for your swift answer! Still, I changed the loop above to the pseudocode below, and now everything works correctly. It was a small change to implement, but I want to know for certain whether I can rely on HDFS always serving me the latest version of an updated file:

```
hdfs dfs -get myfile
loop 15 times:
    fetch myfile locally
    do something on myfile locally
    save myfile locally
    hdfs dfs -put -f myfile   // force put instead of delete & put -- this change alone did not resolve the issue
endloop
delete local myfile
```

FYI, I think HDFS differs a lot from, say, a local EXT3 disk, where only one OS and one disk controller are involved. HDFS relies on multiple OS installations, multiple disks per node, AND at least one Java program to manage changes in the HDFS filesystem! For those wondering, I'm talking about HDP 2.4.
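The fixed loop above can be sketched as a small shell script. This is only a sketch: it assumes the standard `hdfs dfs` CLI shipped with HDP 2.4 is on the PATH, that `myfile` exists in the user's HDFS home directory, and `process_myfile` is a hypothetical placeholder for whatever local update the real script performs. It needs a running cluster, so treat it as illustrative rather than tested.

```shell
#!/bin/sh
# Sketch of the corrected loop: overwrite in place with -put -f instead of
# delete + put. Assumes a reachable HDFS cluster; process_myfile is a
# placeholder for the real local update step.
set -e

i=1
while [ "$i" -le 15 ]; do
    rm -f myfile                     # -get refuses to overwrite an existing local file
    hdfs dfs -get myfile myfile      # fetch the current version to the edge node
    process_myfile myfile            # hypothetical local update step
    hdfs dfs -put -f myfile myfile   # -f overwrites the HDFS file at the same path
    i=$((i + 1))
done

rm -f myfile                         # clean up the local copy
```

Using `-put -f` keeps the same HDFS path alive across iterations, whereas delete-then-put creates a brand-new file each time.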
I wrote a command-line script that runs the following loop about 15 times on an edge node (in pseudocode):

```
loop 15 times:
    hdfs dfs -get myfile
    locally do something on myfile
    hdfs dfs -rm myfile
    hdfs dfs -put myfile   // an updated version of myfile on HDFS
endloop
```

At the put statement the file will land on one of the datanodes in the HDP cluster; replication to other datanodes starts later, on its own. Can someone confirm my hypothesis that HDFS may serve an old "myfile" on the next get statement, from a different datanode than the one myfile was previously put to?