Data recovery methods in Hadoop for the /user directory

Master Collaborator

I have two questions:

1- Is there any standard, supported way of recovering the "/user" folder if it is deleted with the -skipTrash option? The .Trash folder for deleted files is also kept under /user.

2- I am following this unsupported way of recovering, but in my case I can't find the deleted directory in the file it mentions.

[root@hadoop3 current]# pwd
[root@hadoop3 current]#  hdfs oev -i edits_inprogress_0000000000010978611 -o edits_inprogress_0000000000010978611.xml
[root@hadoop3 current]# grep "/user" edits_inprogress_0000000000010978611.xml
[root@hadoop3 current]#

Master Collaborator

Are any moderators listening? Why am I not getting a single reply?

Master Collaborator

I have a very simple question: what if someone deletes the "/user" folder in a production environment? Does it mean losing your complete environment? I doubt it, and Hortonworks must have a solution for this.

Hi @Sami Ahmad, there is no reliable way to recover data once you have removed a directory with the -skipTrash option. Your best bet is to immediately stop the cluster as soon as you realize your mistake and then try the recovery steps from the linked article.

However, even that won't work if the DataNodes have already deleted the block files (the delay between the delete command being issued and the DataNodes deleting the block files can be anywhere from a few seconds to a few minutes).

In your case, you don't see the transaction in the edits_inprogress_0000000000010978611 file likely because the edit logs have rolled over and the delete transaction is in an older edit log file.
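To check the older segments, one option is to run the same `hdfs oev` conversion over every `edits_*` file in the directory and look for a delete record on the path. The following is only a rough sketch: the function name `find_delete_txns` and the scratch directory argument are made up for illustration, and it assumes the XML dump lists each record's `<OPCODE>` before its `<PATH>`.

```shell
#!/usr/bin/env bash
# Sketch: list edit log segments that contain an OP_DELETE for a given path.
# find_delete_txns is a hypothetical helper name, not part of Hadoop.

find_delete_txns() {
    local edits_dir="$1"   # NameNode "current" directory holding edits_* files
    local target="$2"      # HDFS path to look for, e.g. /user
    local out_dir="$3"     # scratch directory for the XML dumps
    local seg xml

    mkdir -p "$out_dir" || return 1
    for seg in "$edits_dir"/edits_*; do
        [ -f "$seg" ] || continue
        # Skip XML dumps left behind by earlier oev runs in the same directory.
        case "$seg" in *.xml) continue ;; esac
        xml="$out_dir/$(basename "$seg").xml"
        # Convert the binary segment to XML with the offline edits viewer.
        hdfs oev -i "$seg" -o "$xml" >/dev/null 2>&1 || continue
        # Remember the opcode of the record being read and report a match only
        # when the target path appears inside an OP_DELETE record.
        if awk -v p="<PATH>$target</PATH>" '
               /<OPCODE>/ { is_del = ($0 ~ /<OPCODE>OP_DELETE<\/OPCODE>/) }
               is_del && index($0, p) { found = 1; exit }
               END { exit !found }' "$xml"; then
            basename "$seg"
        fi
    done
}

# Example (paths are placeholders):
#   find_delete_txns /path/to/namenode/current /user /tmp/oev_xml
```

Work on a copy of the edits directory rather than the live one. Any segment name it prints is the file to use in the recovery steps.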

If your cluster has been up and running for the last 4 days, there is little hope of recovering the data now, unfortunately.

Master Collaborator

So anyone with privileges can wipe out the whole business? Coming from the database world, it's hard to imagine a company accepting Hortonworks as its data solution if it were not foolproof when it comes to data integrity and security. Since many large corporations are using Hortonworks in their production environments, I doubt that this is the case. There must be some way to avoid this situation, as people make mistakes and systems fail.

I would like to hear from someone at Hortonworks about this issue, please.

@Sami Ahmad if you have a support contract I recommend you reach out to our support team.

We have directory-level data protection that can be optionally enabled. However, a determined privileged user can still wipe user data.
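The reply does not name the feature. One setting that fits the description is the `fs.protected.directories` property, which lists directories that cannot be deleted while they are non-empty. Treat that match as an assumption. A minimal sketch for checking whether a directory is on that list, using a hypothetical helper `is_protected_dir`, might look like this:

```shell
#!/usr/bin/env bash
# Sketch: succeed if a directory appears in fs.protected.directories.
# is_protected_dir is a hypothetical helper name, not part of Hadoop.

is_protected_dir() {
    local dir="$1" protected entry
    # Read the configured value; treat a failed lookup as "nothing protected".
    protected=$(hdfs getconf -confKey fs.protected.directories 2>/dev/null) || protected=""
    # The property holds a comma-separated list, so split on commas only.
    local IFS=','
    for entry in $protected; do
        if [ "$entry" = "$dir" ]; then
            return 0
        fi
    done
    return 1
}

# Example:
#   is_protected_dir /user && echo "/user is protected"
```

It only reads the client-side configuration, so run it on a host that has the same configuration files as the NameNode.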

Master Collaborator

OK, then let me ask this: what do companies do to back up this directory? Do they back it up more often than others to keep data loss to a minimum?