08-08-2017
12:49 AM
3 Kudos
This article compares how long it takes to recover accidentally deleted data in HDFS under two scenarios:
1. When trash is enabled.
2. When snapshots are enabled.
Data recovery from trash:
When a file is deleted from HDFS, the NameNode metadata is updated to remove the file from its source directory; however, the blocks on the DataNodes are not deleted immediately. Instead, the file is moved, along with its original directory path, into the user's .Trash folder, from which the deleted data can be recovered.
Example:
1. Existing data in HDFS.
#hadoop fs -ls /tmp/test1.txt
-rw-r--r-- 3 hdfs hdfs 4 2017-08-07 23:47 /tmp/test1.txt
2. Deleting the data in HDFS.
#hadoop fs -rm /tmp/test1.txt
17/08/07 23:52:13 INFO fs.TrashPolicyDefault: Moved: 'hdfs://vnn/tmp/test1.txt' to trash at: hdfs://vnn/user/hdfs/.Trash/Current/tmp/test1.txt
3. Recovering the deleted data from trash.
#hadoop fs -cp /user/hdfs/.Trash/Current/tmp/test1.txt /tmp/
#hadoop fs -ls /tmp/test1.txt
-rw-r--r-- 3 hdfs hdfs 4 2017-08-07 23:57 /tmp/test1.txt
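If the original location of a deleted file is not known, the trash can be browsed first to find it. A small sketch, assuming the hdfs user's trash (other users have their own /user/<username>/.Trash):
#hadoop fs -ls -R /user/hdfs/.Trash/Current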
Data recovery from snapshots:
Snapshots are read-only, point-in-time copies of the HDFS file system. Making a directory snapshottable allows recovery from accidental data loss.
1. Enabling snapshots on a directory.
#hdfs dfsadmin -allowSnapshot /tmp/snapshotdir
Allowing snaphot on /tmp/snapshotdir succeeded
2. Creating a snapshot of the directory.
#hdfs dfs -createSnapshot /tmp/snapshotdir
Created snapshot /tmp/snapshotdir/.snapshot/s20170807-180139.568
3. Contents of the snapshot-enabled folder and its snapshot.
#hdfs dfs -ls /tmp/snapshotdir/
Found 3 items
-rw-r--r-- 3 hdfs hdfs 1083492818 2017-07-31 19:01 /tmp/snapshotdir/oneGB.csv
-rw-r--r-- 3 hdfs hdfs 10722068505 2017-08-02 17:19 /tmp/snapshotdir/tenGB.csv
#hdfs dfs -ls /tmp/snapshotdir/.snapshot/s20170807-180139.568
Found 3 items
-rw-r--r-- 3 hdfs hdfs 1083492818 2017-07-31 19:01 /tmp/snapshotdir/.snapshot/s20170807-180139.568/oneGB.csv
-rw-r--r-- 3 hdfs hdfs 10722068505 2017-08-02 17:19 /tmp/snapshotdir/.snapshot/s20170807-180139.568/tenGB.csv
4. Deleting and recovering the lost data.
#hadoop fs -rm /tmp/snapshotdir/oneGB.csv
17/08/07 19:37:46 INFO fs.TrashPolicyDefault: Moved: 'hdfs://vinodnn/tmp/snapshotdir/oneGB.csv' to trash at: hdfs://vinodnn/user/hdfs/.Trash/Current/tmp/snapshotdir/oneGB.csv1502134666492
#hadoop fs -cp /tmp/snapshotdir/.snapshot/s20170807-180139.568/oneGB.csv /tmp/snapshotdir/
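To confirm which directories are snapshottable, and that the restored file is back in place, the following checks could be used (a small sketch; output omitted):
#hdfs lsSnapshottableDir
#hadoop fs -ls /tmp/snapshotdir/oneGB.csv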
It is seen in the above methods that the Hadoop copy command, "hadoop fs -cp <source> <dest>", is used to recover the data. However, the time taken by the "cp" operation grows as the size of the lost data grows. One optimization would be to use the move command, "hadoop fs -mv <source> <destination>", in place of the copy operation, since move fares better than copy. Because snapshot folders are read-only, the only operation supported from a snapshot is "copy" (not move). The following metrics compare the performance of the "copy" and "move" operations for a one GB and a ten GB data file.
Time to recover a file using copy (cp) operations: screen-shot-2017-08-07-at-60552-pm.png
Time to recover a file using move (mv) operations: screen-shot-2017-08-07-at-60602-pm.png
Hence, we observe that recovering data from trash with the move operation is, in certain cases, the more efficient way to handle accidental data loss. NOTE: Recovering data from trash is only possible if the trash interval (fs.trash.interval) is configured to give Hadoop admins enough time to detect the data loss and recover it. If not, snapshots are recommended for eventual recovery.
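As an illustration of the mv-based recovery path, the file deleted in step 4 above could be restored from trash as follows (a sketch; the timestamp suffix on the trash file name is taken from the log message above and will differ for every deletion):
#hadoop fs -mv /user/hdfs/.Trash/Current/tmp/snapshotdir/oneGB.csv1502134666492 /tmp/snapshotdir/oneGB.csv
The trash retention window itself is controlled by fs.trash.interval (in minutes) in core-site.xml. A minimal sketch, assuming an example window of 6 hours:
<property>
<name>fs.trash.interval</name>
<value>360</value>
</property>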
05-28-2017
09:39 AM
1 Kudo
Hello @kkanchu, the 'Test Connection' error and stack trace you are getting is caused by RANGER-1342, which was fixed recently. The fix should be available in HDP 2.6 (your question doesn't mention which HDP version you are using). Nevertheless, you should still be able to add another repo and use it despite this error; only the auto-complete of the HDFS path won't work (as hinted in the error). For errors while adding a service/repo, please check xa_portal.log for any other stack trace. Hope this helps! PS - There is no ranger_admin.log; that message was referring to xa_portal.log.
05-26-2017
12:58 AM
1 Kudo
ISSUE: While configuring NFS mounts to access HDFS as part of the local file system, we tend to control access through NFS proxy user settings such as the following:
<property>
<name>hadoop.proxyuser.nfsserver.groups</name>
<value>nfs-users1,nfs-users2</value>
<description>
The 'nfsserver' user is allowed to proxy all members of the
'nfs-users1' and 'nfs-users2' groups. Set this to '*' to allow
nfsserver user to proxy any group.
</description>
</property>
<property>
<name>hadoop.proxyuser.nfsserver.hosts</name>
<value>nfs-client-host1.com</value>
<description>
This is the host where the nfs gateway is running. Set this to
'*' to allow requests from any hosts to be proxied.
</description>
</property>
However, a user who has access to the NFS server can still access (view) the HDFS file system even if they are not part of "hadoop.proxyuser.nfsserver.groups" and "hadoop.proxyuser.nfsserver.hosts". This may be a security flaw in certain scenarios.
ROOT CAUSE: This is due to the property "nfs.exports.allowed.hosts", which controls which hosts are allowed to access HDFS through the NFS gateway and with what privileges.
RESOLUTION: Make sure the desired hosts and permissions are assigned. Permissions for the property can be defined as below:
<property>
<name>nfs.exports.allowed.hosts</name>
<value>* rw</value>
</property>
NOTE: An NFS gateway restart may be needed if the property is altered.
Links: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html#Allow_mounts_from_unprivileged_clients
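If access should be limited rather than opened to everyone, the same property accepts a semicolon-separated list of machine names, each followed by its access privilege (rw or ro). A minimal sketch with hypothetical client hosts:
<property>
<name>nfs.exports.allowed.hosts</name>
<value>nfs-client-host1.com rw ; nfs-client-host2.com ro</value>
</property>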
05-22-2017
06:52 PM
1 Kudo
Apart from checking topologies in the Storm Web UI, we can also list the active topologies from one of the cluster nodes using the following command:
/usr/hdp/<HDP-version>/storm/bin/storm list
If there are no topologies running, the output is:
No topologies running.
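The same command can also be used from a script to check whether the cluster is idle. A small sketch, assuming the storm client symlink under /usr/hdp/current (adjust the path to your installation):
# prints a note when no topologies are active
/usr/hdp/current/storm-client/bin/storm list | grep -q "No topologies running" && echo "Storm cluster is idle"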
04-07-2017
10:54 PM
1 Kudo
ISSUE: Java heap space error in the Hive MR engine
While working on a sample data set in Hive, a query such as "select count(*)" was seen to fail with the error below.
Starting Job = job_1491603076412_0001, Tracking URL = http://krishna3.openstacklocal:8088/proxy/application_1491603076412_0001/
Kill Command = /usr/hdp/2.4.2.0-258/hadoop/bin/hadoop job -kill job_1491603076412_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-04-07 22:18:09,736 Stage-1 map = 0%, reduce = 0%
2017-04-07 22:18:46,065 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1491603076412_0001 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1491603076412_0001_m_000000 (and more) from job job_1491603076412_0001
Task with the most failures(4):
-----
Task ID:
task_1491603076412_0001_m_000000
URL:
http://krishna3.openstacklocal:8088/taskdetails.jsp?jobid=job_1491603076412_0001&tipid=task_1491603076412_0001_m_000000
-----
Diagnostic Messages for this Task:
Error: Java heap space
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Checking the corresponding application logs, we observe:
2017-04-07 22:25:40,828 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:986)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:442)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
ROOT CAUSE: Insufficient heap space for map tasks (mapreduce.map.memory.mb) in the MR engine.
RESOLUTION: Increasing mapreduce.map.memory.mb from 1.2 GB to 1.75 GB, increasing mapreduce.task.io.sort.mb to 1003 and mapreduce.map.java.opts to -Xmx1433m accordingly, and restarting the necessary services resolved the problem. (NOTE: The mapreduce.task.io.sort.mb and mapreduce.map.java.opts values were recommended by Ambari.)
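For reference, the equivalent values could also be applied per session from the Hive CLI before re-running the query (a sketch; 1792 MB is an assumed equivalent of 1.75 GB, adjust to your cluster):
set mapreduce.map.memory.mb=1792;
set mapreduce.map.java.opts=-Xmx1433m;
set mapreduce.task.io.sort.mb=1003;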
03-30-2017
09:51 PM
@mqureshi thank you for the reply. Yes, "mapreduce.cluster.temp.dir" seems to be a legacy property that is no longer available in 2.x.
04-25-2019
04:52 AM
I ran kafka-server-stop.sh and zkServer.sh stop, then restarted them with zkServer.sh start and kafka-server-start.sh, and it worked for me.
02-16-2017
02:58 AM
ISSUE:
We observed Knox gateway servers failing to return the load balancer URL after submitting WebHDFS commands.
The gateway server logs show entries like the following:
2017-02-15 20:15:51,050 DEBUG hadoop.gateway (UrlRewriteProcessor.java:rewrite(157)) - Rewrote URL: http://<Gateway-server-hostname>:50075/webhdfs/v1/user/<username>/test2?op=CREATE&user.name=<username>&namenoderpcaddress=<NN-server-hostname>:8020&overwrite=false, direction: OUT via explicit rule: WEBHDFS/webhdfs/outbound/namenode/headers/location to URL: https://<Gateway-server-hostname>:8443/gateway/production/webhdfs/data/v1/webhdfs/v1/user/<username>/test2?_=AAAACAAAABAAAACwXrr5dePzjWo4CD7w6g—lwAqK25Z-yUGo9MJf3qOHlOPn-oZzMWN3qF17Me78ia7H00bVqhPCLVCZNEbeoRY9Sct1cEkfqtmuqyWnj5LI68GDrc7iKr9loQheBkXuceCE4nf-9zXLqE-m8CVtdQQyxMSQnxcZMaAIPesoLoWDWOVXAgGLzWeVMs—uafWTPORGv5KRqok61gSU_KtCt4_9Igcoa1RrpnFEDtusyUHD9osMP612VJdu4ggzJJaVtLdg_btQhxw20
CAUSE:
A corrupted deployment directory (/var/lib/knox/data/deployment) can lead to this issue.
RESOLUTION:
1. Take a backup of the deployment directory.
2. Delete the directory contents.
3. Restart the Knox server (this recreates the contents), as sketched below.
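A hedged sketch of those steps from the shell (the path follows the deployment directory mentioned above; if Knox is managed by Ambari, restart it from Ambari instead of calling gateway.sh directly):
# 1. back up the deployment directory
cp -r /var/lib/knox/data/deployment /var/lib/knox/data/deployment.bak
# 2. clear its contents
rm -rf /var/lib/knox/data/deployment/*
# 3. restart the Knox gateway so it recreates the deployments
su - knox -c "/usr/hdp/current/knox-server/bin/gateway.sh stop"
su - knox -c "/usr/hdp/current/knox-server/bin/gateway.sh start"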
01-10-2017
10:39 PM
1 Kudo
Nope - compaction retains old files until all readers are out.
The implementation has ACID isolation levels - you could start a query and then do "delete from table" & still get no errors on the query which is already in motion.