08-08-2017
12:49 AM
3 Kudos
This article compares how long it takes to recover accidentally deleted data in HDFS under two scenarios:
1. When trash is enabled.
2. When snapshots are enabled.
Data recovery from trash:
When a file is deleted from HDFS, the NameNode metadata is updated to remove the file from its source directory; however, the blocks on the DataNodes are not deleted immediately. Instead, the file is moved, along with its original directory path, into the user's .Trash folder, from which the deleted data can be recovered.
Example:
1. Existing data in HDFS.
#hadoop fs -ls /tmp/test1.txt
-rw-r--r-- 3 hdfs hdfs 4 2017-08-07 23:47 /tmp/test1.txt
2. Deleting the data in HDFS.
#hadoop fs -rm /tmp/test1.txt
17/08/07 23:52:13 INFO fs.TrashPolicyDefault: Moved: 'hdfs://vnn/tmp/test1.txt' to trash at: hdfs://vnn/user/hdfs/.Trash/Current/tmp/test1.txt
3. Recovering the deleted data from trash.
#hadoop fs -cp /user/hdfs/.Trash/Current/tmp/test1.txt /tmp/
#hadoop fs -ls /tmp/test1.txt
-rw-r--r-- 3 hdfs hdfs 4 2017-08-07 23:57 /tmp/test1.txt
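If the original location of a deleted file is not known, the trash can be browsed first to find it. A small sketch, assuming the hdfs user's trash (other users have their own /user/<username>/.Trash):
#hadoop fs -ls -R /user/hdfs/.Trash/Current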
Data recovery from snapshots:
Snapshots are read-only, point-in-time copies of the HDFS file system. Making a directory snapshottable allows recovery from accidental data loss.
1. Enabling snapshots on a directory.
#hdfs dfsadmin -allowSnapshot /tmp/snapshotdir
Allowing snaphot on /tmp/snapshotdir succeeded
2. Creating a snapshot of the directory.
#hdfs dfs -createSnapshot /tmp/snapshotdir
Created snapshot /tmp/snapshotdir/.snapshot/s20170807-180139.568
3. Contents of the snapshot-enabled folder and its snapshot.
#hdfs dfs -ls /tmp/snapshotdir/
Found 3 items
-rw-r--r-- 3 hdfs hdfs 1083492818 2017-07-31 19:01 /tmp/snapshotdir/oneGB.csv
-rw-r--r-- 3 hdfs hdfs 10722068505 2017-08-02 17:19 /tmp/snapshotdir/tenGB.csv
#hdfs dfs -ls /tmp/snapshotdir/.snapshot/s20170807-180139.568
Found 3 items
-rw-r--r-- 3 hdfs hdfs 1083492818 2017-07-31 19:01 /tmp/snapshotdir/.snapshot/s20170807-180139.568/oneGB.csv
-rw-r--r-- 3 hdfs hdfs 10722068505 2017-08-02 17:19 /tmp/snapshotdir/.snapshot/s20170807-180139.568/tenGB.csv
4. Deleting and recovering the lost data.
#hadoop fs -rm /tmp/snapshotdir/oneGB.csv
17/08/07 19:37:46 INFO fs.TrashPolicyDefault: Moved: 'hdfs://vinodnn/tmp/snapshotdir/oneGB.csv' to trash at: hdfs://vinodnn/user/hdfs/.Trash/Current/tmp/snapshotdir/oneGB.csv1502134666492
#hadoop fs -cp /tmp/snapshotdir/.snapshot/s20170807-180139.568/oneGB.csv /tmp/snapshotdir/
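To confirm which directories are snapshottable, and that the restored file is back in place, the following checks could be used (a small sketch; output omitted):
#hdfs lsSnapshottableDir
#hadoop fs -ls /tmp/snapshotdir/oneGB.csv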
It is seen in the above methods that the Hadoop copy command, "hadoop fs -cp <source> <dest>", is used to recover the data. However, the time taken by the "cp" operation grows as the size of the lost data grows. One optimization would be to use the move command, "hadoop fs -mv <source> <destination>", in place of the copy operation, since move fares better than copy. Because snapshot folders are read-only, the only operation supported from a snapshot is "copy" (not move). The following metrics compare the performance of the "copy" and "move" operations for a one GB and a ten GB data file.
Time to recover a file using copy (cp) operations: screen-shot-2017-08-07-at-60552-pm.png
Time to recover a file using move (mv) operations: screen-shot-2017-08-07-at-60602-pm.png
Hence, we observe that recovering data from trash with the move operation is, in certain cases, the more efficient way to handle accidental data loss. NOTE: Recovering data from trash is only possible if the trash interval (fs.trash.interval) is configured to give Hadoop admins enough time to detect the data loss and recover it. If not, snapshots are recommended for eventual recovery.
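As an illustration of the mv-based recovery path, the file deleted in step 4 above could be restored from trash as follows (a sketch; the timestamp suffix on the trash file name is taken from the log message above and will differ for every deletion):
#hadoop fs -mv /user/hdfs/.Trash/Current/tmp/snapshotdir/oneGB.csv1502134666492 /tmp/snapshotdir/oneGB.csv
The trash retention window itself is controlled by fs.trash.interval (in minutes) in core-site.xml. A minimal sketch, assuming an example window of 6 hours:
<property>
<name>fs.trash.interval</name>
<value>360</value>
</property>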
05-28-2017
09:39 AM
1 Kudo
Hello @kkanchu, the 'Test Connection' error and stack trace you are getting is caused by RANGER-1342, which was fixed recently. The fix should be available in HDP 2.6 (your question doesn't mention which HDP version you are using). Nevertheless, you should still be able to add another repo and use it despite this error; only the auto-complete of the HDFS path won't work (as hinted in the error). For errors while adding a service/repo, please check xa_portal.log for any other stack trace. Hope this helps! PS - There is no ranger_admin.log; that message was referring to xa_portal.log.
05-26-2017
12:58 AM
1 Kudo
ISSUE: While configuring NFS mounts to access HDFS as part of the local file system, we tend to control access through NFS proxy user settings such as the following:
<property>
<name>hadoop.proxyuser.nfsserver.groups</name>
<value>nfs-users1,nfs-users2</value>
<description>
The 'nfsserver' user is allowed to proxy all members of the
'nfs-users1' and 'nfs-users2' groups. Set this to '*' to allow
nfsserver user to proxy any group.
</description>
</property>
<property>
<name>hadoop.proxyuser.nfsserver.hosts</name>
<value>nfs-client-host1.com</value>
<description>
This is the host where the nfs gateway is running. Set this to
'*' to allow requests from any hosts to be proxied.
</description>
</property>
However, a user who has access to the NFS server can still access (view) the HDFS file system even if they are not part of "hadoop.proxyuser.nfsserver.groups" and "hadoop.proxyuser.nfsserver.hosts". This may be a security flaw in certain scenarios.
ROOT CAUSE: This is due to the property "nfs.exports.allowed.hosts", which controls which hosts are allowed to access HDFS through the NFS gateway and with what privileges.
RESOLUTION: Make sure the desired hosts and permissions are assigned. Permissions for the property can be defined as below:
<property>
<name>nfs.exports.allowed.hosts</name>
<value>* rw</value>
</property>
NOTE: An NFS gateway restart may be needed if the property is altered.
Links: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html#Allow_mounts_from_unprivileged_clients
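If access should be limited rather than opened to everyone, the same property accepts a semicolon-separated list of machine names, each followed by its access privilege (rw or ro). A minimal sketch with hypothetical client hosts:
<property>
<name>nfs.exports.allowed.hosts</name>
<value>nfs-client-host1.com rw ; nfs-client-host2.com ro</value>
</property>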
05-22-2017
06:52 PM
1 Kudo
Apart from checking topologies in the Storm Web UI, we can also list the active topologies from one of the cluster nodes using the following command:
/usr/hdp/<HDP-version>/storm/bin/storm list
If there are no topologies running, the output is:
No topologies running.
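The same command can also be used from a script to check whether the cluster is idle. A small sketch, assuming the storm client symlink under /usr/hdp/current (adjust the path to your installation):
# prints a note when no topologies are active
/usr/hdp/current/storm-client/bin/storm list | grep -q "No topologies running" && echo "Storm cluster is idle"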
04-07-2017
10:54 PM
1 Kudo
ISSUE: Java heap space error in the Hive MR engine
While working on a sample data set in Hive, a query such as "select count(*)" was seen to fail with the error below.
Starting Job = job_1491603076412_0001, Tracking URL = http://krishna3.openstacklocal:8088/proxy/application_1491603076412_0001/
Kill Command = /usr/hdp/2.4.2.0-258/hadoop/bin/hadoop job -kill job_1491603076412_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-04-07 22:18:09,736 Stage-1 map = 0%, reduce = 0%
2017-04-07 22:18:46,065 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1491603076412_0001 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1491603076412_0001_m_000000 (and more) from job job_1491603076412_0001
Task with the most failures(4):
-----
Task ID:
task_1491603076412_0001_m_000000
URL:
http://krishna3.openstacklocal:8088/taskdetails.jsp?jobid=job_1491603076412_0001&tipid=task_1491603076412_0001_m_000000
-----
Diagnostic Messages for this Task:
Error: Java heap space
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Checking the corresponding application logs, we observe:
2017-04-07 22:25:40,828 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:986)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:442)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
ROOT CAUSE: Insufficient heap space for map tasks (mapreduce.map.memory.mb) in the MR engine.
RESOLUTION: Increasing mapreduce.map.memory.mb from 1.2 GB to 1.75 GB, increasing mapreduce.task.io.sort.mb to 1003 and mapreduce.map.java.opts to -Xmx1433m accordingly, and restarting the necessary services resolved the problem. (NOTE: The mapreduce.task.io.sort.mb and mapreduce.map.java.opts values were recommended by Ambari.)
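For reference, the equivalent values could also be applied per session from the Hive CLI before re-running the query (a sketch; 1792 MB is an assumed equivalent of 1.75 GB, adjust to your cluster):
set mapreduce.map.memory.mb=1792;
set mapreduce.map.java.opts=-Xmx1433m;
set mapreduce.task.io.sort.mb=1003;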
03-30-2017
09:51 PM
@mqureshi thank you for the reply. Yes, "mapreduce.cluster.temp.dir" seems to be a legacy property that is no longer available in 2.x.
04-25-2019
04:52 AM
I ran kafka-server-stop.sh and zkServer.sh stop, then restarted them with zkServer.sh start and kafka-server-start.sh, and it worked for me.
02-16-2017
02:58 AM
ISSUE:
We observed Knox gateway servers failing to return the load balancer URL after submitting WebHDFS commands.
The gateway server logs show entries like the following:
2017-02-15 20:15:51,050 DEBUG hadoop.gateway (UrlRewriteProcessor.java:rewrite(157)) - Rewrote URL: http://<Gateway-server-hostname>:50075/webhdfs/v1/user/<username>/test2?op=CREATE&user.name=<username>&namenoderpcaddress=<NN-server-hostname>:8020&overwrite=false, direction: OUT via explicit rule: WEBHDFS/webhdfs/outbound/namenode/headers/location to URL: https://<Gateway-server-hostname>:8443/gateway/production/webhdfs/data/v1/webhdfs/v1/user/<username>/test2?_=AAAACAAAABAAAACwXrr5dePzjWo4CD7w6g—lwAqK25Z-yUGo9MJf3qOHlOPn-oZzMWN3qF17Me78ia7H00bVqhPCLVCZNEbeoRY9Sct1cEkfqtmuqyWnj5LI68GDrc7iKr9loQheBkXuceCE4nf-9zXLqE-m8CVtdQQyxMSQnxcZMaAIPesoLoWDWOVXAgGLzWeVMs—uafWTPORGv5KRqok61gSU_KtCt4_9Igcoa1RrpnFEDtusyUHD9osMP612VJdu4ggzJJaVtLdg_btQhxw20
CAUSE:
A corrupted deployment directory (/var/lib/knox/data/deployment) can lead to this issue.
RESOLUTION:
1. Take a backup of the deployment directory.
2. Delete the directory contents.
3. Restart the Knox server (this recreates the contents), as sketched below.
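A hedged sketch of those steps from the shell (the path follows the deployment directory mentioned above; if Knox is managed by Ambari, restart it from Ambari instead of calling gateway.sh directly):
# 1. back up the deployment directory
cp -r /var/lib/knox/data/deployment /var/lib/knox/data/deployment.bak
# 2. clear its contents
rm -rf /var/lib/knox/data/deployment/*
# 3. restart the Knox gateway so it recreates the deployments
su - knox -c "/usr/hdp/current/knox-server/bin/gateway.sh stop"
su - knox -c "/usr/hdp/current/knox-server/bin/gateway.sh start"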
01-10-2017
10:39 PM
1 Kudo
Nope - compaction retains old files until all readers are out.
The implementation has ACID isolation levels - you could start a query and then do "delete from table" & still get no errors on the query which is already in motion.