Created on 04-21-2015 12:55 AM - edited 09-16-2022 02:26 AM
Hi everybody,
I wonder if someone could explain what is going on internally when I use an HBase snapshot as input for map-reduce as explained in [1] (configured via the `initTableSnapshotMapperJob` API described in [2]).
My app does the following (a simplified code sketch follows the list):
1. creates a snapshot using the `HBaseAdmin` API
2. creates a new HDFS directory in the user's home
3. calls `initTableSnapshotMapperJob` to configure a TableMapper job to run on the created snapshot (passing the new directory as the tmp restore directory)
4. sets a few more job parameters (the job creates HFiles for bulk import) and then waits for job completion
5. deletes the temporary directory
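Roughly, the code does this. It is a simplified sketch, not my exact code; the snapshot, table, directory and mapper names (`my_snapshot`, `my_table`, `target_table`, `MyMapper`, the paths) are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SnapshotMRSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // 1: create the snapshot via the HBaseAdmin API
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.snapshot("my_snapshot", "my_table");

    // 2: create a tmp restore directory in the user's home
    FileSystem fs = FileSystem.get(conf);
    Path restoreDir = new Path("/user/cloudera/snapshot-restore");
    fs.mkdirs(restoreDir);

    // 3: configure the TableMapper job to read from the snapshot,
    //    passing the new directory as the tmp restore directory.
    //    MyMapper extends TableMapper<ImmutableBytesWritable, KeyValue> (not shown).
    Job job = Job.getInstance(conf, "mr-over-snapshot");
    TableMapReduceUtil.initTableSnapshotMapperJob(
        "my_snapshot", new Scan(), MyMapper.class,
        ImmutableBytesWritable.class, KeyValue.class,
        job, true, restoreDir);

    // 4: the job writes HFiles for a later bulk import; wait for completion
    HFileOutputFormat2.configureIncrementalLoad(job, new HTable(conf, "target_table"));
    FileOutputFormat.setOutputPath(job, new Path("/user/cloudera/hfiles"));
    job.waitForCompletion(true);

    // 5: delete the temporary restore directory
    fs.delete(restoreDir, true);

    admin.close();
  }
}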
The problem I am stuck with is that the initialisation (step 3) throws an exception about writing to /hbase/archive (!), after successfully creating the Region servers for the restored snapshot, in the given tmp directory. The exception is given below [3].
I can see in the job's output that region servers are created before the exception, and the files from the table restore stay in the directory.
I was not expecting HBase to *write* anything to the HBase directories when using a snapshot with an explicitly given temporary directory to work with. What can I do to make this work?
All this is tested on a Cloudera QuickStart VM, btw, but that should not really matter IMHO.
Thanks
Jost
[1] http://www.slideshare.net/enissoz/mapreduce-over-snapshots
[3]
java.util.concurrent.ExecutionException: org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=WRITE, inode="/hbase/archive":hbase:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6268)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6220)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4087)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4057)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4030)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:787)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
Created 05-08-2015 09:09 AM
OK, so here is the complete situation.
When you run an MR job on top of a snapshot, the MR framework looks at all the inputs and creates all the tasks for them. However, those tasks might have to wait some time before being executed, depending on the number of slots available on the cluster versus the number of tasks.
The issue is that if, while the tasks are pending, one of the inputs is moved/deleted/split/merged, etc., then the splits no longer point to a valid input and the MR job will fail.
To avoid that, we have to create links to all the inputs to make sure HBase keeps a reference to those files even if they have to be moved, the same way a snapshot does. The catch is that those links have to be in the /hbase folder, and this is why you need the rights for that.
So to be able to run an MR job on top of a snapshot, you need a user with read/write access to the /hbase folder. This should be fixed in HBase 1.2 (but it's just planned for now, and you will need to double-check when we get closer to 1.2).
Also, please keep in mind that doing MR on top of snapshots bypasses all the HBase layers. Therefore, if there are any ACLs or cell-level security activated on the initial table, they will all be bypassed by the MR job. Everything will be readable by the job.
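If you only need to get unblocked on the QuickStart VM in the meantime, one option (just a sketch, not an official recommendation) is to run the snapshot/job setup as a user that is allowed to write under /hbase, for example the hbase user, via UserGroupInformation. Note that createRemoteUser only behaves like this on a non-secured (simple auth) cluster; with Kerberos you would need a keytab or a properly configured proxy user.

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

// Sketch: do the snapshot-MR setup as the "hbase" user so the link/reference
// files can be created under /hbase. Only valid on a cluster with simple
// authentication (like the QuickStart VM); not a substitute for real security.
UserGroupInformation hbaseUser = UserGroupInformation.createRemoteUser("hbase");
hbaseUser.doAs(new PrivilegedExceptionAction<Void>() {
  @Override
  public Void run() throws Exception {
    // ... create the snapshot and call TableMapReduceUtil.initTableSnapshotMapperJob here ...
    return null;
  }
});

The blunter alternative is to give the job's user write access to the relevant /hbase directories on HDFS, which works but weakens exactly the separation this error is enforcing.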
Let me know if you have any other question or if I can help with anything.
HTH.
JMS
Created 05-06-2015 07:49 AM
Hi Jost,
I will be looking at this and try to figure out what the issue is. Can you please confirm which HBase version you use? I see that it runs with the cloudera user; are you using the QuickStart VM? If so, can you please let me know which version, so I can try with the same as you?
Thanks,
JMS
Created 05-06-2015 03:38 PM
Created 05-06-2015 10:12 PM
Thanks for the information. I have started the download of the VM. I will have it by tomorrow morning and will test the scenario.
In the meantime, can you please clarify this:
"
The problem I am stuck with is that the initialisation (step 3) throws an exception about writing to /hbase/archive (!), after successfully creating the Region servers for the restored snapshot, in the given tmp directory. The exception is given below [3].
I can see in the job's output that region servers are created before the exception, and the files from the table restore stay in the directory.
"
What do you mean here by "Region servers"? This MR job should not create any region servers.
Thanks,
JM
Created 05-06-2015 10:34 PM
Created 05-07-2015 04:13 PM
Hi,
Thanks for the clarification. I have downloaded the VM and I have started to code an example.
Quick followup question.
By default, there is no permission management on the VM.
Can you please confirm you modified those 2 properties?
<property>
  <name>dfs.permissions.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>true</value>
</property>
Otherwise, you should not see any permission denied errors. If you have not modified them, can you please share your /etc/hadoop/conf content?
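As a quick check on your side (just a sketch; what ultimately matters is the configuration the NameNode itself loaded, not the client), something like this prints what the client-side configuration resolves for those two keys:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

// Prints the client-side values; both keys default to "true" when absent.
Configuration conf = new HdfsConfiguration(); // loads core-site.xml and hdfs-site.xml from the classpath
System.out.println("dfs.permissions.enabled = " + conf.get("dfs.permissions.enabled", "true"));
System.out.println("dfs.permissions         = " + conf.get("dfs.permissions", "true"));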
Thanks,
JM
Created 05-07-2015 04:50 PM
Hi,
No, I did not modify any permissions. Also, I cannot find this property in the directory you are requesting (see below).
The folder contents are attached FYI.
/ Jost
- ** - ** - ** - ** - ** - ** - ** - ** - ** - ** - ** - ** - ** -
[cloudera@quickstart test]$ uname -a
Linux quickstart.cloudera 2.6.32-358.el6.x86_64 #1 SMP Fri Feb 22 00:31:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
[cloudera@quickstart test]$ grep -r -e dfs.permissions /etc/hadoop/conf
grep: /etc/hadoop/conf/container-executor.cfg: Permission denied
[cloudera@quickstart test]$ sudo grep -r -e dfs.permissions /etc/hadoop/conf
[cloudera@quickstart test]$ ls -lrt /etc/hadoop/conf/
total 44
-rw-r--r-- 1 root root 3906 Apr 20 19:37 yarn-site.xml
-rw-r--r-- 1 root root 315 Apr 20 19:37 ssl-client.xml
-rw-r--r-- 1 root root 4391 Apr 20 19:37 mapred-site.xml
-rw-r--r-- 1 root root 300 Apr 20 19:37 log4j.properties
-rw-r--r-- 1 root root 1669 Apr 20 19:37 hdfs-site.xml
-rw-r--r-- 1 root root 425 Apr 20 19:37 hadoop-env.sh
-rw-r--r-- 1 root root 3675 Apr 20 19:37 core-site.xml
-rw-r--r-- 1 root root 21 Apr 20 19:37 __cloudera_generation__
-r-------- 1 root hadoop 0 May 6 22:21 container-executor.cfg
-rwxr-xr-x 1 root hadoop 1510 May 6 22:21 topology.py
-rw-r--r-- 1 root hadoop 200 May 6 22:21 topology.map
[cloudera@quickstart test]$
Created 05-07-2015 04:54 PM
The default value is true, so if the property is not there, that means permissions are on.
[cloudera@quickstart ~]$ ls -lrt /etc/hadoop/conf/
total 40
-rwxr-xr-x 1 root root 2375 Dec 3 01:39 yarn-site.xml
-rwxr-xr-x 1 root root 1104 Dec 3 01:39 README
-rwxr-xr-x 1 root root 2890 Dec 3 01:39 hadoop-metrics.properties
-rwxr-xr-x 1 root root 1366 Dec 3 01:39 hadoop-env.sh
-rwxr-xr-x 1 root root 11291 Dec 16 19:26 log4j.properties
-rw-rw-r-- 1 root root 1546 Dec 17 12:55 mapred-site.xml
-rw-rw-r-- 1 root root 1915 Dec 17 12:55 core-site.xml
-rw-rw-r-- 1 root root 3737 May 7 16:06 hdfs-site.xml
The files we have are a bit different. Have you activated CM?
I have extracted your CDH Java branch locally and will dig into the code. I looked at 1.0 and saw nothing wrong. But you are running on 0.98.6.
I will provide feedback shortly.
Thanks,
JM
Created 05-07-2015 04:57 PM
Created 05-08-2015 05:51 AM
FYI,
I'm able to reproduce the issue.
Steps:
1) Download 5.3.0 VM,
2) Change hadoop-site.xml to manage permissions,
3) Create and fill a table,
4) Create snapshot,
5) Try to MR over it.
I'm now debugging step by step to see where it's coming from.
Can you please send me your TableMapReduceUtil.initTableSnapshotMapperJob line?
Thanks,
JM