HBase snapshots as Map-reduce job input - permission denied
Labels: Apache HBase
Created on 04-21-2015 12:55 AM - edited 09-16-2022 02:26 AM
Hi everybody,
I wonder if someone could explain what is going on internally when I use an HBase snapshot as input for map-reduce as explained in [1] (configured by the `initTableSnapshotMapperJob` API described in [2]).
My app does the following (a minimal sketch of this flow is given after the list):
1. creates a snapshot using the `HBaseAdmin` API
2. creates a new HDFS directory in the user's home
3. calls `initTableSnapshotMapperJob` to configure a TableMapper job to run on the created snapshot
(passing the new directory as the tmp restore directory)
4. sets a few more job parameters (the job creates HFiles for bulk import) and then waits for job completion
5. deletes the temporary directory
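In case it helps, here is a minimal, self-contained sketch of that flow, assuming HBase 0.98 on CDH 5.3; the table name, snapshot name, restore path and mapper class are placeholders I made up for illustration, and error handling is omitted.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class SnapshotMapReduceSketch {

    // Placeholder mapper; the real job emits KeyValues that later become HFiles.
    static class SnapshotMapper extends TableMapper<ImmutableBytesWritable, KeyValue> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context) {
            // emit KeyValues for the bulk import here
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // Step 1: create a snapshot using the HBaseAdmin API.
        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.snapshot("testtable-snapshot", TableName.valueOf("testtable"));
        admin.close();

        // Step 2: create a new HDFS directory in the user's home as the tmp restore dir.
        FileSystem fs = FileSystem.get(conf);
        Path restoreDir = new Path("/user/cloudera/snapshot-restore");
        fs.mkdirs(restoreDir);

        // Step 3: configure a TableMapper job to run on the snapshot,
        // passing the new directory as the tmp restore directory.
        Job job = Job.getInstance(conf, "snapshot-to-hfiles");
        job.setJarByClass(SnapshotMapReduceSketch.class);
        TableMapReduceUtil.initTableSnapshotMapperJob(
            "testtable-snapshot", new Scan(),
            SnapshotMapper.class, ImmutableBytesWritable.class, KeyValue.class,
            job, true, restoreDir);

        // Step 4: further job parameters (e.g. HFileOutputFormat2 for the bulk import)
        // would go here, then wait for completion.
        job.waitForCompletion(true);

        // Step 5: delete the temporary restore directory.
        fs.delete(restoreDir, true);
    }
}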
The problem I am stuck with is that the initialisation (step 3) throws an exception about writing to /hbase/archive (!), after successfully creating the Region servers for the restored snapshot, in the given tmp directory. The exception is given below [3].
I can see in the job's output that region servers are created before the exception, and the files from the table restore stay in the directory.
I was not expecting HBase to *write* anything to the HBase directories when using a snapshot with an explicitly given temporary directory to work with. What can I do to make this work?
All this is tested on a Cloudera QuickStart VM, btw, but that should not really matter IMHO.
Thanks
Jost
[1] http://www.slideshare.net/enissoz/mapreduce-over-snapshots
[3]
java.util.concurrent.ExecutionException: org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=WRITE, inode="/hbase/archive":hbase:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:216)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:145)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6286)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6268)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6220)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4087)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4057)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4030)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:787)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:297)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:594)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
Created 05-08-2015 09:09 AM
OK, so here is the complete situation.
When you run an MR job on top of a snapshot, the MR framework looks at all the inputs and creates all the tasks for them. However, those tasks might have to wait for some time before being executed, depending on the number of slots available on the cluster versus the number of tasks.
The issue is, if one of the inputs is moved/deleted/split/merged, etc. while the tasks are pending, then the splits no longer point to a valid input and the MR job will fail.
To avoid that, we have to create links to all the inputs to make sure HBase keeps a reference to those files even if they have to be moved, the same way a snapshot does. The issue is, those links have to be in the /hbase folder. And this is why you need the rights for that.
So to be able to run an MR job on top of a snapshot, you need a user with read/write access to the /hbase folder. This should be fixed in HBase 1.2 (but it's just on the plans for now and you will need to double check when we get closer to 1.2).
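If it helps, one way to grant that access on a test setup is with an HDFS ACL. This is only a sketch under two assumptions: HDFS ACLs are enabled (dfs.namenode.acls.enabled=true) and the code runs as a user allowed to change permissions on /hbase (the hbase user or the HDFS superuser). On a real cluster you would more likely adjust permissions out of band or submit the job as a user that already has the access.

import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclEntryScope;
import org.apache.hadoop.fs.permission.AclEntryType;
import org.apache.hadoop.fs.permission.FsAction;

public class GrantHBaseRootAccess {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Grant the job-submitting user (here "cloudera", adjust as needed)
        // read/write/execute on the HBase root and archive directories,
        // so the snapshot restore can create its reference links there.
        AclEntry entry = new AclEntry.Builder()
            .setScope(AclEntryScope.ACCESS)
            .setType(AclEntryType.USER)
            .setName("cloudera")
            .setPermission(FsAction.ALL)
            .build();
        fs.modifyAclEntries(new Path("/hbase"), Collections.singletonList(entry));
        fs.modifyAclEntries(new Path("/hbase/archive"), Collections.singletonList(entry));
    }
}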
Also, please keep in mind that doing MR on top of snapshots bypasses all the HBase layers. Therefore, if there are any ACLs or cell-level security activated on the initial table, they will all be bypassed by the MR job. Everything will be readable by the job.
Let me know if you have any other question or if I can help with anything.
HTH.
JMS
Created 05-06-2015 07:49 AM
Hi Jost,
I will be looking at this to try to figure out what the issue is. Can you please confirm which HBase version you use? I see that it runs with the cloudera user; are you using the QuickStart VM? If so, can you please let me know which version so I can try with the same as you?
Thanks,
JMS
Created 05-06-2015 03:38 PM
I am using Cloudera QuickStart VMs (version 5.3.0) for tests. The problem can also be observed in our production system, which runs Cloudera 5.2.4.
The exception cannot be seen in all runs; it depends on the previous use of HBase. In the VM where I see the exception, HBase contains data that is around two weeks old, and some tables have been dropped.
(I suspected that restoring the snapshot triggers internal archiving and uses the wrong user.)
HTH
/ Jost
Created 05-06-2015 10:12 PM
Thanks for the information. I have started the download of the VM. I will have it by tomorrow morning and will test the scenario.
In the meantime, can you please clarify this:
"
The problem I am stuck with is that the initialisation (step 3) throws an exception about writing to /hbase/archive (!), after successfully creating the Region servers for the restored snapshot, in the given tmp directory. The exception is given below [3].
I can see in the job's output that region servers are created before the exception, and the files from the table restore stay in the directory.
"
What do you mean here by "Region servers"? This MR job should not create any region servers.
Thanks,
JM
Created 05-06-2015 10:34 PM
Sorry, that was a bit misleading. No region servers are created. What is in fact created are regions (by a method with prefix "RegionServer").
Before the stack trace, the output of the job contains messages "INFO regionserver.HRegion: creating HRegion testtable .."
(one of them for the test program, many of them for the real application, as it uses many regions).
/ Jost
Created 05-07-2015 04:13 PM
Hi,
Thanks for the clarification. I have downloaded the VM and have started to code an example.
Quick followup question.
By default, there is no rights management on the VM.
Can you please confirm you modified those 2 properties?
<property>
<name>dfs.permissions.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions</name>
<value>true</value>
</property>
Otherwise, you should not see any permission denied errors. If you have not modified them, can you please share your /etc/hadoop/conf content?
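For reference, here is a quick way to see which value the client configuration actually resolves; a minimal sketch assuming the hdfs-site.xml from /etc/hadoop/conf is on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class CheckDfsPermissions {
    public static void main(String[] args) {
        // HdfsConfiguration loads hdfs-default.xml and hdfs-site.xml from the classpath.
        Configuration conf = new HdfsConfiguration();
        System.out.println("dfs.permissions.enabled = "
            + conf.getBoolean("dfs.permissions.enabled", true));
    }
}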
Thanks,
JM
Created 05-07-2015 04:50 PM
Hi,
No, I did not modify any permissions. Also, I cannot find this property in the directory you are requesting (see below).
The folder contents are attached FYI.
/ Jost
- ** - ** - ** - ** - ** - ** - ** - ** - ** - ** - ** - ** - ** -
[cloudera@quickstart test]$ uname -a
Linux quickstart.cloudera 2.6.32-358.el6.x86_64 #1 SMP Fri Feb 22 00:31:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
[cloudera@quickstart test]$ grep -r -e dfs.permissions /etc/hadoop/conf
grep: /etc/hadoop/conf/container-executor.cfg: Permission denied
[cloudera@quickstart test]$ sudo grep -r -e dfs.permissions /etc/hadoop/conf
[cloudera@quickstart test]$ ls -lrt /etc/hadoop/conf/
total 44
-rw-r--r-- 1 root root 3906 Apr 20 19:37 yarn-site.xml
-rw-r--r-- 1 root root 315 Apr 20 19:37 ssl-client.xml
-rw-r--r-- 1 root root 4391 Apr 20 19:37 mapred-site.xml
-rw-r--r-- 1 root root 300 Apr 20 19:37 log4j.properties
-rw-r--r-- 1 root root 1669 Apr 20 19:37 hdfs-site.xml
-rw-r--r-- 1 root root 425 Apr 20 19:37 hadoop-env.sh
-rw-r--r-- 1 root root 3675 Apr 20 19:37 core-site.xml
-rw-r--r-- 1 root root 21 Apr 20 19:37 __cloudera_generation__
-r-------- 1 root hadoop 0 May 6 22:21 container-executor.cfg
-rwxr-xr-x 1 root hadoop 1510 May 6 22:21 topology.py
-rw-r--r-- 1 root hadoop 200 May 6 22:21 topology.map
[cloudera@quickstart test]$
Created 05-07-2015 04:54 PM
The default value is true, so if the property is not there, that means rights are on.
[cloudera@quickstart ~]$ ls -lrt /etc/hadoop/conf/
total 40
-rwxr-xr-x 1 root root 2375 Dec 3 01:39 yarn-site.xml
-rwxr-xr-x 1 root root 1104 Dec 3 01:39 README
-rwxr-xr-x 1 root root 2890 Dec 3 01:39 hadoop-metrics.properties
-rwxr-xr-x 1 root root 1366 Dec 3 01:39 hadoop-env.sh
-rwxr-xr-x 1 root root 11291 Dec 16 19:26 log4j.properties
-rw-rw-r-- 1 root root 1546 Dec 17 12:55 mapred-site.xml
-rw-rw-r-- 1 root root 1915 Dec 17 12:55 core-site.xml
-rw-rw-r-- 1 root root 3737 May 7 16:06 hdfs-site.xml
The files we have are a bit different. Have you activated CM?
I have extracted your CDH Java branch locally and will dig into the code. I looked at 1.0 and saw nothing wrong. But you are running on 0.98.6.
I will provide a feedback shortly.
Thanks,
JM
Created 05-07-2015 04:57 PM
Let me know if you still need /etc/hadoop/conf contents (which is actually a link to /etc/alternatives/hadoop-conf)
I am certain that I did not modify it (not consciously or manually, that is 🙂); it should be the default CDH 5.3.0 QuickStart one.
/ Jost
Created 05-08-2015 05:51 AM
FYI,
I'm able to reproduce the issue.
Steps:
1) Download 5.3.0 VM,
2) Change hadoop-site.xml to manage permissions,
3) Create and fill a table,
4) Create snapshot,
5) Try to MR over it.
I'm now debugging step by step to see where it's coming from.
Can you please send me your TableMapReduceUtil.initTableSnapshotMapperJob line?
Thanks,
JM
