12-20-2016 03:20 PM
This article will go over the concepts of security in an HBase cluster. More specifically, we will concentrate on ACL-based security and how to apply it at the different levels of granularity of the HBase model. From an overall security perspective, an access control list, or ACL, is a list of permissions associated with an object; ACLs focus on the access-rules pattern.

ACL logic

HBase access control lists are granted at different levels of data abstraction and cover different types of operations.

HBase data layout

Before we go further, let us clear up the hierarchical elements that compose HBase data storage:

- CELL: All values written to HBase are stored in what is known as a cell (a cell can also be referred to as a KeyValue). Cells are identified by a multidimensional key {row, column family, qualifier, timestamp}. For example: CELL => Rowkey1,CF1,Q11,TS1.
- COLUMN FAMILY: A column family groups together arbitrary cells.
- TABLE: All cells belong to a column family and are organized into a table.
- NAMESPACE: Tables in turn belong to namespaces. This can be thought of as the database-to-table relationship. With this in mind, a table's fully qualified name is Table => Namespace:Table (the default namespace can be omitted).
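To make the layout concrete, here is a quick HBase shell sketch (the namespace, table, and column names are made up for illustration):

  create_namespace 'ns1'
  create 'ns1:t1', 'cf1'                    # table t1 in namespace ns1, with column family cf1
  put 'ns1:t1', 'Rowkey1', 'cf1:q11', 'v1'  # writes one cell keyed {Rowkey1, cf1, q11, timestamp}
  scan 'ns1:t1'                             # prints the cell with its full coordinates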
HBase scopes

Permissions are evaluated starting at the widest scope and working down to the narrowest scope:

1. Global
2. Namespace
3. Table
4. Column Family (CF)
5. Column Qualifier (Q)
6. Cell

For example, a permission granted at the table level dominates grants made at the column family level.
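As a sketch of this dominance rule (the user, namespace, and table names are invented), a read grant at table scope covers every column family, regardless of narrower grants:

  grant 'analyst', 'R', 'ns1:t1'          # table scope: read access to the whole table
  grant 'analyst', 'RW', 'ns1:t1', 'cf1'  # CF scope: adds write access within cf1 only
  # 'analyst' can read every column family of ns1:t1 (the table-scope R dominates)
  # and can additionally write to cf1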
Permissions

HBase can give granular access rights depending on each scope. Permissions are zero or more letters from the set RWXCA:

- Superuser: a special user that has unlimited access
- Read (R): read right on the given scope
- Write (W): write right on the given scope
- Execute (X): coprocessor execution on the given scope
- Create (C): can create and delete tables on the given scope
- Admin (A): right to perform cluster admin operations, for example granting rights

Combining access rights and scopes creates a complete matrix of access patterns and roles. In order to avoid complex, conflicting rules, it is often useful to build access patterns up from roles and responsibilities, as in the list below.
Roles and responsibilities:

- Superuser: usually this role should be reserved solely for the hbase user.
- Admin: (A) operational role that performs cluster-wide operations like balancing and assigning regions; (C) DBA-type role that creates and drops tables and namespaces.
- Namespace Admin: (A) manages a specific namespace from an operations perspective (snapshots, splits, etc.); (C) from a DBA perspective, can create tables and give access.
- Table Admin: (A) operational role that can manage splits, compactions, etc.; (C) can create snapshots, restore a table, etc.
- Power User: (RWX) can use the table by writing or reading data, and possibly use coprocessors.
- Consumer: (R) can only read and consume data.
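As a sketch, these roles could translate into shell grants like the following (every principal, namespace, and table name here is invented):

  grant 'ops',       'A'                    # cluster-wide Admin
  grant 'dba',       'C'                    # global create/drop rights
  grant 'ns_ops',    'A', '@sales'          # namespace admin, operations side
  grant 'ns_dba',    'C', '@sales'          # namespace admin, DBA side
  grant 'app',       'RWX', 'sales:orders'  # power user on one table
  grant 'reporting', 'R',   'sales:orders'  # consumer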
Some actions need a mix of these permissions to be performed:

- CheckAndPut / CheckAndDelete: these actions need RW permissions
- Increment / Append: only require W permissions

The full ACL matrix can be found here: http://hbase.apache.org/book.html#appendix_acl_matrix

Setting up
In order to set up HBase ACLs, you will need to modify hbase-site.xml with the following properties:

  <property>
    <name>hbase.coprocessor.region.classes</name>
    <value>org.apache.hadoop.hbase.security.access.AccessController,org.apache.hadoop.hbase.security.token.TokenProvider</value>
  </property>
  <property>
    <name>hbase.coprocessor.master.classes</name>
    <value>org.apache.hadoop.hbase.security.access.AccessController</value>
  </property>
  <property>
    <name>hbase.coprocessor.regionserver.classes</name>
    <value>org.apache.hadoop.hbase.security.access.AccessController</value>
  </property>
  <property>
    <name>hbase.security.exec.permission.checks</name>
    <value>true</value>
  </property>

In Ambari this is much easier: just enable security and Ambari will automatically set all these configurations for you.
Applying ACLs

Now that we have restarted our HBase cluster and set up the ACL feature, we can start setting up rules. For simplicity we will use two users: hbase and testuser. hbase is the superuser for our cluster and will let us set the rights accordingly.

Namespace

As the hbase user, we create an 'acl' namespace:

hbase(main):001:0> create_namespace 'acl'
0 row(s) in 0.3180 seconds

As testuser, we will create a table in this new namespace:
hbase(main):001:0> create 'atest','cf'
ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=testuser, scope=default, params=[namespace=default,table=default:atest,family=cf], action=CREATE)

We are not allowed to create a table in this namespace.
The superuser hbase will give the rights to testuser:

hbase(main):001:0> grant 'testuser','C','@acl'
0 row(s) in 0.3360 seconds

We can now run the previous command as testuser:
hbase(main):002:0> create 'atest','cf'
0 row(s) in 2.3360 seconds

We will now open this table to another user, testuser2:
hbase(main):002:0> grant 'testuser2','R','@acl'
ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=testuser, scope=acl, params=[namespace=acl], action=ADMIN)

Notice that we cannot grant rights to other users, as we are missing Admin permissions. We can fix this with our hbase superuser:

hbase(main):002:0> grant 'testuser','A','@acl'
0 row(s) in 0.460 seconds
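With the Admin right in place, testuser should now be able to hand out access itself. A sketch of the retried grant, plus the user_permission shell command to inspect the result (the prompts here are illustrative):

hbase(main):003:0> grant 'testuser2','R','@acl'
hbase(main):004:0> user_permission '@acl'   # lists the users and permissions granted on the namespace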
02-07-2016 11:28 AM
In the big data and distributed-systems world, you may have done a world-class job in dev and unit testing; chances are you still did it on sample data, and in dev you might not have run on a truly distributed system but on one machine simulating distribution. On the production cluster you can rely on logs, but sometimes you would also like to connect a remote debugger and the other tools you are used to. In this post I will go over an example in Java and Eclipse, partly because those are what I use, and partly because I mostly see Scala examples, so let's give Java a little love.

Setting up

I have installed a Hortonworks sandbox for this exercise; you can download it here. You will also need to open up a port to bind on; in the last post I used port 7777, and I'll stick with it here as well. First we will use a very standard word count example, the kind you can find in all the Spark tutorials (a sketch of it is included at the end of this post). Set up a quick Maven project to bundle your jar and any dependent libs you might be using. Notice the breakpoint on the 26th line; it will make sense later on. Once your code is done and your unit tests have passed, you are ready to deploy to the cluster, so let's go ahead and build it and push it out to the cluster.

Spark deployment mode

Spark has three deployment modes: Standalone, YARN, and Mesos. Most examples talk about Standalone, so I will focus on YARN. In YARN, every application has an application master process, in which the first container for the application is started; it is responsible for requesting resources for the application and driving the application as a whole. For Spark this means you get to choose whether the YARN application master runs the whole application, and hence the Spark driver (yarn-cluster), or whether your client stays active and keeps the Spark driver (yarn-client).

Remote debug launch

Now that your jar is on the cluster and you are ready to debug, you need to submit your Spark job with debug options so your IDE can bind to it. Depending on your Spark version there are different ways to go about this; I am on Spark > 1.0, so I will use the following (the original snippet was an image; this is a sketch of the standard JDWP options):

export SPARK_SUBMIT_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=7777"

Notice address=7777, back to the port I talked about earlier, and suspend=y to have the process wait for my IDE to bind. Let's now launch our normal spark-submit command (again a sketch, as the original was an image; the class and jar names are placeholders):

spark-submit --class WordCount --master yarn-client --driver-memory 512m --executor-memory 512m wordcount.jar Hortonworks sparkyarn11.log

If we look at the line a little closer, I have specified low memory settings for my specific sandbox context, and given an input file Hortonworks and an output file sparkyarn11.log. Notice we are using the --master yarn-client deployment mode here. As the prompt shows, the system is now waiting on port 7777 for the IDE to bind.

IDE

Now back in our IDE, we will use the remote debug function to bind to our Spark cluster. Once you run debug, the prompt on your cluster will show your IDE bind, and you can now run your work from Eclipse directly. Eclipse will switch to the debug view, and you can go ahead and step from breakpoint to breakpoint. Remember the breakpoint I pointed out in the code? Your IDE is now on this breakpoint, waiting to move on. In the variables panel on the right you can see the file input variable we get from the command line; it is set to hdfs://sandbox.hortonworks.com/user/guest/Hortonworks, exactly like our command-line input. Great, we have just set up our first remote debug on Spark. You should go ahead and try it with --master yarn-cluster and see what changes.
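For reference, the original word count listing was a screenshot. A minimal sketch of that kind of program, assuming the Spark 1.x Java API (where flatMap returns an Iterable; in 2.x it returns an Iterator) and assuming args[0]/args[1] carry the input and output paths from the submit line above:

import java.util.Arrays;

import scala.Tuple2;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Minimal word count in the spirit of the example described above.
public class WordCount {
    public static void main(String[] args) {
        // args[0] = input path, args[1] = output path, as on the submit line
        String inputFile = args[0];   // a driver-side breakpoint around here lets you
        String outputFile = args[1];  // inspect the input variable, as in the walkthrough

        SparkConf conf = new SparkConf().setAppName("WordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile(inputFile);
        JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split(" ")))  // split each line into words
                .mapToPair(word -> new Tuple2<>(word, 1))         // pair each word with a count of 1
                .reduceByKey((a, b) -> a + b);                    // sum counts per word

        counts.saveAsTextFile(outputFile);
        sc.stop();
    }
}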