Member since
10-06-2015
15
Posts
1
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1404 | 02-09-2021 12:39 PM |
10-22-2021
05:51 AM
We are using Cloudera 6.2.1 and enabling row and cell level ACLs on our HBase tables. After turning on the HBase authorization toggle, everything works perfectly in some environments: we can add new labels and auths from the HBase shell. But in a few environments, calling the 'add_labels' command in the HBase shell fails with 'Table hbase:labels does not exist'.

To fix it I ran the following command:

create 'hbase:labels', 'f'

Then I was able to call add_labels with no problem. I'm just curious what actually creates this table, and why in some environments it wasn't created.

Richard
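For reference, here's roughly the shell sequence I went through (manual creation works as a workaround, though as far as I understand the hbase:labels table is normally created by the VisibilityController master coprocessor when the HMaster starts with it configured — so a master restart after flipping the authorization toggle is probably the cleaner fix):

```
hbase shell> list_labels                 # errors out when hbase:labels is missing
hbase shell> exists 'hbase:labels'       # confirms the table is absent
hbase shell> create 'hbase:labels', 'f'  # manual workaround
hbase shell> add_labels ['my_label']     # now succeeds
```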
Labels:
- Apache HBase
02-10-2021
12:25 PM
1 Kudo
To follow up... I got this working today. It turns out it was caused by this setting:

hadoop.security.group.mapping=org.apache.hadoop.security.ShellBasedUnixGroupsMapping

Apparently this runs 'bash -c groups' for the user, which separates the groups by spaces. When I changed to this implementation:

hadoop.security.group.mapping=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback

...everything worked correctly. Now the HBase shell correctly lists the groups (even ones with spaces) and visibility labels work correctly. That was a fun one... NOT!

Richard
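In config-file form, this is the change as I'd expect it to appear in core-site.xml (the property name and class are exactly as above; where you set it in Cloudera Manager — e.g. an HDFS safety valve vs. a dedicated field — may vary by version, so confirm against your deployment):

```xml
<!-- core-site.xml: switch group lookup away from the shell-based
     mapping, which tokenizes `groups` output on whitespace -->
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value>
</property>
```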
02-09-2021
12:50 PM
So I've enabled cell level security and am having an issue with spaces in group names retrieved from LDAP. In our Cloudera 6.2.1 cluster, HDFS is configured to retrieve groups from our LDAP server, and that works fine. But when I run the HBase shell (kerberized, with my Kerberos ticket), any group with spaces in its 'cn' gets split into separate groups by the shell's 'whoami' command. For example, if my LDAP group is called 'my cool group', whoami shows 3 separate groups: 'my', 'cool' and 'group'. Unfortunately I can't change the group name in LDAP to remove the spaces.

Therefore, if I set a cell's visibility to a label that maps to this LDAP group, I can't retrieve the cell:

hbase shell> whoami
... Groups: 'anothergroupwithoutspaces', 'my', 'cool', 'group'

add_labels ['my_cool_label']
set_auths '@my\ cool\ group', ['my_cool_label']    (you need backslashes to escape the spaces)
put 'myTable', 'row1', 'cf1:cq1', 'myValue', { VISIBILITY => 'my_cool_label' }

(can't see this cell)

From what I can see, the only option is to stop using the group 'cn' value and instead add another field to that LDAP node holding the group name without spaces, e.g. cn_hadoop = my-cool-group. Or is there another option?

Richard
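A toy Python sketch of the failure mode (the real group mapping is Java, not Python, but the effect is the same: the output of 'bash -c groups' gets tokenized on whitespace, so a group name containing spaces falls apart):

```python
# Illustrative only: why a shell-based group lookup mangles group
# names containing spaces. `groups` prints all groups on one line,
# space-separated, with no way to tell a separator from a name.
shell_output = "anothergroupwithoutspaces my cool group"

# Whitespace split, as the shell-based mapping effectively does:
groups = shell_output.split()
print(groups)
# -> ['anothergroupwithoutspaces', 'my', 'cool', 'group']
```

So 'my cool group' can never survive this code path, no matter how it is escaped in the HBase shell.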
Labels:
- Apache HBase
02-09-2021
12:39 PM
Figured it out. When you toggle on the row level and cell ACLs HBase configuration, it automatically adds the entries to hbase-site.xml for you, e.g. the access and visibility coprocessors. So I'm all good. Richard
02-07-2021
03:54 AM
So we're running Cloudera 6.2.1, which has HBase 2.1.2. We'd like to provide row and cell level security. Looking at this guide...

https://docs.cloudera.com/documentation/enterprise/6/6.2/topics/admin_hbase_security.html#id_v2w_bv3_tw

...I have to set a bunch of properties:

hbase.security.exec.permission.checks=true
hbase.security.access.early_out=false
hfile.format.version=3

I also need to toggle on the 'Enable HBase authorization' flag. I assume I also have to configure the coprocessors (the VisibilityController and AccessController)? There is this vague statement in the document mentioned above:

--- Optionally, search for and configure HBase Coprocessor Master Classes and HBase Coprocessor Region Classes. ---

Also, in the Cloudera Manager 6.2.1 HBase configuration I see toggles for 'Enable Row level authorization' and 'Enable Cell ACLs'. Do I need to toggle those on too? There is no mention of them in the setup. It's a bit confusing.

Richard
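For reference, here's roughly what I'd expect the resulting hbase-site.xml entries to look like once everything is enabled. The three security properties are straight from the guide above; the coprocessor class lists are my assumption based on the upstream HBase security docs, and the Cloudera Manager toggles may add some or all of these for you:

```xml
<property>
  <name>hbase.security.exec.permission.checks</name>
  <value>true</value>
</property>
<property>
  <name>hbase.security.access.early_out</name>
  <value>false</value>
</property>
<property>
  <name>hfile.format.version</name>
  <value>3</value>
</property>
<!-- Assumed coprocessor entries for cell ACLs plus visibility labels;
     verify against what your toggles actually write out -->
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.hadoop.hbase.security.access.AccessController,org.apache.hadoop.hbase.security.visibility.VisibilityController</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.access.AccessController,org.apache.hadoop.hbase.security.visibility.VisibilityController</value>
</property>
```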
Labels:
- Apache HBase
- Security
02-03-2021
04:53 PM
One thing I noticed today, in case it helps with this issue. Today I tried the sqoop from MSQL -> HBase again, on a new table with compression set and pre-split, in both the Cloudera 5.15.1 and Cloudera 6.2.1 environments. The HBase configuration (and the HDFS configuration, for that matter) is almost identical in both.

In the Cloudera 6.2.1 (i.e. HBase 2.1.2) environment the flush to the HStoreFile happens fairly quickly (only about 32,000 new entries), and the logs mention 'Adding a new HDFS file' of size 321 KB. In the Cloudera 5.15.1 (i.e. HBase 1.2.x) environment the flush to the HStoreFile takes longer, about 700,000 entries are being flushed, and the 'Adding a new HDFS file' is of size 6.5 MB.

The memstore flush size is set to 128 MB in both environments and the region servers have 24 GB available, so I think it's hitting the 0.4 heap factor for memstores and then flushing in both cases. Also, only a few tables have heavy writes and most of the other tables are fairly idle, so I don't think they take up much memstore space. In the Cloudera 6.2.1 environment each server holds about 240 regions; in the Cloudera 5.15.1 environment each server holds about 120 regions.

My thinking is that if I can get the Cloudera 6.2.1/HBase 2.1.2 memstore flushes happening with a similar size and number of entries as in the Cloudera 5.15.1 environment, the performance issue for large writes would be solved. I'm just not sure how to make that happen. I also noticed that minor compactions take a similar amount of time in both environments, so I think that's not an issue.

Richard
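To put rough numbers on that theory, here's a back-of-envelope sketch. It assumes the default global memstore fraction of 0.4 and writes spread evenly across regions — it doesn't fully explain flushes as small as 321 KB, but it does show that doubling the region count halves the per-region memstore budget before the global limit forces flushing:

```python
# Back-of-envelope: average memstore size per region when the global
# memstore limit (assumed default fraction 0.4 of heap) forces a flush.
heap_gb = 24
global_limit_gb = heap_gb * 0.4  # upper bound across ALL memstores

def avg_memstore_mb(regions):
    # With writes spread evenly, this is the average memstore size at
    # the moment the global limit is hit -- well under the configured
    # 128 MB per-region flush size in both clusters.
    return global_limit_gb * 1024 / regions

print(avg_memstore_mb(240))  # CDH 6.2.1 cluster, ~41 MB per region
print(avg_memstore_mb(120))  # CDH 5.15.1 cluster, ~82 MB per region
```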
01-18-2021
01:00 PM
These changes did make things run about 2x as fast, but still nowhere close to what it was before. So, as stated below, I think I'm going to stick with the 2 stage process:

1. Sqoop MSQL -> HDFS
2. HBase importtsv HDFS -> HBase

Richard
01-18-2021
12:58 PM
Thanks. I will try those things too. Honestly, I think there's something not right with the new Sqoop 1.4.7 -> HBase path in Cloudera 6.2.1. If I do the 2 step process for the 5 million row MSQL table...

1. Sqoop map reduce MSQL -> HDFS (about 25 seconds)
2. Use the HBase importtsv map reduce utility to read the HDFS CSV file and import into HBase (about 4 minutes)

...it works perfectly fine. If I use Sqoop -> HBase directly, it's REALLY SLOW. Like hours... Although I checked the servers (i.e. find / -name "hbase-client*jar"), it could be something as simple as an older hbase-client 1.2.0 on the classpath somewhere.

Richard
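In case it's useful to anyone, the 2 step process looks roughly like this. The connection string, table names, column mappings and mapper count are placeholders for my real values — adjust for your own schema:

```shell
# Step 1: Sqoop MSQL -> HDFS as tab-separated text (placeholders throughout)
sqoop import \
  --connect "$JDBC_URL" --username "$DB_USER" -P \
  --table my_table \
  --target-dir /tmp/my_table_staging \
  --fields-terminated-by '\t' \
  --num-mappers 8

# Step 2: bulk-import the staged files into HBase with ImportTsv
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf1:cq1 \
  my_hbase_table /tmp/my_table_staging
```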
01-17-2021
04:02 AM
I should have responded with a bit more detail, but I couldn't figure out how to edit my initial response. So here's a bit more info. Interesting about turning off HBase audits; never tried that.

1. Sqoop for MSQL -> Hadoop is still really fast, so I don't suspect HDFS configuration issues.
2. I did some testing with hbase pe, but unfortunately I didn't get performance numbers before the upgrade, so it's impossible to compare.
3. HDFS logs look clean.
4. HBase logs look generally clean. I sometimes get RPC responseTooSlow WARNings, but it doesn't happen often.
5. I have run a major compaction on the HBase table in question. The table has a number of regions spread across about 10 HBase region servers (no hot spotting).
6. I see minor compactions happening on the table while the sqoop is running.

Since this only happened after the upgrade, I was looking for changes in default values in the Cloudera HBase configuration, and changes in defaults from HBase 1.2.0 to HBase 2.1.2. I tried adjusting a few values but nothing worked, so I set them back. I have read that moving from HBase 1.2.x to 2.1.x may make writes a bit slower, but I'm talking about 100x slower for my sqoop, so I'm pretty sure that's not it.

Another thing I noticed when I started examining the cluster more closely (I'm a developer but have been thrown into the sysadmin role for the upgrade) is that the network wasn't configured correctly. The nodes in the cluster are supposed to know about each other (i.e. the /etc/hosts file on each node should have entries for all the other nodes in the cluster) and not rely on DNS to resolve other cluster hosts. This isn't the case; /etc/hosts only has the localhost entries. But once again, it was this way before the upgrade, so it's something to fix but probably not the cause of the HBase performance issue after the upgrade.

Richard
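For the /etc/hosts point, what I mean is something like this on every node (hostnames and IPs here are made-up placeholders, not our real ones):

```
127.0.0.1   localhost
10.0.0.11   master1.example.com   master1
10.0.0.21   worker1.example.com   worker1
10.0.0.22   worker2.example.com   worker2
```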