We recently upgraded from Cloudera 5.15.x to Cloudera 6.2.1. Everything went well, but we did notice one issue. We run Sqoop imports of fairly large tables (>1 million rows) from MSQL to Hbase. These used to run fine, but since the upgrade the inserts into Hbase are something like 100x slower. We've spent days trying to tune the Hbase server, to no avail. Any thoughts? Is the sqoop provided with Cloudera 6.2.1 supposed to work with Hadoop 3.0.0 and Hbase 2.1.2? Or should I consider recompiling the sqoop executable from source with the proper dependencies?
BTW, we've tested the same Sqoop from MSQL (the same table) to HDFS and it is really fast: about 1 minute for a 4 million row table. The MSQL -> Hbase one takes >8 hours for the same source table.
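To put the two timings above side by side, here is a quick back-of-the-envelope calculation. It uses only the figures quoted in the post (4 million rows, about 1 minute to HDFS, more than 8 hours to Hbase). Because the 8 hours is a lower bound, the Hbase rate is an upper bound. Note that this compares HDFS against Hbase on the upgraded cluster, not before versus after the upgrade.

```python
# Rough throughput comparison using the figures quoted in the post.
ROWS = 4_000_000            # rows in the source table
HDFS_SECONDS = 60           # "about 1 minute" for the import to HDFS
HBASE_SECONDS = 8 * 3600    # ">8 hours" for Hbase, so this is a lower bound

hdfs_rate = ROWS / HDFS_SECONDS      # rows per second into HDFS
hbase_rate = ROWS / HBASE_SECONDS    # rows per second into Hbase (upper bound)
slowdown = HBASE_SECONDS / HDFS_SECONDS

print(f"HDFS:  {hdfs_rate:,.0f} rows/s")
print(f"Hbase: {hbase_rate:,.0f} rows/s (at most)")
print(f"Hbase import is at least {slowdown:.0f}x slower than the HDFS import")
```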
I should have responded with a bit more detail... but I couldn't figure out how to edit my initial response. So here's a bit more info.
Interesting about turning off hbase audits. Never tried that.
1. Sqoop for MSQL -> Hadoop is still really fast. So I'm not suspecting hdfs configuration issues.
2. I did some testing with hbase pe but unfortunately I didn't get performance numbers before the upgrade. So impossible to compare.
3. hdfs logs look clean
4. hbase logs look generally clean. I sometimes get RPC responseTooSlow WARNings, but not often
5. Have run major compaction on the hbase table in question. The table has a number of regions spread across about 10 hbase region servers (no hot spotting)
6. I see minor compactions happening on the table while the sqoop is running.
Since this only happened after the upgrade, I was looking for changes in default values for the Cloudera Hbase configuration, and for changes in defaults from hbase 1.2.0 to hbase 2.1.2. I tried adjusting a few values but nothing worked, so I set them back. I have read that writes may be a bit slower after moving from hbase 1.2.x to hbase 2.1.x. But I'm talking about something like 100x slower for my sqoop, so I'm pretty sure that's not it.
Another thing I noticed when I started examining the cluster more closely (I'm a developer but have been thrown into the sysadmin role for the upgrade) is that the network wasn't configured correctly. The nodes in the cluster are supposed to know about each other (i.e. the /etc/hosts file on each node should have entries for all other nodes in the cluster) and not rely on DNS to resolve other cluster hosts. This isn't the case: /etc/hosts only has the localhost entries. But once again, it was this way before the upgrade. So it's something to fix, but probably not the cause of the hbase performance issue after the upgrade.
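One way to check whether name resolution is actually slow on a node is to time the lookups directly. The following is a minimal sketch using only the Python standard library. The host list is hypothetical and needs to be replaced with the real cluster node names, and the helper name `time_lookups` is made up for this example.

```python
import socket
import time

def time_lookups(hostnames, repeats=3):
    """Return {hostname: slowest lookup in seconds, or None if resolution failed}."""
    results = {}
    for host in hostnames:
        worst = 0.0
        try:
            for _ in range(repeats):
                start = time.perf_counter()
                socket.getaddrinfo(host, None)  # goes through the system resolver
                worst = max(worst, time.perf_counter() - start)
            results[host] = worst
        except socket.gaierror:
            results[host] = None
    return results

if __name__ == "__main__":
    # Hypothetical host list; replace with the cluster's node names.
    for host, seconds in time_lookups(["localhost"]).items():
        if seconds is None:
            print(f"{host}: could not be resolved")
        else:
            print(f"{host}: slowest lookup {seconds * 1000:.1f} ms")
```

If lookups for other cluster nodes regularly take tens of milliseconds or more, that would be worth fixing, although as noted it was presumably the same before the upgrade.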
There are a few things worth trying: set "dfs.client.read.shortcircuit" to true for the RegionServers, "hbase.wal.provider" to "filesystem", "hbase.wal.meta_provider" to "filesystem", and dfs.domain.socket.path=<Add Same Value Configured For HDFS>. Then restart the HBase Service.
Try running the HBase PE test before making the above changes and again afterwards. Do let us know the outcome.
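For reference, the four settings suggested above can be written out as an hbase-site.xml style fragment. The sketch below only generates that XML. The function name `hbase_site_fragment` is made up for this example, and the socket path is deliberately left as a placeholder because it has to match whatever value HDFS is configured with. In a Cloudera Manager deployment such properties are normally entered through the configuration UI rather than edited by hand, so treat the output as a checklist rather than a file to drop in place.

```python
import xml.etree.ElementTree as ET

def hbase_site_fragment(domain_socket_path):
    """Build a <configuration> fragment with the four suggested properties."""
    settings = {
        "dfs.client.read.shortcircuit": "true",
        "hbase.wal.provider": "filesystem",
        "hbase.wal.meta_provider": "filesystem",
        # Must be the same value that HDFS is configured with.
        "dfs.domain.socket.path": domain_socket_path,
    }
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # Placeholder only; substitute the path from the HDFS configuration.
    print(hbase_site_fragment("REPLACE_WITH_HDFS_DOMAIN_SOCKET_PATH"))
```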
One thing I noticed today in case it may help with this issue...
Today I tried the sqoop from MSQL -> Hbase again on a new table, pre-split and with compression set, in both the Cloudera 5.15.1 and Cloudera 6.2.1 environments. The Hbase configuration (and HDFS configuration for that matter) is almost identical.
In the Cloudera 6.2.1 (i.e. Hbase 2.1.2) environment I see the flush to the HStoreFile happen fairly quickly (after only about 32,000 new entries), and in the logs it mentions 'Adding a new HDFS file' of size 321 KB.
In the Cloudera 5.15.1 (i.e. Hbase 1.2.x) environment I see the flush to the HStoreFile take longer: there are 700,000 entries being flushed and the 'Adding a new HDFS file' is of size 6.5 MB.
The memstore flush size is set to 128 MB in both environments and region servers have 24 GB available. So I think it's hitting the 0.4 heap factor for memstores and then it flushes in both cases. Also, there are only a few tables with heavy writes, so most of the other tables are fairly idle. So I don't think they would take up much memstore space.
In the Cloudera 6.2.1 environment each server holds about 240 regions. In the Cloudera 5.15.1 environment each server holds about 120 regions.
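As a rough check on the numbers above, the sketch below works out how much memstore each region would get on average if every region on a server were being written to at once. It assumes that the 24 GB is the RegionServer heap (the post only says "available") and that the 0.4 factor is the global memstore fraction of that heap. Both are assumptions, and the helper name is made up for this example.

```python
def avg_memstore_share_mb(heap_gb, global_fraction, regions):
    """Average memstore per region (MB) if all regions shared the global limit."""
    global_limit_mb = heap_gb * 1024 * global_fraction
    return global_limit_mb / regions

HEAP_GB = 24           # assumption: the 24 GB quoted is RegionServer heap
GLOBAL_FRACTION = 0.4  # the "0.4 heap factor for memstores" from the post
FLUSH_SIZE_MB = 128    # per-region memstore flush size from the post

for label, regions in (("Cloudera 5.15.1", 120), ("Cloudera 6.2.1", 240)):
    share = avg_memstore_share_mb(HEAP_GB, GLOBAL_FRACTION, regions)
    print(f"{label}: {regions} regions -> {share:.1f} MB average share "
          f"(flush size {FLUSH_SIZE_MB} MB)")
```

Under these assumptions the average share is about 82 MB per region with 120 regions and about 41 MB with 240. Both are far above the 321 KB and 6.5 MB files reported, and most regions are described as idle, so the doubled region count on its own does not obviously account for the much smaller flushes.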
My thinking is that if I can get the Cloudera 6.2.1/hbase 2.1.2 memstore flush happening with a similar size and number of entries as the Cloudera 5.15.1 environment the performance issue for large writes would be solved. Just not sure how to make that happen.
I also noticed that minor compactions in both environments take a similar amount of time, so I don't think that's an issue.