We recently upgraded from Cloudera 5.15.x to Cloudera 6.2.1. Everything went well, but we did notice one issue. We have Sqoop jobs that import fairly large tables (>1 million rows) from MSQL to HBase. These used to run fine, but after the upgrade the inserts into HBase are roughly 100x slower. We've spent days trying to tune the HBase server to no avail. Any thoughts? Is the Sqoop provided with Cloudera 6.2.1 supposed to work with Hadoop 3.0.0 and HBase 2.1.2? Or should I consider recompiling the Sqoop executable from source with the proper dependencies?
BTW we've tested the same Sqoop from MSQL (the same table) to HDFS and it is really fast: about 1 minute for a 4 million row table. The MSQL -> HBase job takes >8 hours for the same source table.
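For context, the slow job is a plain Sqoop 1 import straight into HBase, along these lines (the connection string, table, column family, and row key below are anonymized placeholders, not our real values):

```shell
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=mydb" \
  --username loaduser -P \
  --table BIG_TABLE \
  --hbase-table big_table \
  --column-family cf \
  --hbase-row-key id \
  --num-mappers 8
```

Nothing exotic: just the standard puts-based HBase write path via --hbase-table / --column-family.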
That is odd behaviour. Two things to try:
1. Restart the HBase service and run your Sqoop job again, and see if that helps with performance.
2. Try turning off HBase audit logging and see whether the write performance changes.
I should have responded with a bit more detail, but I couldn't figure out how to edit my initial response. So here's a bit more info.
Interesting about turning off HBase audits. Never tried that.
1. Sqoop for MSQL -> Hadoop is still really fast, so I'm not suspecting HDFS configuration issues.
2. I did some testing with hbase pe, but unfortunately I didn't capture performance numbers before the upgrade, so there's nothing to compare against.
3. The HDFS logs look clean.
4. The HBase logs look generally clean. I sometimes get RPC responseTooSlow WARNings, but it doesn't happen often.
5. I have run a major compaction on the HBase table in question. The table has a number of regions spread across about 10 HBase region servers (no hot-spotting).
6. I see minor compactions happening on the table while the Sqoop is running.
Since this only happened after the upgrade, I looked for changes in default values in the Cloudera HBase configuration, and for changes in defaults between HBase 1.2.0 and HBase 2.1.2. I tried adjusting a few values, but nothing worked, so I set them back. I have read that in moving from HBase 1.2.x to HBase 2.1.x, writes may be a bit slower, but I'm seeing something like 100x slower for my Sqoop job, so I'm pretty sure that's not it.
Another thing I noticed when I started examining the cluster more closely (I'm a developer but have been thrown into the sysadmin role for the upgrade) is that the network wasn't configured correctly. The nodes in the cluster are supposed to know about each other (i.e. the /etc/hosts file on each node should have entries for all the other nodes in the cluster) rather than relying on DNS to resolve other cluster hosts. That isn't the case: /etc/hosts only has the localhost entries. But once again, it was this way before the upgrade, so it's something to fix but probably not the cause of the HBase performance issue after the upgrade.
There are a few things worth trying: set "dfs.client.read.shortcircuit" to true for the RegionServers, set "hbase.wal.provider" to "filesystem", set "hbase.wal.meta_provider" to "filesystem", and set "dfs.domain.socket.path" to the same value configured for HDFS. Then restart the HBase service.
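For reference, applied via the HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml, those settings would look roughly like this (the socket path is deliberately left as a placeholder; use whatever your HDFS service already has):

```xml
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>hbase.wal.provider</name>
  <value>filesystem</value>
</property>
<property>
  <name>hbase.wal.meta_provider</name>
  <value>filesystem</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <!-- placeholder: use the same value configured for HDFS -->
  <value>...</value>
</property>
```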
Run the HBase PE test before making the above changes and again after them, so the two can be compared. Do let us know the outcome.
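For example, a write-focused PE run that is easy to repeat before and after the changes (10 client threads, 100k rows each; --nomapred runs the clients locally rather than as a MapReduce job):

```shell
hbase pe --nomapred --rows=100000 sequentialWrite 10
hbase pe --nomapred --rows=100000 randomWrite 10
```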
Thanks. I will try those things too.
Honestly, I think there's something not right with the new Sqoop 1.4.7 -> HBase path in Cloudera 6.2.1. If I do the two-step process for the 5 million row MSQL table...
1. Sqoop MapReduce job MSQL -> HDFS (about 25 seconds)
2. HBase importtsv MapReduce utility to read the CSV file from HDFS and import it into HBase (about 4 minutes)
...it works perfectly fine. If I use Sqoop -> HBase directly, it's REALLY SLOW. Like hours...
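The two steps, sketched with placeholder names and paths (the column list and separator are whatever the real table uses):

```shell
# Step 1: Sqoop the MSQL table to HDFS as CSV (~25 s for 5M rows)
sqoop import \
  --connect "jdbc:sqlserver://dbhost:1433;databaseName=mydb" \
  --username loaduser -P \
  --table BIG_TABLE \
  --target-dir /staging/big_table \
  --fields-terminated-by ','

# Step 2: load the CSV into HBase with importtsv (~4 min)
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=',' \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2 \
  big_table /staging/big_table
```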
Although I checked the servers (i.e. find / -name "hbase-client*jar"), it could be something as simple as an older hbase-client 1.2.0 jar on the classpath somewhere.
These changes did make things run about 2x faster, but that's still nowhere close to what it was before.
So, as stated below, I think I'm going to stick with the two-stage process:
1. Sqoop MSQL -> HDFS
2. HBase importtsv HDFS -> HBase
One thing I noticed today in case it may help with this issue...
Today I tried the sqoop from MSQL -> Hbase again on a new table with compression set and pre-split in Cloudera 5.15.1 and Cloudera 6.2.1 environments, Hbase configuration (and HDFS configuration for that matter) is almost identical.
In the Cloudera 6.2.1 (i.e. HBase 2.1.2) environment I see the flush to the HStoreFile happen fairly quickly (only about 32,000 new entries), and the logs mention 'Adding a new HDFS file' of size 321 KB.
In the Cloudera 5.15.1 (i.e. HBase 1.2.x) environment the flush to the HStoreFile takes longer, there are about 700,000 entries being flushed, and the 'Adding a new HDFS file' is of size 6.5 MB.
The memstore flush size is set to 128 MB in both environments, and the region servers have 24 GB available. So I think it's hitting the 0.4 heap factor for memstores in both cases, and then it flushes. Also, there are only a few tables with heavy writes; most of the other tables are fairly idle, so I don't think they take up much memstore space.
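As a rough sanity check on that theory, here's the global memstore arithmetic (a sketch assuming the default 0.4 upper limit and pressure spread evenly across regions; the real flush decision also involves the lower watermark and which memstores are biggest):

```shell
heap_mb=$((24 * 1024))                        # 24 GB region server heap, in MB
global_limit_mb=$((heap_mb * 4 / 10))         # 0.4 heap factor => ~9830 MB
regions=240                                   # regions per server in the 6.2.1 cluster
per_region_mb=$((global_limit_mb / regions))  # ~40 MB per region if spread evenly
echo "global=${global_limit_mb}MB per_region=${per_region_mb}MB"
```

Even spread perfectly evenly, that is well under the 128 MB per-region flush size, which would be consistent with flushes being forced by global pressure rather than by the per-region limit.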
In the Cloudera 6.2.1 environment each server holds about 240 regions. In the Cloudera 5.15.1 environment each server holds about 120 regions.
My thinking is that if I can get the Cloudera 6.2.1/HBase 2.1.2 memstore flushes happening with a similar size and number of entries as in the Cloudera 5.15.1 environment, the performance issue for large writes would be solved. I'm just not sure how to make that happen.
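For reference, these are the knobs that seem to be involved, shown here with their HBase 2.x default values (just naming them; I haven't confirmed that changing any of these fixes it):

```xml
<property>
  <name>hbase.regionserver.global.memstore.size</name>
  <value>0.4</value> <!-- default: fraction of heap all memstores may use -->
</property>
<property>
  <name>hbase.regionserver.global.memstore.size.lower.limit</name>
  <value>0.95</value> <!-- default: under pressure, flush down to this fraction of the limit -->
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value> <!-- 128 MB, as set in both environments -->
</property>
```

Reducing the number of regions per server (240 here vs. 120 in the old cluster) would be another lever on the same math.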
I also noticed that minor compactions take a similar amount of time in both environments, so I don't think that's an issue.