02-07-2017
06:14 AM
So, the crazy thought you had resolved the days-long mystery! An explanation for any poor soul that might encounter a similar issue: in our setup, we have three machines reserved for master roles, m[1-3], and 40 worker machines, w[1-40]. I assigned 2 HBase Masters, the HBase Thrift Server and the HBase REST Server to m2 and m3, and Region Servers to w[20-40]. I ran hbck from m1, which has no HBase roles on it. Normally this would fail with a ConnectException, as it does from all the other machines without HBase roles (w[1-19]): they have no hbase-site.xml, so they don't know where to look for ZooKeeper and fall back to the default, localhost. However, since m1 is also the ZooKeeper leader, localhost is actually a working default and hbase hbck will run. Sort of, because m1 doesn't have any HDFS roles either. So m1 was LITERALLY the only machine in the cluster that would report these inconsistencies; everywhere else the check would either fail with a connection exception or report everything as normal. Many thanks for your time, it saved a bunch of mine (although I also already lost a lot of it :)).
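For anyone debugging something similar, a quick sanity check is to confirm which ZooKeeper quorum the node you run hbck from actually points at. A minimal sketch, assuming the standard CDH client configuration path /etc/hbase/conf (adjust if your gateway config lives elsewhere):

# does this host even have an HBase client/gateway config?
ls -l /etc/hbase/conf/hbase-site.xml

# which ZooKeeper quorum will hbck talk to? if the property is missing,
# the client silently falls back to localhost
grep -A1 hbase.zookeeper.quorum /etc/hbase/conf/hbase-site.xml

If the second command finds nothing, hbck is talking to localhost, which only "works" on a node that happens to run ZooKeeper, which is exactly the trap above.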
02-06-2017
02:52 AM
Hm. I'll dig into the permissions issue. However, I doubt that this is the reason behind the weirdness, because not only is the HMaster alive, but on the surface it appears to be functioning normally. I created a table and filled it with 2 million rows, then ran hbase hbck again. It reported 37 inconsistencies of the same type, hbase:namespace still among them. EDIT: additional info: when I scan 'hbase:namespace' I get:

ROW COLUMN+CELL
default column=info:d, timestamp=1486140313224, value=\x0A\x07default
hbase column=info:d, timestamp=1486140313283, value=\x0A\x05hbase
2 row(s) in 0.3760 seconds

...as I should. More additional info (don't know if it's relevant): before the inconsistency error, the log shows a "No HDFS region dir found" warning. It looks like this:

No HDFS region dir found: { meta => hbase:namespace,,1486140310904.86d0405303ed58995e1507e33cbf66a2., hdfs => null, deployed => hadoop-38.xxxx.xxxxxxxxxxx.de,60020,1486140300341;hbase:namespace,,1486140310904.86d0405303ed58995e1507e33cbf66a2., replicaId => 0 } meta={ENCODED => 86d0405303ed58995e1507e33cbf66a2, NAME => 'hbase:namespace,,1486140310904.86d0405303ed58995e1507e33cbf66a2.', STARTKEY => '', ENDKEY => ''}

It says basically the same thing as the error above, just with the additional hint "No HDFS region dir found", and it's marked as a warning. The deployed part also matches deployment info that I found in the /hbase/WALs folder, namely:

hdfs dfs -ls /hbase/WALs
Found 16 items
...
drwxr-xr-x - hbase hbase 0 2017-02-06 11:11 /hbase/WALs/hadoop-38.xxxx.xxxxxxxxxxx.de,60020,1486140300341
... My next desperate idea is to try to read whatever is in /hbase/data/hbase/namespace/86d0405303ed58995e1507e33cbf66a2/.regioninfo (following the "No HDFS region dir found" hint) as soon as I find some command-line protobuf reader. Again, thanks for taking the time to look into this, and as always ANY feedback is much appreciated! Regards!
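In case it helps someone following along, a rough way to peek at the same information without a dedicated protobuf reader, assuming the default /hbase root dir and the encoded region name from the warning above: list the region directory and dump the raw bytes of .regioninfo. Even though the file is protobuf-serialized, the table name and keys usually show up as ASCII fragments.

# is the region directory present under the HBase root at all?
hdfs dfs -ls /hbase/data/hbase/namespace/86d0405303ed58995e1507e33cbf66a2/

# dump the first bytes of the serialized region info
hdfs dfs -cat /hbase/data/hbase/namespace/86d0405303ed58995e1507e33cbf66a2/.regioninfo | hexdump -C | head -n 20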
02-03-2017
05:45 AM
Hi Ben, thanks for your response, much appreciated! Actually, that is exactly what I did. I messed it up so badly that I had to delete the service. Everything I described above actually happened after I:
1. stopped the service
2. deleted the service
3. hdfs dfs -rm -r /hbase
4. echo "rmr /hbase" | zookeeper-client
5. added the service again
At this point the inconsistencies are piling up; I have 34 of them, and the one described above, found in the namespace table, is still there.
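For anyone retracing this later, here is the HDFS/ZooKeeper part of that wipe (steps 3 and 4), plus a quick way to verify both are really gone before re-adding the service; this assumes the default /hbase root dir and znode:

# the service is already stopped and deleted in Cloudera Manager at this point
hdfs dfs -rm -r /hbase
echo "rmr /hbase" | zookeeper-client

# sanity check before re-adding the service
hdfs dfs -ls /hbase             # should fail: No such file or directory
echo "ls /" | zookeeper-client  # the hbase znode should no longer appear in the listing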
01-31-2017
08:15 AM
Just a quick update: the issue is still present after upgrading to CDH 5.10.0, so... if you sort of had an idea but were kind of shy, or thought "naaaaah, he probably already thought of that", I strongly encourage you to step forward 🙂
01-27-2017
01:19 AM
Hi everyone! So, a reeeally long story short (I can gladly expand upon request): I added the HBase service, ran hbase hbck immediately afterwards, and it already detected one inconsistency:

ERROR: Region { meta => hbase:namespace,,1485505125654.b972bf2653eaa96104d6034591386a60., hdfs => null, deployed => hadoop-34.xxxzzz.de,60020,1485505116059;hbase:namespace,,1485505125654.b972bf2653eaa96104d6034591386a60., replicaId => 0 } found in META, but not in HDFS, and deployed on hadoop-34.xxxzzz.de,60020,1485505116059

When I do hbase hbck -repairHoles the inconsistency is gone, BUT... so is my hbase:namespace table:

hbase(main):001:0> scan 'hbase:namespace'
ROW COLUMN+CELL
ERROR: Unknown table hbase:namespace!

Interestingly enough, it is not gone from HDFS:

hdfs dfs -ls /hbase/data/hbase
Found 2 items
drwxr-xr-x - hbase hbase 0 2017-01-27 09:18 /hbase/data/hbase/meta
drwxr-xr-x - hbase hbase 0 2017-01-27 09:18 /hbase/data/hbase/namespace

...nor from ZooKeeper:

[zk: localhost:2181(CONNECTED) 2] ls /hbase/table
[hbase:meta, hbase:namespace]

...and an interesting side effect is that the create_namespace function of the hbase shell no longer works:

hbase(main):003:0> create_namespace 'ns1'
ERROR: Unknown table ns1!

I did find this ray of hope: HBASE-16294, and it is actually included in the latest CDH (I am running 5.9.0, btw). But! It seems to concern only replicas. This is the patch code, btw:

if (hbi.getReplicaId() == HRegionInfo.DEFAULT_REPLICA_ID) {
// Log warning only for default/ primary replica with no region dir
LOG.warn("No HDFS region dir found: " + hbi + " meta=" + hbi.metaEntry);
}

I have replication disabled, and as one can see from the error message: replicaId => 0. Now, I would have let this slide, but the real problem is that over time I accumulate a huge number of these inconsistencies, and attempting to fix them results in tables no longer being found from the hbase shell. Any ideas would be greatly appreciated!
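For completeness, the invocations involved are just the stock hbck ones: a plain run is read-only and only reports inconsistencies, while -repairHoles is the repair that made the table disappear for me. A minimal sketch (running it as the hbase user is an assumption on my part; adjust to your setup):

# read-only consistency check, reports but changes nothing
sudo -u hbase hbase hbck

# more verbose, per-region report
sudo -u hbase hbase hbck -details

# the repair I ran, which is what removed hbase:namespace here
sudo -u hbase hbase hbck -repairHoles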
Labels:
- Apache HBase