Member since: 08-01-2014
Posts: 16
Kudos Received: 0
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3194 | 02-15-2017 11:57 AM
05-15-2019
07:17 AM
Before the "Failed open" message, the log shows this block (only including the first line of the Java stack trace):

2019-05-14 15:55:53,356 INFO org.apache.hadoop.hbase.regionserver.HRegion: Replaying edits from hdfs://athos/hbase/data/default/deveng_v500/ab693aebe203bc8781f1a9f1c0a1d045/recovered.edits/0000000000094270192
2019-05-14 15:55:53,383 INFO org.apache.hadoop.hbase.regionserver.HRegion: Replaying edits from hdfs://athos/hbase/data/default/deveng_v500/ab693aebe203bc8781f1a9f1c0a1d045/recovered.edits/0000000000094270299
2019-05-14 15:55:53,722 INFO org.apache.hadoop.hbase.regionserver.HRegion: Replaying edits from hdfs://athos/hbase/data/default/deveng_v500/ab693aebe203bc8781f1a9f1c0a1d045/recovered.edits/0000000000094270330
2019-05-14 15:55:53,903 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Auth successful for tomcat (auth:SIMPLE)
2019-05-14 15:55:53,904 INFO SecurityLogger.org.apache.hadoop.hbase.Server: Connection from 10.190.158.151 port: 60648 with unknown version info
2019-05-14 15:55:54,614 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=deveng_v500,\x00\x00\x1C\xAB\x92\xBC\xD8\x02,1544486155414.ab693aebe203bc8781f1a9f1c0a1d045., starting to roll back the global memstore size.
java.lang.IllegalArgumentException: offset (8) + length (2) exceed the capacity of the array: 0
at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:631)
.............
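One workaround that is sometimes suggested when a corrupt recovered.edits file blocks a region from opening is to sideline the offending file in HDFS and let the region open without those edits. This is only a minimal sketch, not a verified fix for this case: the region path comes from the logs above, the specific edits file name and the sideline directory are illustrative, and moving an edits file aside does discard whatever edits it holds.

# Inspect the recovered.edits files for the region
hdfs dfs -ls /hbase/data/default/deveng_v500/ab693aebe203bc8781f1a9f1c0a1d045/recovered.edits

# Sideline a suspect edits file (file name and target directory are illustrative)
hdfs dfs -mkdir -p /hbase/sidelined_edits/ab693aebe203bc8781f1a9f1c0a1d045
hdfs dfs -mv /hbase/data/default/deveng_v500/ab693aebe203bc8781f1a9f1c0a1d045/recovered.edits/0000000000094270192 /hbase/sidelined_edits/ab693aebe203bc8781f1a9f1c0a1d045/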
05-14-2019
07:34 AM
2019-05-14 09:18:26,042 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=deveng_v500,\x00\x00\x1C\xAB\x92\xBC\xD8\x02,1544486155414.ab693aebe203bc8781f1a9f1c0a1d045., starting to roll back the global memstore size.
2019-05-14 09:18:26,043 INFO org.apache.hadoop.hbase.coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => ab693aebe203bc8781f1a9f1c0a1d045, NAME => 'deveng_v500,\x00\x00\x1C\xAB\x92\xBC\xD8\x02,1544486155414.ab693aebe203bc8781f1a9f1c0a1d045.', STARTKEY => '\x00\x00\x1C\xAB\x92\xBC\xD8\x02', ENDKEY => '\x00\x00L\xC6\xAD\xD1\x04'} failed, transitioning from OPENING to FAILED_OPEN in ZK, expecting version 40
2019-05-14 09:18:31,562 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=deveng_v500,\x00\x00\x1C\xAB\x92\xBC\xD8\x02,1544486155414.ab693aebe203bc8781f1a9f1c0a1d045., starting to roll back the global memstore size.
2019-05-14 09:18:31,562 INFO org.apache.hadoop.hbase.coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => ab693aebe203bc8781f1a9f1c0a1d045, NAME => 'deveng_v500,\x00\x00\x1C\xAB\x92\xBC\xD8\x02,1544486155414.ab693aebe203bc8781f1a9f1c0a1d045.', STARTKEY => '\x00\x00\x1C\xAB\x92\xBC\xD8\x02', ENDKEY => '\x00\x00L\xC6\xAD\xD1\x04'} failed, transitioning from OPENING to FAILED_OPEN in ZK, expecting version 58

The region is stuck trying to open on different region servers. I've cycled the nodes to force it to attempt to come online elsewhere, since the move command doesn't do anything, but no luck. fsck is clean, but hbck with -fixAssignments can't bring the region online.

19/05/14 09:20:22 WARN util.HBaseFsck: Skip region 'deveng_v500,\x00\x00\x1C\xAB\x92\xBC\xD8\x02,1544486155414.ab693aebe203bc8781f1a9f1c0a1d045.'
19/05/14 09:20:22 INFO client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
19/05/14 09:20:22 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x16a31adec7becec
19/05/14 09:20:22 INFO zookeeper.ZooKeeper: Session: 0x16a31adec7becec closed
19/05/14 09:20:22 INFO zookeeper.ClientCnxn: EventThread shut down
Exception in thread "main" java.io.IOException: 1 region(s) could not be checked or repaired. See logs for detail.
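For reference, the kind of commands involved look roughly like this. A minimal sketch only: the -fixAssignments run is the one that produced the output above, and the manual assign from the hbase shell is simply an additional thing one might try with the encoded region name from the logs.

# hbck run limited to the affected table
hbase hbck -fixAssignments deveng_v500

# Manually ask the master to assign the region (encoded name from the logs above)
echo "assign 'ab693aebe203bc8781f1a9f1c0a1d045'" | hbase shell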
Labels:
- Apache HBase
02-15-2017
01:01 PM
@saranvisa:
https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_fixed_in_55.html#fixed_issues_555
https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_download_55.html#cdh_555
02-15-2017
11:57 AM
It turned out that the nodes were in the excludes files, just not in the host.exclude file we use in CDH5, so they were missed.
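For anyone hitting the same thing, a quick way to check whether a host is sitting in an exclude file the NameNode is actually reading. A minimal sketch: the config directory path and the hostname are placeholders from this thread, not necessarily your layout.

# Find which exclude file the namenode is configured to use (conf path is illustrative)
grep -A1 dfs.hosts.exclude /opt/hadoop/conf/hdfs-site.xml

# Check whether the affected datanode appears in any exclude file in that directory
grep -i cernsrchhadoop504 /opt/hadoop/conf/*exclude*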
02-15-2017
11:38 AM
We upgraded our clusters from 5.5.2 to 5.5.5 a while ago. We've since identified a few nodes where the alternatives are still referencing the 5.5.2 parcel.

root@use542ytb9:~ ( use542ytb9 )
13:15:15 $ which hbase
/usr/bin/which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/sbin:/usr/sbin:/usr/local/sbin:/root/bin)
root@use542ytb9:~ ( use542ytb9 )
13:15:18 $ ls /usr/bin/hbase
/usr/bin/hbase
root@use542ytb9:~ ( use542ytb9 )
13:15:24 $ ll /usr/bin/hbase
lrwxrwxrwx 1 root root 23 May 16 2016 /usr/bin/hbase -> /etc/alternatives/hbase
root@use542ytb9:~ ( use542ytb9 )
13:15:28 $ ll /etc/alternatives/hbase
lrwxrwxrwx 1 root root 63 May 16 2016 /etc/alternatives/hbase -> /opt/cloudera/parcels/CDH-5.5.2-1.cdh5.5.2.p1426.1277/bin/hbase
root@use542ytb9:~ ( use542ytb9 )
13:15:30 $ ls /opt/cloudera/parcels/CDH-5.5.2-1.cdh5.5.2.p1426.1277/bin/hbase
ls: cannot access /opt/cloudera/parcels/CDH-5.5.2-1.cdh5.5.2.p1426.1277/bin/hbase: No such file or directory
root@use542ytb9:~ ( use542ytb9 )

We've cycled the CM agent, done full decommissions and recommissions, rebooted the nodes, and deployed client configuration. Since we've identified 3 nodes, we're assuming there are others as well. The Hadoop services still run on these nodes, but we're unable to run hdfs, hbase, or yarn commands, which has also caused several MapReduce jobs to fail. Is there a good way to repoint these alternatives to the new parcel?
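One manual way to repoint a stale link is the alternatives command itself. A minimal sketch, assuming the new parcel is already distributed and activated on the host; the 5.5.5 parcel directory shown below is a placeholder you'd replace with the actual directory under /opt/cloudera/parcels, and the priority value is arbitrary here. The CM agent normally maintains these links when a parcel is activated, so fixing them by hand is a workaround rather than the root-cause fix.

# Show what the hbase alternative currently points at
alternatives --display hbase

# Drop the entry that references the removed 5.5.2 parcel
alternatives --remove hbase /opt/cloudera/parcels/CDH-5.5.2-1.cdh5.5.2.p1426.1277/bin/hbase

# Register the binary from the new parcel (path is a placeholder; adjust to the real 5.5.5 parcel dir)
alternatives --install /usr/bin/hbase hbase /opt/cloudera/parcels/CDH-5.5.5-1.cdh5.5.5.p0.1000/bin/hbase 10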
Labels:
- HDFS
02-15-2017
07:43 AM
Also, to go over what we've attempted: we've cycled the datanode (or at least attempted to), rebooted the node, and, since we found HDFS-1106 where someone hit the same issue, did a refresh, but we still can't get it to start.
02-15-2017
07:34 AM
On HDFS 0.20.2 (yes, it's old), 2 datanodes in our prod cluster can no longer start up. The namenode says:

2017-02-15 09:24:52,861 FATAL org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.getDatanode: Data node cernsrchhadoop504.cernerasp.com:50010 is attempting to report storage ID DS-1574636665-44.128.6.253-50010-1461251397876. Node 44.128.6.253:50010 is expected to serve this storage.
2017-02-15 09:24:52,862 INFO org.apache.hadoop.ipc.Server: IPC Server handler 58 on 9000, call register(DatanodeRegistration(cernsrchhadoop504.cernerasp.com:50010, storageID=DS-1574636665-44.128.6.253-50010-1461251397876, infoPort=50075, ipcPort=50020)) from 44.128.6.253:51326: error: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node cernsrchhadoop504.cernerasp.com:50010 is attempting to report storage ID DS-1574636665-44.128.6.253-50010-1461251397876. Node 44.128.6.253:50010 is expected to serve this storage.
org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node cernsrchhadoop504.cernerasp.com:50010 is attempting to report storage ID DS-1574636665-44.128.6.253-50010-1461251397876. Node 44.128.6.253:50010 is expected to serve this storage.

The kicker, though, is that it's saying datanode cernsrchhadoop504 can't serve that storage because it's expected to be served by 44.128.6.253, which is actually cernsrchhadoop504. From the namenode:

root@cernsrchhadoop388.cernerasp.com:~ ( cernsrchhadoop388.cernerasp.com )
09:28:10 $ nslookup 44.128.6.253
Server: 127.0.0.1
Address: 127.0.0.1#53
Non-authoritative answer:
253.6.128.44.in-addr.arpa name = cernsrchhadoop504.cernerasp.com.

The datanode logs on 504 are saying something similar:

2017-02-15 09:24:52,866 ERROR datanode.DataNode (DataNode.java:main(1372)) - org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.UnregisteredDatanodeException: Data node cernsrchhadoop504.cernerasp.com:50010 is attempting to report storage ID DS-1574636665-44.128.6.253-50010-1461251397876. Node 44.128.6.253:50010 is expected to serve this storage.

So the question is: how can I get the namenode to realize that the node it expects to have that storage is the same node that's attempting to serve it?
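For anyone digging into the same thing, some checks that would narrow it down. A minimal sketch only: /data/*/dfs is a placeholder for whatever dfs.data.dir points at, and the VERSION file location is as I understand the 0.20.x datanode storage layout.

# On the datanode: the storage ID it reports is recorded in the VERSION file
grep storageID /data/*/dfs/current/VERSION

# Verify forward and reverse DNS agree on both the namenode and the datanode
host cernsrchhadoop504.cernerasp.com
host 44.128.6.253

# If hosts/exclude files were changed, ask the namenode to re-read them (0.20 syntax)
hadoop dfsadmin -refreshNodes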
Labels:
- HDFS
07-26-2016
08:03 AM
Our reports manager is currently using 19G of 24G on /var, all in /var/lib. More specifically, in /var/lib/cloudera-scm-headlamp/cloudera-scm-headlamp. We have 9 <cluster>-hdfs directories here which are taking up most of the space. The contents are mostly fsimage.tmp. Some of these clusters no longer exist, and we have other live clusters that aren't here. The timestamps are from Mar 22. Why is the reports manager keeping fsimage.tmp files, what are some of the configs we can use to manage these, and what's the main goal (meaning, if it's meant as a backup strategy, why a one-time copy instead of a continual one)?
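For reference, a quick way to confirm which of those per-cluster directories hold the space. A minimal sketch, assuming the default headlamp storage directory from above:

# Show per-cluster usage under the headlamp storage directory, largest first
du -sh /var/lib/cloudera-scm-headlamp/cloudera-scm-headlamp/* | sort -rh

# List the fsimage.tmp files with their sizes and timestamps
find /var/lib/cloudera-scm-headlamp/cloudera-scm-headlamp -name 'fsimage*' -exec ls -lh {} \;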
Labels:
- HDFS
07-08-2016
07:33 AM
We have a CM instance that's currently administering 1,340 nodes. From a prior discussion with another team at Cloudera, it came out that CM is only meant to administer 1,000 nodes in its current form. Because of that, we're looking at splitting our clusters out to another CM instance, but that's the long-term plan.

For the short term, CM has become very sluggish. For example, if you go to the hosts page in CM to show all hosts, Chrome will report the tab as unresponsive about 2/3 of the time, because the tab almost never loads. We're unable to go back much more than 2 pages in the command history, and whenever you issue a command, such as cycling a service, we typically see a 2-4 second delay.

We've already done some tuning on our instances. We started with 8 GB memory and 2 cores and have since sized up to 30 GB memory and 6 cores. We also initially had 4 VMs, with all VMs running multiple services; we added 2 more VMs so we could isolate the Host Monitor and Service Monitor, since those two seemed to be the heaviest-used services. We're currently sitting at 18 GB Xms and Xmx on our main CM Server service, with 3 GB Xmn. Our other services typically sit at 8-12 GB heap.

What other tuning options are recommended to improve performance?
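For reference, the CM Server heap mentioned above is set in /etc/default/cloudera-scm-server on package-based installs. A minimal sketch of that setting only; the heap values mirror the numbers in the post rather than being a recommendation, and the dump flags are illustrative.

# /etc/default/cloudera-scm-server -- restart the cloudera-scm-server service after editing.
# Heap values mirror the post above; adjust to your hosts.
export CMF_JAVA_OPTS="-Xms18G -Xmx18G -Xmn3G -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"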
Labels:
- Cloudera Manager
03-03-2016
12:52 PM
Yep, that did it. I didn't realize the IDs were event IDs and not host IDs. I used attributes.HOST_IDS and was able to pull back the information for the host. With this output, I can sort and build alerting off it. Thank you.
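For anyone else doing this, a query against the events endpoint with the attributes.HOST_IDS filter looks roughly like this. A minimal sketch: the CM host, API version, credentials, and the host ID value are all placeholders.

# Pull events for one host via the CM API, pretty-printed
curl -s -u admin:admin 'http://cm-host:7180/api/v13/events?query=attributes.HOST_IDS==a1b2c3d4-example-host-id' | python -m json.tool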