Member since
12-14-2015
15
Posts
1
Kudos Received
0
Solutions
07-27-2016
03:08 PM
1 Kudo
Ok - now after i removed all the snapshot and logfiles in /var/log/zookeeper - shut down everything and started zookeeper all alone, then HDFS, then YARN and then everything else it worked. For people that might find this thread: What happened was, my filesystem ran full - system stopped. Lots of corrupted files. Ambari did not catch errors as they were on the filesystem and the problem were corrupted files. How to solve it. 1. Shut down 2. Start each service on its own 3. If it fails check /var/log/<SERVICENAME> (e.g. /var/log/zookeeper) 4. Find the last logfile 5. tail -f /var/log/zookeeper/zookeeper.out or even tail -f /var/log/zookeeper/zookeeper.out | grep --line-buffered ERROR 2016-07-27 14:29:33,829 - INFO [main:FileSnap@83] - Reading snapshot /hadoop/zookeeper/version-2/snapshot.1612d5
2016-07-27 14:29:34,826 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
"zookeeper.out" 49L, 7773C
6. Start the service again - anaylze the errors - right before it fails with an uncaught exception it will show you the file it is trying to read. Copy that file to another location or rename it to file.BACKUP. Delete the original file. Try starting up again and check for errors - rinse and repeat step 5 and 6 until now read errors are found. 7. Do this for every service that fails to start - if you are lucky you will be able to find all corrupted files and remove them. Lesson learned: If you let your filesystem run full you are in for an adventure. Thanks everyone for their help. I hope this helps others with similar problems trying HADOOP on this sandbox.
... View more
07-27-2016
02:28 PM
The Zookeeper startup error seems to come from a broken zookeeper snapshot that cannot be read - can I delete this zookeeper snapshot without creating more problems? 2016-07-27 14:29:33,805 - INFO [main:Environment@100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2016-07-27 14:29:33,806 - INFO [main:Environment@100] - Server environment:java.io.tmpdir=/tmp
2016-07-27 14:29:33,806 - INFO [main:Environment@100] - Server environment:java.compiler=<NA>
2016-07-27 14:29:33,806 - INFO [main:Environment@100] - Server environment:os.name=Linux
2016-07-27 14:29:33,806 - INFO [main:Environment@100] - Server environment:os.arch=amd64
2016-07-27 14:29:33,806 - INFO [main:Environment@100] - Server environment:os.version=2.6.32-573.12.1.el6.x86_64
2016-07-27 14:29:33,806 - INFO [main:Environment@100] - Server environment:user.name=zookeeper
2016-07-27 14:29:33,806 - INFO [main:Environment@100] - Server environment:user.home=/home/zookeeper
2016-07-27 14:29:33,806 - INFO [main:Environment@100] - Server environment:user.dir=/home/zookeeper
2016-07-27 14:29:33,807 - INFO [main:ZooKeeperServer@755] - tickTime set to 2000
2016-07-27 14:29:33,807 - INFO [main:ZooKeeperServer@764] - minSessionTimeout set to -1
2016-07-27 14:29:33,807 - INFO [main:ZooKeeperServer@773] - maxSessionTimeout set to -1
2016-07-27 14:29:33,815 - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2016-07-27 14:29:33,829 - INFO [main:FileSnap@83] - Reading snapshot /hadoop/zookeeper/version-2/snapshot.1612d5
2016-07-27 14:29:34,826 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
"zookeeper.out" 49L, 7773C
... View more
07-27-2016
01:15 PM
So Zookeeper is the only process not starting now ... seems like there are more corrup files. Is there the option install the zookeeper service again in the Sandbox? 2016-07-27 13:15:35,961 - INFO [main:Environment@100] - Server environment:user.name=zookeeper
2016-07-27 13:15:35,961 - INFO [main:Environment@100] - Server environment:user.home=/home/zookeeper
2016-07-27 13:15:35,961 - INFO [main:Environment@100] - Server environment:user.dir=/home/zookeeper
2016-07-27 13:15:35,962 - INFO [main:ZooKeeperServer@755] - tickTime set to 2000
2016-07-27 13:15:35,962 - INFO [main:ZooKeeperServer@764] - minSessionTimeout set to -1
2016-07-27 13:15:35,962 - INFO [main:ZooKeeperServer@773] - maxSessionTimeout set to -1
2016-07-27 13:15:35,971 - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2016-07-27 13:15:35,987 - INFO [main:FileSnap@83] - Reading snapshot /hadoop/zookeeper/version-2/snapshot.1612d5
2016-07-27 13:15:35,997 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.13.8.41:40623
2016-07-27 13:15:36,005 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2016-07-27 13:15:36,005 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.13.8.41:40623 (no session established for client)
2016-07-27 13:15:36,922 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:576)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:595)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:561)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:643)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:158)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:272)
at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:399)
at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:122)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:113)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
... View more
07-27-2016
01:00 PM
WOW - I think backing up and deleting the file actually helped a lot - it's not perfect but it still works. /var/lib/ambari-agent/data/structured-out-status.json
A lot of the services simply went green after i was able to restart zookeeper. So all services were running - Ambari is just unable to monitor them correctly. Unfortunatly ZOOKEEPER itself simply return to a red state - I need to dive the logs to find out what the next problem is.
... View more
07-26-2016
02:43 PM
My AMBARI-AGENT.log show weird errors as well.
... View more
07-26-2016
02:36 PM
HTOP shows little memory usage Tahnk you for your support!
... View more
07-26-2016
02:28 PM
My VM has 16GB of memory. So i should be fine. Also HTOP does not show more memory usage than 6GB (I rebooted). I found some errors in /var/log/hive. Not sure what to make of it.
... View more
07-26-2016
12:57 PM
Where can If find error logs files if no obvious error occurs?
... View more
07-26-2016
12:34 PM
NONE of them "FAIL" - they simply turn RED after 60 seconds without producing any errors in log files.
... View more
07-26-2016
12:23 PM
I am using the Hortonworks Sandbox - it previously ran just fine. I did not use the system for some time and it seems the machine is broken even after a reboot. Whenever I start a service (e.g. HIVE) all proces steps appear green. For about 60 seconds the service turns up green BUT then switches back to RED. It does not matter which service I start - it's the same behaviour for all services ... all services start up fine BUT then switch to RED. The 9 alerts are all OLD errors from 5 months ago ... any hint where to look for the error is appreciated.
... View more
Labels:
- Labels:
-
Apache Hive