Member since: 08-16-2016
Posts: 642
Kudos Received: 131
Solutions: 68
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3978 | 10-13-2017 09:42 PM |
| | 7476 | 09-14-2017 11:15 AM |
| | 3799 | 09-13-2017 10:35 PM |
| | 6039 | 09-13-2017 10:25 PM |
| | 6604 | 09-13-2017 10:05 PM |
04-10-2017
03:42 PM
It helps to pay attention to the error: the connection was interrupted. Check the HS2 logs to try to get more info. Is there anything in the job logs from when you ran the queries?
04-10-2017
03:25 PM
It is closing the session and then trying to use it again. The close happens outside of the main class (I don't know if it was extracted from somewhere else). Also, what is 'zk'? You are checking whether it is null and closing it if it is, but I don't see it declared anywhere in the code you posted.

2017-04-06 13:33:19,353 DEBUG main.close - Closing session: 0x15ad7740271bb85
2017-04-06 13:33:19,354 DEBUG main.close - Closing client for session: 0x15ad7740271bb85
2017-04-06 13:33:19,362 DEBUG main-SendThread(dwh-mst-dev01.stor.nccourts.org:2181).readResponse - Reading reply sessionid:0x15ad7740271bb85, packet:: clientPath:null serverPath:null finished:false header:: 4,-11 replyHeader:: 4,55835606664,0 request:: null response:: null
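For illustration, a minimal Java sketch of the pattern those log lines suggest, using the standard ZooKeeper client API; the connect string and znode path here are hypothetical:

```java
import org.apache.zookeeper.ZooKeeper;

public class SessionReuseBug {
    public static void main(String[] args) throws Exception {
        // Hypothetical connect string, timeout, and znode path, for illustration only.
        ZooKeeper zk = new ZooKeeper("zkhost.example.com:2181", 30000, event -> { });
        try {
            zk.getData("/some/znode", false, null);   // OK: the session is still open
        } finally {
            if (zk != null) {
                zk.close();                           // logs "Closing session: 0x..."
            }
        }
        // BUG: the session was just closed above; reusing the handle here can only fail.
        zk.getData("/some/znode", false, null);
    }
}
```

The fix is to keep the close in the same scope that owns the client and make it the very last thing done with that handle.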
04-10-2017
02:56 PM
Is node:10000 the name of your HS2 node? You probably need to provide the hive-site.xml to the Oozie job. I don't know why it would have worked before but not now, though. I know the Hive script action in Oozie provides a field for the XML file. What type of action are you using?
04-10-2017
02:42 PM
Your best bet is to write out to a temporary location on the local FS and then upload the files to HDFS at the end. It is wonky, but it is the best way to do this with a bash script scheduled through Oozie. Be careful to keep this in check, as it will likely generate a lot of small files. A sketch of the upload step is below.
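If the job ever moves off a bash script, the same write-locally-then-upload pattern with the Hadoop FileSystem Java API would look roughly like this sketch (the paths are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalThenUpload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        // copyFromLocalFile(delSrc, overwrite, src, dst):
        // delete the local temp copy once it has landed in HDFS.
        fs.copyFromLocalFile(true, true,
                new Path("file:///tmp/job-output/results.csv"),   // hypothetical local path
                new Path("/user/etl/output/results.csv"));        // hypothetical HDFS path
        fs.close();
    }
}
```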
04-10-2017
02:23 PM
The failover controllers are clients of ZK. They need to write and read data in ZK, and if they can't, they trigger a failover event. Increasing the timeout will give them more time to do their job before failing.
04-10-2017
02:15 PM
Oh, did you rebalance after adding the new nodes? If not, then the data being accessed is not on them, and therefore it is less likely that you will have containers running on the other nodes.
04-10-2017
02:10 PM
Then the queries themselves do not utilize more than the cluster's existing capacity. Try running the TeraSort test, which will, and you will see the difference. Now, you could possibly tune Hive and/or the query to use more of the cluster or otherwise be faster. I wasn't as clear in my previous answer, though: this will not directly boost the performance of every query or job, but it will allow the cluster to scale and improve overall cluster performance, i.e. you can now run twice as many jobs, or the same number of jobs on double the amount of data. There are other factors in Hive performance as well, such as the metastore and HS2.
03-02-2017
07:16 PM
The most likely cause of a ZKFC failure is an issue with ZK. I don't see anything in the log or thread stack that points to a particular issue. ZK is sensitive to latency and does not do well on the same host as other worker processes, like the DataNode, that are resource hogs. I don't recommend this type of configuration with HA enabled. You can get away with it on a small cluster with no HA (or, bluntly, no dependencies on ZK functioning well). The connection refused error means it can't connect to the NameNode on the service RPC port, 8022. This is triggering a failover event. For this specifically, check that the NN is up and running while listening on this port. Then check that each NN can access the other NN over that port. As a last resort, if no other changes help, you could try increasing ha.zookeeper.session-timeout.ms. The default is 5 seconds. That gives the ZKFC more time to contact ZK before it considers it a failure (I don't see it timing out in the logs above, though). The downside is that a real failure will last longer, since the failover event is delayed until that timeout is reached (hint: don't set it too high).
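The property itself belongs in core-site.xml. As a sanity check, here is a minimal sketch using the Hadoop Configuration API to confirm what value the daemons would actually pick up (it assumes hadoop-common and your config directory are on the classpath; 5000 ms is the stock default):

```java
import org.apache.hadoop.conf.Configuration;

public class ZkfcTimeoutCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();  // loads core-site.xml from the classpath
        // Default is 5000 ms; raise it in core-site.xml, not here, so the ZKFC sees it.
        int timeoutMs = conf.getInt("ha.zookeeper.session-timeout.ms", 5000);
        System.out.println("ha.zookeeper.session-timeout.ms = " + timeoutMs);
    }
}
```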
03-02-2017
07:03 PM
1 Kudo
Are you accessing it from a browser on the same machine that you installed it on? Put the actual hostname-to-IP-address mapping in the /etc/hosts file and try to access it via the hostname. Since you installed with the installer, it will be using the embedded database. To be diligent, ensure that it is also running: cloudera-scm-server-db.
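For example, the hosts entry would look something like this (the IP and hostname here are placeholders for your own):

```
192.0.2.10   cm-host.example.com   cm-host
```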
03-02-2017
06:54 PM
I have run into this when the hosts file was incorrect. You said reverse lookups are not working; they should be. Are you seeing an error in the CM Agent logs? Can you post your hosts file (mask it if needed)?