About facundo

facundo · ‎01-23-2018

I solved it following this instructions: https://cwiki.apache.org/confluence/display/AMBARI/Cleaning+up+Ambari+Metrics+System+Data

facundo · ‎01-23-2018

Hello, the host where NameNode and AMS' services run was filled up. I solved it but then AMS Collector doesn't start. This is the AMS Collector's error message: /var/log/ambari-metrics-collector/hbase-ams-master-hw.example.com.out 2018-01-23 10:29:01,077 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=hw.example.com:61181 sessionTimeout=120000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@c540f5a 2018-01-23 10:29:01,095 INFO [main-SendThread(hw.example.com:61181)] zookeeper.ClientCnxn: Opening socket connection to server hw.example.com/10.1.0.12:61181. Will not attempt to authenticate using SASL (unknown error) 2018-01-23 10:29:01,114 WARN [main-SendThread(hw.example.com:61181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125) 2018-01-23 10:29:02,222 INFO [main-SendThread(hw.example.com:61181)] zookeeper.ClientCnxn: Opening socket connection to server hw.example.com/10.1.0.12:61181. Will not attempt to authenticate using SASL (unknown error) 2018-01-23 10:29:02,222 WARN [main-SendThread(hw.example.com:61181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125) 2018-01-23 10:29:02,324 WARN [main] zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=hw.example.com:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-secure/master 2018-01-23 10:29:02,324 ERROR [main] zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 1 attempts 2018-01-23 10:29:02,324 WARN [main] zookeeper.ZKUtil: clean znode for master0x0, quorum=hw.example.com:61181, baseZNode=/ams-hbase-secure Unable to get data of znode /ams-hbase-secure/master org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-secure/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:714) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:267) at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:149) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:143) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2838) 2018-01-23 10:29:02,325 ERROR [main] zookeeper.ZooKeeperWatcher: clean znode for master0x0, quorum=hw.example.com:61181, baseZNode=/ams-hbase-secure Received unexpected KeeperException, re-throwing exception org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-secure/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:714) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:267) at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:149) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:143) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2838) 2018-01-23 10:29:02,325 WARN [main] zookeeper.ZooKeeperNodeTracker: Can't get or delete the master znode org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-secure/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:354) at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:714) at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:267) at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:149) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:143) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2838) /var/log/ambari-metrics-collector/ambari-metrics-collector.log 2018-01-23 10:29:01,191 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hw.example.com/10.1.0.12:61181. Will not attempt to authenticate using SASL (unknown error) 2018-01-23 10:29:01,192 WARN org.apache.zookeeper.ClientCnxn: Session 0x16123a0fb540000 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) 2018-01-23 10:29:01,298 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=hw.example.com:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = Conn ectionLoss for /ams-hbase-secure/meta-region-server 2018-01-23 10:29:02,339 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hw.example.com/10.1.0.12:61181. Will not attempt to authenticate using SASL (unknown error) 2018-01-23 10:29:02,340 WARN org.apache.zookeeper.ClientCnxn: Session 0x16123a0fb540000 for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) There aren't services listen on ports 6188 and 61181. I've configured the HBase's ticktime "hbase.zookeeper.property.tickTime = 6000". Thanks in advance.

facundo · ‎10-30-2017

I solved it with a comment on Spark2, Config, Advanced spark2-env. After that, I restarted Spark2 and their clients and the new configuration files were deployed.

facundo · ‎10-26-2017

Hi @Aditya Sirna, The directory /usr/hdp/2.6.2.0-205/spark2/conf/ is empty but I have installed spark2_2_6_2_0_205-python-2.1.1.2.6.2.0-205.noarch spark2_2_6_2_0_205-2.1.1.2.6.2.0-205.noarch

facundo · ‎10-26-2017

Hi, I just installed Spark2 from Ambari wizard and the Spark2's configuration directory is empty: > ls -l /etc/spark2/2.6.2.0-205/0 total 0 The installation output is: 14:39:22,875 - Backing up /etc/spark2/conf to /etc/spark2/conf.backup if destination doesn't exist already. 14:39:22,875 - Execute[('cp', '-R', '-p', '/etc/spark2/conf', '/etc/spark2/conf.backup')] {'not_if': 'test -e /etc/spark2/conf.backup', 'sudo': True} 14:39:22,897 - Checking if need to create versioned conf dir /etc/spark2/2.6.2.0-205/0 14:39:22,900 - call[('ambari-python-wrap', u'/usr/bin/conf-select', 'dry-run-create', '--package', 'spark2', '--stack-version', u'2.6.2.0-205', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1} 14:39:22,940 - call returned (0, '/etc/spark2/2.6.2.0-205/0', '') 14:39:22,941 - Package spark2 will have new conf directories: /etc/spark2/2.6.2.0-205/0 14:39:22,946 - Checking if need to create versioned conf dir /etc/spark2/2.6.2.0-205/0 14:39:22,952 - call[('ambari-python-wrap', u'/usr/bin/conf-select', 'create-conf-dir', '--package', 'spark2', '--stack-version', u'2.6.2.0-205', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1} 14:39:22,987 - call returned (1, '/etc/spark2/2.6.2.0-205/0 exist already', '') 14:39:22,988 - checked_call[('ambari-python-wrap', u'/usr/bin/conf-select', 'set-conf-dir', '--package', 'spark2', '--stack-version', u'2.6.2.0-205', '--conf-version', '0')] {'logoutput': False, 'sudo': True, 'quiet': False} 14:39:23,022 - checked_call returned (0, '/usr/hdp/2.6.2.0-205/spark2/conf -> /etc/spark2/2.6.2.0-205/0') 14:39:23,023 - Ensuring that spark2 has the correct symlink structure 14:39:23,024 - Execute[('cp', '-R', '-p', '/etc/spark2/conf', '/etc/spark2/conf.backup')] {'not_if': 'test -e /etc/spark2/conf.backup', 'sudo': True} 14:39:23,033 - Skipping Execute[('cp', '-R', '-p', '/etc/spark2/conf', '/etc/spark2/conf.backup')] due to not_if 14:39:23,034 - Directory['/etc/spark2/conf'] {'action': ['delete']} 14:39:23,034 - Removing directory Directory['/etc/spark2/conf'] and all its content 14:39:23,035 - Link['/etc/spark2/conf'] {'to': '/etc/spark2/conf.backup'} 14:39:23,035 - Creating symbolic Link['/etc/spark2/conf'] to /etc/spark2/conf.backup 14:39:23,036 - Link['/etc/spark2/conf'] {'action': ['delete']} 14:39:23,036 - Deleting Link['/etc/spark2/conf'] 14:39:23,037 - Link['/etc/spark2/conf'] {'to': '/usr/hdp/current/spark2-client/conf'} 14:39:23,037 - Creating symbolic Link['/etc/spark2/conf'] to /usr/hdp/current/spark2-client/conf 14:39:23,037 - /etc/hive/conf is already linked to /etc/hive/2.6.2.0-205/0 I'm using Ambari version 2.5.2.0, HDP version 2.6.2.0-205 and Spark2 version 2.1.1. Do you know what happened? There is a way to install again the Spark2 configuration? Thanks in advance.

facundo · ‎09-17-2017

Hello, following the documentation for upgrading to Ambari 2.5.2 I stuck on this line Record the location of the Metrics Collector component before you begin the upgrade process. What does it means? Does it refers to the path of the Metrics Collector's database? Thanks in advance.

facundo · ‎03-30-2017

I tried in YARN API and I got this error message [yarn@foo ~]$ curl -v -X PUT -d '{"state": "KILLED"}' 'http://foo.example.com:8088/ws/v1/cluster/apps/application_1487024494103_0099' * About to connect() to foo.example.com port 8088 (#0) * Trying 192.168.1.1... * Connected to foo.example.com (192.168.1.1) port 8088 (#0) > PUT /ws/v1/cluster/apps/application_1487024494103_0099 HTTP/1.1 > User-Agent: curl/7.29.0 > Host: foo.example.com:8088 > Accept: */* > Content-Length: 19 > Content-Type: application/x-www-form-urlencoded > * upload completely sent off: 19 out of 19 bytes < HTTP/1.1 500 Internal Server Error < Cache-Control: no-cache < Expires: Thu, 30 Mar 2017 19:51:36 GMT < Date: Thu, 30 Mar 2017 19:51:36 GMT < Pragma: no-cache < Expires: Thu, 30 Mar 2017 19:51:36 GMT < Date: Thu, 30 Mar 2017 19:51:36 GMT < Pragma: no-cache < Content-Type: application/json < Transfer-Encoding: chunked < Server: Jetty(6.1.26.hwx) < * Connection #0 to host foo.example.com left intact {"RemoteException":{"exception":"WebApplicationException","javaClassName":"javax.ws.rs.WebApplicationException"}}

facundo · ‎03-16-2017

I'm trying to kill an application in YARN but I get the message "Waiting for application ID to be killed". There is a way to kill it fast? Thanks in advance.

facundo · ‎02-24-2017

I found the problem: the device that was filled up has this file /var/lib/ambari-agent/data/structured-out-status.json that differs with the other nodes. I following this steps as root rm -f /var/lib/ambari-agent/data/structured-out-status.json ambari-agent restart And I deleted the PID files in /var/run for applications that aren't responding for restart (like Zookeeper and Ambari Metrics Collector). After that Ambari shows the process down. So I started them and now it works correctly.

facundo · ‎02-24-2017

An application filled up the HDD and after the cleaning the log is corrupted (there are the last five lines) 2017/02/24 05:30:15 [I] Completed XXX.XXX.XXX.XXX - "GET / HTTP/1.1" 500 Internal Server Error 2528 bytes in 26900us 2017/02/24 05:31:15 [I] Completed XXX.XXX.XXX.XXX - "GET / HTTP/1.1" 500 Internal Server Error 2528 bytes in 14789us 2017/02/24 05:32:15 [I] Completed XXX.XXX.XXX.XXX - "GET / HTTP/1.1" 500 Internal Server Error 2528 bytes in 20252us 2017/02/24 05:33:15 [I] Completed XXX.XXX.XXX.XXX - "GET / HTTP/1.1" 500 Internal Server Error 2528 bytes in 16111us 2017/02

Online	Offline
Last Visited	‎08-28-2018 04:04 PM

Member Since	‎06-10-2016 09:26 PM
Last Visited	‎08-28-2018 04:04 PM
Posts	30
Kudos received	4

Cloudera Community

Re: AMS Collector doesn't start

Re: Spark2 conf directory is empty

Re: I can't stop Ambari Metris and Grafana

Re: [SOLVED] [Zeppelin] Job failed: Implementing c...

Re: [SOLVED] java.io.IOException: Login failure fo...

Re: AMS Collector doesn't start

AMS Collector doesn't start

Re: Spark2 conf directory is empty

Re: Spark2 conf directory is empty

Spark2 conf directory is empty

Record the location of the Metrics Collector

Re: YARN: Waiting for application to be killed

YARN: Waiting for application to be killed

Re: I can't stop Ambari Metris and Grafana

Re: I can't stop Ambari Metris and Grafana