Member since
09-29-2015
7
Posts
5
Kudos Received
0
Solutions
07-14-2017
11:47 AM
(I think) this occurred after upgrading from Ambari 2.4.2 to 2.5.0.3 (I only looked for it once on Ambari 2.5.1.0).
The SmartSense View showed error messages. I found these messages in /var/log/ambari-server/ambari-server.log when clicking to launch the SmartSense View:
14 Jul 2017 11:14:05,523 ERROR [ambari-client-thread-467377] ServerProxy:162 - Failed to execute GET /recommendations. Reason: Connection refused
14 Jul 2017 11:14:13,695 ERROR [ambari-client-thread-467507] ServerProxy:162 - Failed to execute GET /context. Reason: Connection refused
14 Jul 2017 11:14:14,900 ERROR [ambari-client-thread-467507] ServerProxy:162 - Failed to execute GET /checkconfig. Reason: Connection refused
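To see at a glance which view endpoints are being refused, the log can be filtered. This is only a minimal sketch: it assumes the log lines have exactly the format shown above, and the function name refused_endpoints is made up for illustration.

```shell
# Hypothetical helper: list the distinct endpoints for which ServerProxy
# logged "Connection refused". Assumes the log line format shown above.
# Usage: refused_endpoints /var/log/ambari-server/ambari-server.log
refused_endpoints() {
    grep 'ServerProxy' "$1" |
        sed -n 's/.*Failed to execute [A-Z]* \([^ ]*\)\. Reason: Connection refused.*/\1/p' |
        sort -u
}
```

If every endpoint of the view shows up here, the view is probably pointing at an address where nothing is listening, rather than hitting one broken endpoint.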
Diffing the current hst-server.ini against a previous copy (I think it was from Ambari 2.4.2) yields:
# diff conf/hst-server.ini conf_13_04_17_04_17.save/
2,4d1
< enable.flex.subscription = false
< smartsense.id=X-00000000-X-00000000
< flex.subscription.id =
6a4
> smartsense.id=X-00000000-X-00000000
27,28d24
< url = http://:9000
< max.heap = 2048
30d25
< run.as.user =
33d27
< version = 1.4.0.2.5.0.3-7
52,57d45
< [recommendation]
< auto.download.interval = 300
< feedback.push.maxentries = 50
< feedback.push.interval = 1800
< expiry = 30
<
The issue was the missing HST server name in the url property of the [server] section (server.url in Ambari, if you can find it: the property ghosted into and then out of view in the Ambari SmartSense Config tab, and still does).
I changed the server.url property to http://<hst server name>:9000 in Ambari (when it reappeared), restarted the SmartSense server, and retried the SmartSense View. It now works fine.
TBC: work out why the server.url property is not always visible in the SmartSense Config tab.
07-14-2017
11:20 AM
After upgrading Ambari (not entirely sure which versions, but I think it was Ambari 2.4.2 to 2.5.0.3), the SmartSense View came up unavailable with these messages:
"Service Unavailable"
"The SmartSense service is currently unavailable. Please make sure the SmartSense Server is up and running."
In /var/log/ambari-server/ambari-server.log you may see entries like these when you click on the SmartSense View:
14 Jul 2017 11:14:05,523 ERROR [ambari-client-thread-467377] ServerProxy:162 - Failed to execute GET /recommendations. Reason: Connection refused
14 Jul 2017 11:14:14,734 ERROR [ambari-client-thread-467506] ServerProxy:162 - Failed to execute GET /context. Reason: Connection refused
14 Jul 2017 11:14:14,900 ERROR [ambari-client-thread-467507] ServerProxy:162 - Failed to execute GET /checkconfig. Reason: Connection refused
Diffing the hst-server.ini config file against a previously saved copy yielded this:
# diff conf/hst-server.ini conf_13_04_17_04_17.save/
2,4d1
< enable.flex.subscription = false
< smartsense.id=X-00000000-X-00000000
< flex.subscription.id =
6a4
> smartsense.id=X-00000000-X-00000000
27,28d24
< url = http://:9000
< max.heap = 2048
30d25
< run.as.user =
33d27
< version = 1.4.0.2.5.0.3-7
52,57d45
< [recommendation]
< auto.download.interval = 300
< feedback.push.maxentries = 50
< feedback.push.interval = 1800
< expiry = 30
<
The url property was missing a valid hostname. Find it in the Ambari SmartSense Config tab under Advanced hst-server-conf and add the hostname of the HST server to the server.url property. (I can't share a screenshot because, weirdly, this property ghosted into view and now I can't find it again in Ambari. I know it's there because the [server] section of hst-server.ini contains url = http://<hst server name>:9000. This strangeness is probably why it took me so long to find the issue in the first place.)
Restart SmartSense via Ambari; after that the SmartSense View worked as usual.
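Since the property is hard to find in Ambari, it may be quicker to check the file directly. The following is a sketch under assumptions, not an official tool: both function names are hypothetical, and it assumes the url key sits in the [server] section in "key = value" form, as in the diff above.

```shell
# Hypothetical helper: print the value of "url" from the [server] section
# of an hst-server.ini-style file passed as $1.
hst_server_url() {
    awk -F= '
        /^\[/ { in_server = ($0 ~ /^\[server\]/) }
        in_server && $1 ~ /^[ \t]*url[ \t]*$/ {
            sub(/^[^=]*=[ \t]*/, "")   # drop "url = "
            sub(/[ \t]+$/, "")         # drop trailing whitespace
            print
            exit
        }
    ' "$1"
}

# Hypothetical helper: succeed only if the URL in $1 has a non-empty host
# part, so "http://:9000" fails and "http://somehost:9000" passes.
url_has_host() {
    case "$1" in
        *://:*|*:///*|*://) return 1 ;;
        *://?*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, url_has_host "$(hst_server_url conf/hst-server.ini)" || echo "server url has no hostname" would flag the broken value seen in the diff.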
06-30-2017
11:09 PM
1 Kudo
I didn't want to clear the ATS and lose all the job history. I wanted to fix the corruption and preserve the ATS leveldb entries. The ATS (ApplicationHistoryServer) was failing to start with these log entries:
2017-06-15 19:17:43,871 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 43 missing files; e.g.: /data/hadoop/ats/leveldb/leveldb-timeline-store/domain-ldb/000015.sst
<snipped>
2017-06-15 19:17:43,871 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(211)) - Stopping ApplicationHistoryServer metrics system...
2017-06-15 19:17:43,873 INFO impl.MetricsSinkAdapter (MetricsSinkAdapter.java:publishMetricsFromQueue(141)) - timeline thread interrupted.
2017-06-15 19:17:43,875 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(217)) - ApplicationHistoryServer metrics system stopped.
2017-06-15 19:17:43,875 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(605)) - ApplicationHistoryServer metrics system shutdown complete.
2017-06-15 19:17:43,876 FATAL applicationhistoryservice.ApplicationHistoryServer (ApplicationHistoryServer.java:launchAppHistoryServer(171)) - Error starting ApplicationHistoryServer
<snipped>
2017-06-15 19:17:43,877 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status -1
2017-06-15 19:17:43,880 INFO applicationhistoryservice.ApplicationHistoryServer (LogAdapter.java:info(45)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down ApplicationHistoryServer
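The directory that needs attention is buried at the end of the long "Corruption" line. As a rough sketch, assuming the message format shown in the log above (the function name corrupt_ldb_dirs is hypothetical, and the log file path is whatever your ATS writes to), the affected leveldb directories can be pulled out like this:

```shell
# Hypothetical helper: print the distinct leveldb directories named in
# "Corruption: ... e.g.: <path>" messages of the log file passed as $1.
corrupt_ldb_dirs() {
    sed -n 's|.*Corruption: .*e\.g\.: \(.*\)/[^/]*$|\1|p' "$1" | sort -u
}
```

Only the directory reported in the current startup attempt is listed, so it needs re-running after each failed restart.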
This post on HCC was helpful: ATS issue. What wasn't obvious from that post is that more than one leveldb "partition" (my term) can be corrupted. In my case several were, and I had to remove each of the following CURRENT files:
/data/hadoop/ats/leveldb/leveldb-timeline-store/domain-ldb/CURRENT
/data/hadoop/ats/leveldb/leveldb-timeline-store/starttime-ldb/CURRENT
/data/hadoop/ats/leveldb/leveldb-timeline-store/owner-ldb/CURRENT
/data/hadoop/yarn/timeline/timeline-state-store.ldb/CURRENT
I kept copies of the CURRENT files in /tmp/leveldbissue like this:
cd <dir where the leveldb files were reported missing>
mkdir -p /tmp/leveldbissue
cp -ip CURRENT /tmp/leveldbissue/CURRENT.<ldb dir>
rm CURRENT
(where <ldb dir> is the name of the deepest directory in which leveldb files were reported missing, e.g. domain-ldb)
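Because this has to be repeated for each corrupted directory, the backup-then-remove step can be wrapped in a function. This is a minimal sketch rather than a tested recovery tool: the function name is hypothetical, the backup naming (CURRENT.<directory name>) follows the file listing in this post, and it refuses to overwrite an existing backup so the earliest copy is kept.

```shell
# Hypothetical helper: copy <ldb dir>/CURRENT to
# <backup dir>/CURRENT.<ldb dir name>, then remove the original.
# $1 = leveldb directory, $2 = backup directory (default /tmp/leveldbissue)
backup_and_remove_current() {
    bc_dir=$1
    bc_backup=${2:-/tmp/leveldbissue}
    bc_target="$bc_backup/CURRENT.$(basename "$bc_dir")"
    [ -f "$bc_dir/CURRENT" ] || { echo "no CURRENT in $bc_dir" >&2; return 1; }
    [ ! -e "$bc_target" ] || { echo "$bc_target already exists" >&2; return 1; }
    mkdir -p "$bc_backup" || return 1
    cp -p "$bc_dir/CURRENT" "$bc_target" || return 1   # -p keeps owner/mode/time where permitted
    rm "$bc_dir/CURRENT"
}
```

For example: backup_and_remove_current /data/hadoop/ats/leveldb/leveldb-timeline-store/domain-ldb, run with the ATS stopped, followed by an ATS restart via Ambari.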
Each time a corrupted leveldb directory is reported, do the above, restart the ATS (via Ambari), and iterate until no more leveldb directories report 'corruption'. Here are the files at the end of my iterations through the 'corruptions':
$ cd /tmp/leveldbissue
$ ls -alt CURR*
-rw-r--r-- 1 root root 16 Jun 15 20:28 CURRENT.starttime-ldb
-rw-r--r-- 1 yarn hadoop 16 Apr 13 04:51 CURRENT.timeline-state-store.ldb
-rw-r--r-- 1 yarn hadoop 16 Apr 13 04:51 CURRENT.owner-ldb
-rw-r--r-- 1 yarn hadoop 16 Apr 13 04:48 CURRENT.domain-ldb
The process was fairly painless, though the "recovery" on ATS restart after removing the CURRENT files did take some time on the busy cluster I was working on. If downtime is more of a concern than preserving the ATS job history, you could consider clearing the ATS data instead. Hope this helps. It's not a nice one to get in the small hours of the morning when you are on your own.
05-15-2016
04:44 PM
3 Kudos
Assume you have a MySQL database server running on mysqlserver.example.com.
Summary of the steps
MySQL set up
Set up a grafana database on mysqlserver.example.com
Create a MySQL user called grafana, assign and flush privileges
Edit the grafana.ini configuration file to enable use of MySQL (Edit the Advanced ams-grafana-ini section in the Ambari Metrics Config tab in Ambari)
Start Grafana
Starting Grafana creates the grafana tables. The session table, however, does not get created, and the Grafana docs say to create it manually for MySQL and Postgres, so create the session table as per the Grafana docs: http://docs.grafana.org/installation/configuration/
Start (or restart) Grafana, and all should be well
Detailed steps
On MySQL server (be sure to change the password):
mysql> create database grafana;
Query OK, 1 row affected (0.01 sec)
mysql> GRANT USAGE ON `grafana`.* to 'grafana'@'mysqlserver.example.com' identified by 'grafanamysqluserpasswd';
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT ALL PRIVILEGES ON `grafana`.* to 'grafana'@'mysqlserver.example.com' with grant option;
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> use grafana;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
mysql> create table `session` (
-> `key` char(16) not null,
-> `data` blob,
-> `expiry` int(11) unsigned not null,
-> primary key (`key`)
-> ) ENGINE=MyISAM default charset=utf8;
Query OK, 0 rows affected (0.01 sec)
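The same statements can be generated from a script rather than typed at the prompt. This is a sketch only: the function name is hypothetical, it simply mirrors the statements in the session above (adding IF NOT EXISTS so it can be re-run), and it does no escaping, so the password must not contain a single quote. Newer MySQL releases (8.0 and later) no longer accept IDENTIFIED BY inside GRANT and need a separate CREATE USER first.

```shell
# Hypothetical helper: print the SQL used above to prepare MySQL for Grafana.
# $1 = MySQL user, $2 = host the user connects from, $3 = password
# No escaping is done: do not use a password containing a single quote.
grafana_mysql_setup_sql() {
    cat <<EOF
CREATE DATABASE IF NOT EXISTS grafana;
GRANT USAGE ON \`grafana\`.* TO '$1'@'$2' IDENTIFIED BY '$3';
GRANT ALL PRIVILEGES ON \`grafana\`.* TO '$1'@'$2' WITH GRANT OPTION;
FLUSH PRIVILEGES;
USE grafana;
CREATE TABLE IF NOT EXISTS \`session\` (
  \`key\` CHAR(16) NOT NULL,
  \`data\` BLOB,
  \`expiry\` INT(11) UNSIGNED NOT NULL,
  PRIMARY KEY (\`key\`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
EOF
}
```

The output can be reviewed first and then piped into the client, for example: grafana_mysql_setup_sql grafana mysqlserver.example.com 'yourpassword' | mysql -u root -p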
Edit the grafana.ini configuration file (in Ambari, edit the Advanced ams-grafana-ini section in the Ambari Metrics Config tab) - see snippet from the config file below
<snipped>
#################################### Database ####################################
[database]
# Either "mysql", "postgres" or "sqlite3", it's your choice
;type = sqlite3
;host = 127.0.0.1:3306
;name = grafana
;user = root
;password =
type = mysql
host = mysqlserver.example.com:3306
name = grafana
user = grafana
password = grafanamysqluserpasswd
# For "postgres" only, either "disable", "require" or "verify-full"
;ssl_mode = disable
# For "sqlite3" only, path relative to data_path setting
;path = grafana.db
#################################### Session ####################################
[session]
# Either "memory", "file", "redis", "mysql", "postgres", default is "file"
;provider = file
provider = mysql
# Provider config options
# memory: not have any config yet
# file: session dir path, is relative to grafana data_path
# redis: config like redis server e.g. `addr=127.0.0.1:6379,pool_size=100,db=grafana`
# mysql: go-sql-driver/mysql dsn config string, e.g. `user:password@tcp(127.0.0.1:3306)/database_name`
# postgres: user=a password=b host=localhost port=5432 dbname=c sslmode=disable
;provider_config = sessions
provider_config = `grafana:grafanamysqluserpasswd@tcp(mysqlserver.example.com:3306)/grafana`
<snipped>
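The provider_config value is easy to get wrong because the credentials and host are repeated from the [database] section in a different syntax. A small sketch for building it (the function name is hypothetical; the format is the go-sql-driver/mysql DSN quoted in the config comments above):

```shell
# Hypothetical helper: build the session provider_config DSN in the form
# user:password@tcp(host:port)/database
# $1 = user, $2 = password, $3 = host, $4 = database, $5 = port (default 3306)
grafana_session_dsn() {
    printf '%s:%s@tcp(%s:%s)/%s\n' "$1" "$2" "$3" "${5:-3306}" "$4"
}
```

For the values used in this post, grafana_session_dsn grafana grafanamysqluserpasswd mysqlserver.example.com grafana yields the string placed in provider_config.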