Member since: 12-10-2015
Posts: 24
Kudos Received: 13
Solutions: 0
12-10-2016
01:43 AM
I was trying to add a custom alert in Ambari and was getting a 500 response code, but when I check Ambari the alerts are there and running. Now when I try to delete those alerts using:

curl -u admin:admin -i -H 'X-Requested-By: ambari' -X DELETE http://test-cluster:8080/api/v1/clusters/Analytics_CP1_Dev_Test/alert_definitions/1233

the response I get is the following:

HTTP/1.1 301 Moved Permanently
Location: https://test-cluster:8080/api/v1/clusters/Analytics_CP1_Dev_Test/alert_definitions/1233
Content-Length: 0
Content-Type: text/html; charset=UTF-8

But when I check Ambari, the alerts are still running. I tried restarting ambari-server, but the alerts are still there.
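One thing I notice is that the 301 Location points at https, and curl does not follow redirects unless told to (-L), so maybe the call needs to be retried against that URL? A minimal sketch of what I mean with Python requests (same credentials as above; verify=False is only an assumption for a self-signed certificate):

import requests

# Retry the DELETE against the https URL from the 301 Location header.
resp = requests.delete(
    "https://test-cluster:8080/api/v1/clusters/Analytics_CP1_Dev_Test/alert_definitions/1233",
    auth=("admin", "admin"),
    headers={"X-Requested-By": "ambari"},
    verify=False,  # assumption: self-signed certificate on the Ambari server
)
print(resp.status_code)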
08-19-2016
06:57 PM
change the "connection_timeout": 5.0 field in the alert definition using a put call. Thats present in your json as well
08-18-2016
08:32 PM
OK, so as far as I understand it: the code that does the alerting is not able to read Storm's metrics (via Ganglia or the REST API), but the widgets are able to show those same metrics?
08-18-2016
07:10 PM
To correct my understanding: I can see that Ambari shows details about Storm in widgets and on the dashboard. From where is Ambari picking up those details? I think metrics.json is the link between Ambari and the service; correct me if I am wrong. Thanks @Jonathan Hurley
08-17-2016
09:45 PM
Currently there are no alerts defined for Storm in Ambari. I am trying to develop a custom alert on the usage of Storm slots. One easy way is to create a type: SCRIPT alert and query the parameter from Ambari inside my script (a sketch of that appears at the end of this post). But I want to create a type: METRIC alert, the way it is done for HDFS disk usage. That alert looks like:

{
  "href" : "http://[[hostname]]/api/v1/clusters/star_stage/alert_definitions/27",
  "AlertDefinition" : {
    "cluster_name" : "nameofmycluster",
    "component_name" : "NAMENODE",
    "description" : "This service-level alert is triggered if the HDFS capacity utilization exceeds the configured warning and critical thresholds. It checks the NameNode JMX Servlet for the CapacityUsed and CapacityRemaining properties. The threshold values are in percent.",
    "enabled" : true,
    "id" : 27,
    "ignore_host" : false,
    "interval" : 2,
    "label" : "HDFS Capacity Utilization",
    "name" : "namenode_hdfs_capacity_utilization",
    "scope" : "ANY",
    "service_name" : "HDFS",
    "source" : {
      "jmx" : {
        "property_list" : [
          "Hadoop:service=NameNode,name=FSNamesystemState/CapacityUsed",
          "Hadoop:service=NameNode,name=FSNamesystemState/CapacityRemaining"
        ],
        "value" : "{0}/({0} + {1}) * 100"
      },
      "reporting" : {
        "ok" : {
          "text" : "Capacity Used:[{2:.0f}%, {0}], Capacity Remaining:[{1}]"
        },
        "warning" : {
          "value" : 80.0,
          "text" : "Capacity Used:[{2:.0f}%, {0}], Capacity Remaining:[{1}]"
        },
        "critical" : {
          "value" : 90.0,
          "text" : "Capacity Used:[{2:.0f}%, {0}], Capacity Remaining:[{1}]"
        },
        "units" : "%"
      },
      "type" : "METRIC",
      "uri" : {
        "http" : "{{hdfs-site/dfs.namenode.http-address}}",
        "https" : "{{hdfs-site/dfs.namenode.https-address}}",
        "https_property" : "{{hdfs-site/dfs.http.policy}}",
        "https_property_value" : "HTTPS_ONLY",
        "default_port" : 0.0,
        "high_availability" : {
          "nameservice" : "{{hdfs-site/dfs.nameservices}}",
          "alias_key" : "{{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}",
          "http_pattern" : "{{hdfs-site/dfs.namenode.http-address.{{ha-nameservice}}.{{alias}}}}",
          "https_pattern" : "{{hdfs-site/dfs.namenode.https-address.{{ha-nameservice}}.{{alias}}}}"
        }
      }
    }
  }
}

My question: for Storm I don't see any JMX metrics configured in metrics.json, so I tried the following:

{
"AlertDefinition" : {
"cluster_name" : "star_stage",
"component_name" : "NIMBUS",
"description" : "test storm slots",
"ignore_host" : false,
"interval" : 2,
"label" : "storm Capacity Utilization",
"name" : "storm_supervisor_utilization",
"scope" : "ANY",
"service_name" : "STORM",
"source" : {
"ganglia" : {
"property_list" : [
//name is same what I found in metrics.json
"Total Slots",
"Used Slots"
],
"value" : "{1}/{0} * 100"
},
"reporting" : {
"ok" : {
"text" : "Capacity Used:[{2:.0f}%, {0}], Capacity Remaining:[{1}]"
},
"warning" : {
"value" : 80.0,
"text" : "Capacity Used:[{2:.0f}%, {0}], Capacity Remaining:[{1}]"
},
"critical" : {
"value" : 90.0,
"text" : "Capacity Used:[{2:.0f}%, {0}], Capacity Remaining:[{1}]"
},
"units" : "%"
},
"type" : "METRIC",
"uri" : {
"http" : "www.google.com"
}
}
}
} I am not sure how this alert is working. does it use the link in uri to get the metrics? How do I use the ganglia metrics in the alert definition to generate alert. You are the expert on alerting @Jonathan Hurley Storm metrics.json: https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/common-services/STORM/0.10.0/metrics.json What is significance of component and hostComponent in metrics.json "NIMBUS": {
"Component": [
{
"type": "ganglia",
"metrics": {"default": {
.....}}}
],
"HostComponent": [
{
"type": "ganglia",
"metrics": {
"default": {
"metrics/boottime": {
08-17-2016
09:24 PM
To answer my own question: yes, the alert_definition can be updated using a PUT API call.
08-12-2016
01:47 AM
@Neeraj If I create my custom alert script and register it with the JSON, will it be possible in the future to change the threshold? Can I edit the alert definition, or do I have to make the script not take threshold values from the alert definition?
08-11-2016
07:32 PM
@Jay SenSharma Hi, I had almost finished developing my script and then found this. Why can't we simply execute df -h from Python and parse the result instead of doing all the calculations? Any problem with this approach? A quick sketch of what I have in mind is below.
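The parsing I mean is just a couple of lines — a sketch, assuming the usual df output layout where the fifth column is Use%:

import subprocess

# Ask df about the root filesystem; the last output line looks like:
# /dev/sda1  50G  21G  27G  44% /
out = subprocess.check_output(['df', '-h', '/']).decode().splitlines()
pct_used = int(out[-1].split()[4].rstrip('%'))
print(pct_used)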
08-05-2016
01:31 AM
Hi, I have a custom service and I want to enable Ambari metrics for it. I was testing this with a simple property, but in the ambari-server log I see the error "Unable to get JMX metrics. No port value for Component-name." How can I specify the port on which my service exposes JMX? I looked through the Ambari code but could not figure out how it loads the port numbers. There are default ports hardcoded for some services, but I still cannot figure out how to do this for my custom service. Thanks for the help!
06-27-2016
11:25 PM
1 Kudo
I am trying to build Ambari from source code. I cloned the code and set up the environment as described in the developer guide. When I run mvn clean install, I get the following failures in test cases, which cause the build to fail:

Failed tests:
AmbariLdapDataPopulatorTest.testSynchronizeExistingLdapGroups_removeDuringIteration:325
Expectation failure on verify:
AmbariLdapDataPopulatorTestInstance.getLdapGroupByMemberAttr("group2"): expected: 1, actual: 0
StackManagerTest.testStackServiceExtension:276 expected:<3> but was:<4>
StackManagerTest.testGetStackServiceInheritance:364 expected:<4> but was:<5>
UpgradeCatalog222Test.testInitializeStromAndKafkaWidgets:1109
Unexpected method call AmbariManagementController.initializeWidgetsAndLayouts(EasyMock for interface org.apache.ambari.server.state.Cluster, EasyMock for interface org.apache.ambari.server.state.Service):
AmbariManagementController.getClusters(): expected: at least 0, actual: 1
AmbariManagementController.initializeWidgetsAndLayouts(EasyMock for interface org.apache.ambari.server.state.Cluster, EasyMock for interface org.apache.ambari.server.state.Service): expected: 1, actual: 0

Can some Ambari developer help me out here? I haven't changed anything in the code yet; I am getting the above errors while building ambari-server.

[INFO] Ambari Main ........................................ SUCCESS [ 8.593 s]
[INFO] Apache Ambari Project POM .......................... SUCCESS [ 0.403 s]
[INFO] Ambari Web ......................................... SUCCESS [ 41.950 s]
[INFO] Ambari Views ....................................... SUCCESS [ 2.372 s]
[INFO] Ambari Admin View .................................. SUCCESS [ 11.313 s]
[INFO] ambari-metrics ..................................... SUCCESS [ 0.837 s]
[INFO] Ambari Metrics Common .............................. SUCCESS [ 2.108 s]
[INFO] Ambari Metrics Hadoop Sink ......................... SUCCESS [ 4.721 s]
[INFO] Ambari Metrics Flume Sink .......................... SUCCESS [ 2.309 s]
[INFO] Ambari Metrics Kafka Sink .......................... SUCCESS [ 3.176 s]
[INFO] Ambari Metrics Storm Sink .......................... SUCCESS [ 1.901 s]
[INFO] Ambari Metrics Collector ........................... SUCCESS [02:16 min]
[INFO] Ambari Metrics Monitor ............................. SUCCESS [ 3.415 s]
[INFO] Ambari Metrics Grafana ............................. SUCCESS [ 6.373 s]
[INFO] Ambari Metrics Assembly ............................ SUCCESS [01:30 min]
[INFO] Ambari Server ...................................... FAILURE [44:47 min]
06-24-2016
09:26 PM
I restarted the agents, but it looks like the errors are still present. I see the old cluster name in the execution_command table, and I have identified the task_ids whose commands contain my old cluster name. Should I delete them from the database?
06-23-2016
11:56 PM
1 Kudo
I am getting the following entries in my ambari-server logs. I checked the database and there is no entry for the cluster name "oldclustername" in any table (I grepped my dump file and also checked all tables manually). I originally named my cluster "oldclustername" and then renamed it to "newclustername", but the logs still show these messages. The only suspicious entries I see are in ambari.request, which has cluster_id values of -1 and 2: cluster_id 2 correctly maps to my newclustername, but -1 looks like a wrong entry. Any suggestions on where I should look?

23 Jun 2016 23:38:32,233 ERROR [alert-event-bus-2] AlertReceivedListener:329 - Unable to process alert yarn_nodemanager_health for an invalid cluster named oldclustername
23 Jun 2016 23:38:32,361 ERROR [alert-event-bus-2] AlertReceivedListener:329 - Unable to process alert ams_metrics_monitor_process for an invalid cluster named oldclustername
23 Jun 2016 23:38:32,362 WARN [alert-event-bus-2] AlertReceivedListener:248 - Cluster lookup failed for clusterName=oldclustername
23 Jun 2016 23:38:32,362 WARN [alert-event-bus-2] AlertReceivedListener:134 - Received an alert for ambari_agent_disk_usage which is a definition that does not exist anymore
23 Jun 2016 23:38:32,362 ERROR [alert-event-bus-2] AlertReceivedListener:329 - Unable to process alert kafka_broker_process for an invalid cluster named oldclustername
23 Jun 2016 23:38:32,942 ERROR [alert-event-bus-2] AlertReceivedListener:329 - Unable to process alert datanode_webui for an invalid cluster named oldclustername
23 Jun 2016 23:38:32,943 ERROR [alert-event-bus-2] AlertReceivedListener:329 - Unable to process alert datanode_process for an invalid cluster named oldclustername
23 Jun 2016 23:38:32,943 WARN [alert-event-bus-2] AlertReceivedListener:248 - Cluster lookup failed for clusterName=oldclustername
23 Jun 2016 23:38:32,943 WARN [alert-event-bus-2] AlertReceivedListener:134 - Received an alert for ambari_agent_disk_usage which is a definition that does not exist anymore
23 Jun 2016 23:38:32,943 ERROR [alert-event-bus-2] AlertReceivedListener:329 - Unable to process alert yarn_nodemanager_webui for an invalid cluster named oldclustername
23 Jun 2016 23:38:32,943 ERROR [alert-event-bus-2] AlertReceivedListener:329 - Unable to process alert ams_metrics_monitor_process for an invalid cluster named oldclustername
23 Jun 2016 23:38:32,943 ERROR [alert-event-bus-2] AlertReceivedListener:329 - Unable to process alert yarn_nodemanager_health for an invalid cluster named oldclustername
06-22-2016
02:11 AM
Got it. I tried it on a different cluster and it is working.
06-21-2016
11:08 PM
I added ambari.display.url=http://100.123.123.123:8080 to the properties file, but I still don't see the Ambari URL in the email. I manually added $ambari.getUrl(), but it comes through in the email literally. Is there anything else I can try?
06-21-2016
09:31 PM
I am trying to modify the alert notification template. I see that the template code tries to display the Ambari URL, but it never actually displays it. The lower end of the template reads:

<div class="ambari-footer">
This notification was sent to $dispatch.getTargetName()
<br/>
Apache Ambari $ambari.getServerVersion()
#if( $ambari.hasUrl() )
<br/>
Ambari Server link: <a href=$ambari.getUrl()>$ambari.getUrl()</a>
#end
</div>
</html>
]]>
</body>
But I think the method ambari.hasUrl() always returns false. I tried printing $ambari.getUrl(), but it is rendered literally; in the emails I see "$ambari.getUrl()". I checked the code at https://github.com/apache/ambari/blob/71a1f7e0e5985b1a77bf09b976ebda3ab3fdbbf5/ambari-server/src/main/java/org/apache/ambari/server/state/services/AlertNoticeDispatchService.java:

@Inject
private Configuration m_configuration;

I am not sure where this gets injected from. Any suggestions on why the Ambari URL is not coming through from the configs?
06-17-2016
12:35 AM
@Neeraj Sabharwal I deleted my service name from all these tables, but I am still seeing the same issue in the logs. I see the name is present in a lot of tables: clusterconfig, execution_command, upgrade_item, stage, serviceconfig, requestresourcefilter, requestoperationlevel, request, and many others. Should I delete it from everywhere?
06-17-2016
12:05 AM
Check the Ambari database tables ambari.clusterservices and ambari.hostcomponentstate for the service name you are seeing in the logs. A quick sketch of the kind of check I mean is below.
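For example, against the default embedded Postgres setup (database ambari, user ambari) — the connection details and the service name here are assumptions to adapt to your environment:

import psycopg2

# Assumed defaults for the Ambari Postgres database; change to match yours.
conn = psycopg2.connect(dbname='ambari', user='ambari',
                        password='bigdata', host='localhost')
cur = conn.cursor()
for table in ('clusterservices', 'hostcomponentstate'):
    cur.execute('SELECT * FROM {0} WHERE service_name = %s'.format(table),
                ('CASSANDRA',))  # hypothetical service name
    print(table, cur.fetchall())
conn.close()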
03-22-2016
01:13 AM
3 Kudos
I am trying to do Hive benchmarking (https://github.com/hortonworks/hive-testbench), but when I run the setup script it loads data into some tables and then, after some time, fails with the following error:

OK
Time taken: 0.264 seconds
+ '[' X = X ']'
+ FORMAT=orc
+ i=1
+ total=24
+ DATABASE=tpcds_bin_partitioned_orc_2
+ for t in '${FACTS}'
+ echo 'Optimizing table store_sales (1/24).'
Optimizing table store_sales (1/24).
+ COMMAND='hive -i settings/load-partitioned.sql -f ddl-tpcds/bin_partitioned/store_sales.sql -d DB=tpcds_bin_partitioned_orc_2 -d SCALE=2 -d SOURCE=tpcds_text_2 -d BUCKETS=1 -d RETURN_BUCKETS=1 -d FILE=orc'
+ runcommand 'hive -i settings/load-partitioned.sql -f ddl-tpcds/bin_partitioned/store_sales.sql -d DB=tpcds_bin_partitioned_orc_2 -d SCALE=2 -d SOURCE=tpcds_text_2 -d BUCKETS=1 -d RETURN_BUCKETS=1 -d FILE=orc'
+ '[' XON '!=' X ']'
+ hive -i settings/load-partitioned.sql -f ddl-tpcds/bin_partitioned/store_sales.sql -d DB=tpcds_bin_partitioned_orc_2 -d SCALE=2 -d SOURCE=tpcds_text_2 -d BUCKETS=1 -d RETURN_BUCKETS=1 -d FILE=orc
WARNING: Use "yarn jar" to launch YARN applications.
Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties
... OK
Time taken: 0.948 seconds
OK
Time taken: 0.238 seconds
OK
Time taken: 0.629 seconds
OK
Time taken: 0.248 seconds
Query ID = hdfs_20160322014240_60c3f689-816d-409e-b8c7-c6ea636fa12a
Total jobs = 1
Launching Job 1 out of 1
Dag submit failed due to Invalid TaskLaunchCmdOpts defined for Vertex Map 1 : Invalid/conflicting GC options found, cmdOpts="-server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.4.0.0-169 -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/ -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=<LOG_DIR> -Dtez.root.logger=INFO,CLA " stack trace: [org.apache.tez.dag.api.DAG.createDag(DAG.java:859), org.apache.tez.client.TezClientUtils.prepareAndCreateDAGPlan(TezClientUtils.java:694), org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:487), org.apache.tez.client.TezClient.submitDAG(TezClient.java:434), org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:439), org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180), org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160), org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89), org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:75)] retrying...
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
+ '[' 1 -ne 0 ']'
+ echo 'Command failed, try '\''export DEBUG_SCRIPT=ON'\'' and re-running'
Command failed, try 'export DEBUG_SCRIPT=ON' and re-running
+ exit 1

Not sure what is wrong. Any help is appreciated.
02-16-2016
06:41 PM
Yes, but this repo does not implement a service check; check my repo. So there is still no answer to my question. @Mark Herring, may I know what I missed here? My question is still unanswered.
01-06-2016
09:35 PM
@Ali Bajwa It looks like the link you said is the Cassandra service is not correct. It takes me to trunk.
01-04-2016
03:53 PM
4 Kudos
Repo Description: This service provides functionality such as starting and stopping the service and running a service check. It also installs OpsCenter by DataStax, which provides a UI for the Cassandra cluster.

Repo Info:
GitHub Repo URL: https://github.com/Symantec/ambari-cassandra-service.git
GitHub account name: ajak6
Repo name: ambari-cassandra-service
Tags: ambari-extensions, ambari-service, Cloud & Operations
12-18-2015
03:18 AM
@Ali Bajwa So just to clarify: in the client script the install method calls the install_packages(env) method, and the same goes for the master. From where does install_packages pick up the package names? Is it from metainfo.xml (the osSpecifics tag)? Or can we write Execute(format('yum install cassandra')) in the install method? A sketch of the hook I mean is below.
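For reference, the shape of the install hook I'm asking about — a sketch under the assumption that the standard resource_management imports are available, not a verified client script:

from resource_management import *

class CassandraClient(Script):
    def install(self, env):
        # Lets Ambari resolve the package names -- my guess is from the
        # <osSpecifics> section of metainfo.xml, which is the question.
        self.install_packages(env)
        # The explicit alternative I'm asking about:
        # Execute(format('yum install -y cassandra'))

    def configure(self, env):
        pass

if __name__ == "__main__":
    CassandraClient().execute()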
12-11-2015
12:59 AM
2 Kudos
Hi, I am trying to develop a custom service for Ambari. My question: for a Cassandra client I am installing the whole Cassandra package and starting Cassandra, even though I only want to use the client to connect to the cluster. Is there any way to tell Ambari to install only a specific package on client nodes and a different specific package on slaves? If that is possible, I would be able to install just the Cassandra client on client nodes.
12-10-2015
11:34 PM
2 Kudos
I was trying to develop a service for Cassandra; my question is about the service check method. I wrote a smoke test in service_check.py, but when I install the service and run the service check, it tries to run the check on a client node where the Cassandra service won't be running, so it fails.

One workaround I thought of is to hard-code a host IP address in the smoke test so the client executes the check against that node, but then it is possible that that node is down, in which case the service check will always fail. Is there any way to get the IP addresses of all the hosts in the cluster from Ambari and run Execute(smoke test) against each node, declaring the service check failed only if it passes on none of them? As I understand it, the Execute method runs the commands given to it, and if they complete successfully the service check is reported as passing.

Below is sample code from service_check.py. Any help is appreciated.

class ServiceCheck(Script):
    def service_check(self, env):
        import params
        env.set_params(params)
        # Write the CQL smoke-test statements to a temp file.
        cmdfile = format("/tmp/cmds")
        File(cmdfile,
             mode=0600,
             content=InlineTemplate(
                 "CREATE KEYSPACE IF NOT EXISTS smokedemotest WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };\n"
                 "USE smokedemotest;\n"
                 "CREATE TABLE IF NOT EXISTS smokeusers (firstname text, lastname text, age int, email text, city text, PRIMARY KEY (lastname));\n"
                 "INSERT INTO smokeusers (firstname, lastname, age, email, city) VALUES ('John', 'Smith', 46, 'johnsmith@email.com', 'Sacramento');\n"
                 "DROP TABLE smokedemotest.smokeusers;\n"
                 "DROP KEYSPACE smokedemotest;\n\n"))
        # Run the statements through cqlsh; a non-zero exit code fails the check.
        Execute(format("cqlsh -f {cmdfile}"))
Currently the Execute command runs on whichever machine Ambari chooses to run the service check on. Can you please correct my understanding of how the service check executes the command? How does it decide where (on which machine) to run the commands? For the host-list idea above, something like the sketch below is what I have in mind.
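A sketch of fetching the cluster's host list from the Ambari REST API — the server address, credentials, and cluster name are placeholders, and I haven't verified this from inside a service check:

import base64
import json
import urllib2

# Hypothetical server address, credentials, and cluster name.
req = urllib2.Request('http://ambari-host:8080/api/v1/clusters/mycluster/hosts')
req.add_header('Authorization', 'Basic ' + base64.b64encode('admin:admin'))
req.add_header('X-Requested-By', 'ambari')
hosts = [item['Hosts']['host_name']
         for item in json.load(urllib2.urlopen(req))['items']]
print(hosts)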