Member since: 03-14-2016
Posts: 4721
Kudos Received: 1111
Solutions: 874
02-03-2019
07:33 AM
@Michael Bronson This is a "500" error, which indicates an Internal Server Error, so you should see a very detailed stack trace inside your ambari-server.log. Can you please share the complete ambari-server.log so that we can check what might be failing?
01-30-2019
10:29 PM
@Vicky Thatavarthi Can you please share the output of the following Ambari API call in a browser where you are already logged in to the Ambari UI?

http://$AMBARI_HOST:8080/api/v1/stacks/HDF/versions/3.1?fields=operating_systems/repositories/Repositories

Please replace "AMBARI_HOST" with your Ambari hostname and adjust the HDF version as needed (I am using 3.1). The output of the above API call is a JSON response containing a few "base_url" entries. Copy the "base_url" that matches your operating system version, append "/repodata/repomd.xml" to the end of it, and then check whether you can open that URL from the Ambari server host using the curl command. If the local repo is configured properly and there is no network / firewall issue, you should see a valid XML file. Also try opening the same URL in your browser to see whether it works and is valid.

Example (from the Ambari server host), assuming your "base_url" is "http://your.example.localrepo/HDF/centos7/3.x/updates/3.1.0.0":

# curl -ivL http://your.example.localrepo/HDF/centos7/3.x/updates/3.1.0.0/repodata/repomd.xml
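To build the check URLs from the API response programmatically, a minimal Python sketch could look like the following. The field nesting follows the `fields=operating_systems/repositories/Repositories` path used in the API call above, and the repo URL in the sample is just a placeholder:

```python
def repomd_urls(api_response):
    """Collect every base_url in the stacks API response and append the
    /repodata/repomd.xml path that should resolve to a valid XML file."""
    urls = []
    for os_entry in api_response.get("operating_systems", []):
        for repo in os_entry.get("repositories", []):
            base = repo["Repositories"]["base_url"].rstrip("/")
            urls.append(base + "/repodata/repomd.xml")
    return urls

# Response shaped like the API output described above (placeholder URL)
sample = {
    "operating_systems": [
        {"repositories": [
            {"Repositories": {
                "base_url": "http://your.example.localrepo/HDF/centos7/3.x/updates/3.1.0.0"
            }}
        ]}
    ]
}
print(repomd_urls(sample))
```

Each returned URL can then be handed to curl exactly as shown above.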
01-30-2019
12:08 PM
1 Kudo
@Yee Zee Please refer to the previously shared detailed steps to test this scenario. This works for any abrupt termination of the component (whether due to a host reboot or because the process was somehow killed abruptly). As mentioned earlier, auto-start of a component is based on its current state and its "desired state". That means if you stop a component using the Ambari API or the Ambari UI, you won't see the recovery, because in that case the Ambari DB records the stop as the desired state.
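The state comparison described above can be sketched as a tiny Python predicate. This is a simplification of what the agent's RecoveryManager actually does (the real logic also honors retry limits and maintenance mode):

```python
def needs_recovery(current_state, desired_state):
    """Recovery triggers only when the live state has drifted away from
    the desired state recorded in the Ambari DB."""
    return desired_state == "STARTED" and current_state == "INSTALLED"

# Abrupt kill: the agent sees INSTALLED while the Ambari DB still says STARTED
print(needs_recovery("INSTALLED", "STARTED"))    # True
# Manual stop via UI/API: the desired state becomes INSTALLED as well
print(needs_recovery("INSTALLED", "INSTALLED"))  # False
```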
01-30-2019
09:22 AM
1 Kudo
@Yee Zee Test scenario (I used Ambari 2.6.1.7 for testing):

1. Make sure that the Kafka Broker process is up and running and that auto-start is configured properly for the Kafka Broker in the Ambari UI.

2. Make sure that the ambari-agent process is running fine on the broker host and is sending heartbeat requests properly to the Ambari server and getting responses back, by looking at the ambari-agent log:

INFO 2019-01-30 09:12:15,798 Controller.py:333 - Heartbeat response received (id = 2199203)
INFO 2019-01-30 09:12:15,799 Controller.py:342 - Heartbeat interval is 1 seconds

3. Put the ambari-agent log in tail mode:

# tail -f /var/log/ambari-agent/ambari-agent.log

4. Put the ambari-server.log in tail mode:

# tail -f /var/log/ambari-server/ambari-server.log

5. Now kill the Kafka Broker abruptly using the "kill" command:

# kill -9 `cat /var/run/kafka/kafka.pid`

6. In the ambari-agent log you will notice entries like these:

INFO 2019-01-30 09:12:15,890 RecoveryManager.py:185 - current status is set to INSTALLED for KAFKA_BROKER
INFO 2019-01-30 09:12:15,890 RecoveryManager.py:255 - KAFKA_BROKER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2019-01-30 09:12:16,703 Controller.py:482 - Wait for next heartbeat over
INFO 2019-01-30 09:12:19,907 RecoveryManager.py:255 - KAFKA_BROKER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2019-01-30 09:12:19,908 RecoveryManager.py:834 - START command cannot be computed as details are not received from Server.
INFO 2019-01-30 09:12:29,082 RecoveryManager.py:255 - KAFKA_BROKER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2019-01-30 09:12:29,083 RecoveryManager.py:834 - START command cannot be computed as details are not received from Server.
...
INFO 2019-01-30 09:13:16,753 RecoveryManager.py:255 - KAFKA_BROKER needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2019-01-30 09:13:20,018 ActionQueue.py:238 - Executing command with id = 1-0, taskId = 1542362191 for role = KAFKA_BROKER of cluster TestCluster.
INFO 2019-01-30 09:13:20,018 ActionQueue.py:279 - Command execution metadata - taskId = 1542362191, retry enabled = False, max retry duration (sec) = 0, log_output = True
INFO 2019-01-30 09:13:23,130 ActionQueue.py:324 - Quit retrying for command with taskId = 1542362191. Status: COMPLETED, retryAble: False, retryDuration (sec): -1, last delay (sec): 1
INFO 2019-01-30 09:13:23,131 ActionQueue.py:339 - Command with taskId = 1542362191 completed successfully!
INFO 2019-01-30 09:13:23,131 RecoveryManager.py:185 - current status is set to STARTED for KAFKA_BROKER
INFO 2019-01-30 09:13:23,135 ActionQueue.py:390 - After EXECUTION_COMMAND (START), with taskId=1542362191, current state of KAFKA_BROKER to STARTED

7. In the ambari-server.log you will find entries like the following once the Kafka Broker process is killed:

30 Jan 2019 09:12:18,443 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:607 - State of service component KAFKA_BROKER of service KAFKA of cluster TestCluster has changed from STARTED to INSTALLED at host hdfcluster1.example.com according to STATUS_COMMAND report
30 Jan 2019 09:12:18,444 INFO [ambari-heartbeat-processor-0] HeartbeatMonitor:50 - Setting need for exec command to True for KAFKA_BROKER
30 Jan 2019 09:13:10,503 INFO [ambari-hearbeat-monitor] HeartbeatMonitor:318 - KAFKA_BROKER is at INSTALLED adding more payload per agent ask
30 Jan 2019 09:14:20,424 INFO [ambari-heartbeat-processor-0] HeartbeatProcessor:607 - State of service component KAFKA_BROKER of service KAFKA of cluster TestCluster has changed from INSTALLED to STARTED at host hdfcluster1.example.com according to STATUS_COMMAND report

8. After this, you will see that the Kafka Broker process has started up fine.
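If you want to scan an ambari-agent log for these recovery events programmatically, a small Python sketch that matches the "needs recovery" line format shown in step 6 could look like this:

```python
import re

# Matches the RecoveryManager "needs recovery" lines from the agent log
PATTERN = re.compile(
    r"RecoveryManager\.py:\d+ - (\w+) needs recovery, "
    r"desired = (\w+), and current = (\w+)"
)

def recovery_events(log_lines):
    """Return (component, desired_state, current_state) tuples for every
    'needs recovery' line found in the given ambari-agent log lines."""
    events = []
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            events.append(match.groups())
    return events

# Sample lines taken from the log excerpt above
sample = [
    "INFO 2019-01-30 09:12:15,890 RecoveryManager.py:255 - KAFKA_BROKER "
    "needs recovery, desired = STARTED, and current = INSTALLED.",
    "INFO 2019-01-30 09:12:16,703 Controller.py:482 - Wait for next heartbeat over",
]
print(recovery_events(sample))  # [('KAFKA_BROKER', 'STARTED', 'INSTALLED')]
```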
01-30-2019
05:47 AM
1 Kudo
@Yee Zee The Ambari auto-start feature is useful when your components (like the Kafka Broker) are abruptly killed or shut down. Auto-start of a component is based on its current state and its "desired state". So if you manually stop the services/components, auto-start may not work, because the agent compares the current state of these components against the desired state to determine whether they are to be installed, started, restarted, or stopped.

You can use a cron job to keep monitoring the status of the Kafka Brokers every few minutes and then start them using an Ambari API call if needed.

Monitoring Kafka Broker state (STARTED / INSTALLED):

# curl -u admin:admin -H "X-Requested-By: ambari" -X GET http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/components/KAFKA_BROKER?fields=host_components/host_name,ServiceComponentInfo/state

Get the host_name and state from the above Ambari API call to find the brokers which are down.

Starting a Kafka Broker if it is down (INSTALLED):

# curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Start Kafka Broker","operation_level":{"level":"HOST_COMPONENT","cluster_name":"$CLUSTER_NAME","host_name":"$KAFKA_BROKER_HOST","service_name":"KAFKA"}},"Body":{"HostRoles":{"state":"STARTED"}}}' "http://$AMBARI_HOST:8080/api/v1/clusters/$CLUSTER_NAME/hosts/$KAFKA_BROKER_HOST/host_components/KAFKA_BROKER"
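A cron-driven script could extract the down brokers from the monitoring response with something like the following Python sketch. It assumes the per-host state is exposed under host_components/HostRoles (which you can confirm in your own API output); the hostnames in the sample are placeholders:

```python
def down_brokers(component_json):
    """Return the host_name of every KAFKA_BROKER whose state is
    INSTALLED (i.e. stopped) in the Ambari component API response."""
    down = []
    for hc in component_json.get("host_components", []):
        roles = hc.get("HostRoles", {})
        if roles.get("state") == "INSTALLED":
            down.append(roles.get("host_name"))
    return down

# Response shaped like the monitoring API output (hypothetical hosts)
sample = {"host_components": [
    {"HostRoles": {"host_name": "broker1.example.com", "state": "STARTED"}},
    {"HostRoles": {"host_name": "broker2.example.com", "state": "INSTALLED"}},
]}
print(down_brokers(sample))  # ['broker2.example.com']
```

Each returned host can then be plugged into the PUT call above as $KAFKA_BROKER_HOST.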
01-30-2019
05:03 AM
@Vicky Thatavarthi Based on the below error, it looks like the ambari-agent is not able to determine its current working directory:

shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory

Can you please confirm whether you are running the ambari-agent as a "non-root" user? (If yes, have you followed the instructions properly to set up the agent to run as a non-root user?)

Can you please try changing the following path to some other directory, give access (read / write / execute) to the user who is running the ambari-agent (in case you are running the agent as a non-root user), and then restart the ambari-agent and try again:

# grep 'AGENT_WORKING_DIR' /var/lib/ambari-agent/bin/ambari-agent
AGENT_WORKING_DIR=/var/lib/ambari-agent

For example, change it to the /tmp/ambari-agent directory and then try again.
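To sanity-check a candidate replacement directory before pointing AGENT_WORKING_DIR at it, you could run a quick Python check as the same user that runs the ambari-agent (the /tmp/ambari-agent path here is just the example suggested above):

```python
import os
import tempfile

def working_dir_usable(path):
    """Create the candidate working directory if needed and verify that
    the current user has read/write/execute access to it."""
    os.makedirs(path, exist_ok=True)
    return os.access(path, os.R_OK | os.W_OK | os.X_OK)

# e.g. the /tmp/ambari-agent directory suggested above
candidate = os.path.join(tempfile.gettempdir(), "ambari-agent")
print(working_dir_usable(candidate))
```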
01-21-2019
09:53 AM
@Michael Bronson There is a typo in your URL: "FSNamesytem" should be "FSNamesystem" (one 's' character is missing from the word). So please try this:

# curl -u admin:admin -H "X-Requested-By: ambari" -X GET "http://name2:8080/api/v1/clusters/clu45/host_components?HostRoles/component_name=NAMENODE&metrics/dfs/FSNamesystem/HAState=standby"
01-21-2019
09:49 AM
@Sen Ke It is always useful to retain logs for a certain period so that, in case any analysis is needed, we have the logs to verify whether anything went wrong unexpectedly. Some of the logs, like "hdfs-audit.log", are important because they contain all the auditing data for HDFS access. However, if you want to delete old logs, you can delete them, as it won't cause any service interruption. The best approach, though, is to implement the Log4j Extras functionality for your logging, so that old logs are automatically rolled and compressed. As those logs are basically text files, they compress very well, and the compressed log size is typically 10-15 times smaller than the original. Please refer to the following article to learn more about it: https://community.hortonworks.com/articles/50058/using-log4j-extras-how-to-rotate-as-well-as-zip-th.html
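As a rough sketch of what the linked article sets up, using the Log4j Extras companion classes, the audit appender can be switched to a time-based rolling policy whose ".gz" file-name suffix enables compression of rolled files. The appender name "DRFAAUDIT" and the ${hdfs.log.dir} variable here are assumptions and must match your existing log4j.properties:

```properties
# Log4j Extras rolling appender: rolls hdfs-audit.log daily and gzips
# the rolled file (the .gz suffix in FileNamePattern enables compression)
log4j.appender.DRFAAUDIT=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.DRFAAUDIT.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.DRFAAUDIT.rollingPolicy.ActiveFileName=${hdfs.log.dir}/hdfs-audit.log
log4j.appender.DRFAAUDIT.rollingPolicy.FileNamePattern=${hdfs.log.dir}/hdfs-audit.log-%d{yyyy-MM-dd}.gz
log4j.appender.DRFAAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
```

The log4j-extras jar must be on the classpath for the org.apache.log4j.rolling classes to be available.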
01-21-2019
05:57 AM
@Michael Bronson Example: Please check the "HAState" field value.

# curl -u admin:admin -H "X-Requested-By: ambari" -X GET "http://hdfcluster1.example.com:8080/api/v1/clusters/TestCluster/host_components?HostRoles/component_name=NAMENODE&metrics/dfs/FSNamesystem/HAState=active"

Output:

{
"href" : "http://hdfcluster1.example.com:8080/api/v1/clusters/TestCluster/host_components?HostRoles/component_name=NAMENODE&metrics/dfs/FSNamesystem/HAState=active",
"items" : [
{
"href" : "http://hdfcluster1.example.com:8080/api/v1/clusters/TestCluster/hosts/hdfcluster1.example.com/host_components/NAMENODE",
"HostRoles" : {
"cluster_name" : "TestCluster",
"component_name" : "NAMENODE",
"host_name" : "hdfcluster1.example.com"
},
"host" : {
"href" : "http://hdfcluster1.example.com:8080/api/v1/clusters/TestCluster/hosts/hdfcluster1.example.com"
},
"metrics" : {
"dfs" : {
"FSNamesystem" : {
"HAState" : "active"
}
}
}
}
]
}
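For scripting, the JSON above can be reduced to a host-to-HAState map. A minimal Python sketch (the sample mirrors the output shown above):

```python
def hastate_by_host(api_response):
    """Map each NAMENODE host in the filtered host_components response
    to its reported HAState."""
    return {
        item["HostRoles"]["host_name"]:
            item["metrics"]["dfs"]["FSNamesystem"]["HAState"]
        for item in api_response.get("items", [])
    }

# Trimmed copy of the API output shown above
sample = {"items": [{
    "HostRoles": {"cluster_name": "TestCluster",
                  "component_name": "NAMENODE",
                  "host_name": "hdfcluster1.example.com"},
    "metrics": {"dfs": {"FSNamesystem": {"HAState": "active"}}},
}]}
print(hastate_by_host(sample))  # {'hdfcluster1.example.com': 'active'}
```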
01-21-2019
05:54 AM
@Michael Bronson You can make use of the Ambari API call: http://hdfcluster1.example.com:8080/api/v1/clusters/TestCluster/host_components?HostRoles/component_name=NAMENODE&metrics/dfs/FSNamesystem/HAState=active