Member since: 08-08-2017
Posts: 1652
Kudos Received: 30
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1916 | 06-15-2020 05:23 AM |
|  | 15448 | 01-30-2020 08:04 PM |
|  | 2068 | 07-07-2019 09:06 PM |
|  | 8099 | 01-27-2018 10:17 PM |
|  | 4565 | 12-31-2017 10:12 PM |
02-01-2018 08:55 PM
1 Kudo
@Michael Bronson, a missing data block can be related to data corruption. Use 'hdfs fsck <path> -list-corruptfileblocks -files -locations' to find out which replicas got corrupted. Then, to fix the issue, you can delete the corrupted blocks using 'hdfs fsck / -delete'. I hope you find the thread below useful for handling missing blocks: https://community.hortonworks.com/questions/17917/best-way-of-handling-corrupt-or-missing-blocks.html
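A minimal sketch of that workflow; the /data path is a placeholder for whatever directory you are checking:

# Locate corrupt/missing blocks and the files they belong to
hdfs fsck /data -list-corruptfileblocks -files -blocks -locations
# Review the report first, then permanently delete the affected files (irreversible)
hdfs fsck / -delete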
01-31-2018 07:47 PM
1 Kudo
@Michael Bronson - greetings to a fellow Simpsonified! If the Ambari server is down, then there are obviously problems with the system, and it is a little tricky to say what to do; a complete answer would look more like a decision tree than a single response. Nevertheless, I'll try to provide some help, with the understanding that a complete answer is beyond the scope of a simple Q&A.

First off, why not restart the Ambari service? That's by far the simplest answer. Give it a few minutes to check its connectivity with the other services, and you should be able to proceed via Ambari. Really, this is the only good solution.

If you really need to do it the hard way, there are two basic choices: (a) if you know a little about each of the services, you can use service-specific CLIs on each of the masters to check status and/or stop each service; (b) alternatively, since these services essentially all run in Java and are typically installed on "hdp" paths in the filesystem, you can use Linux (or OS-equivalent) commands to find and kill them.

Option (a) requires a little research and depends on which services are running in the cluster. For instance, you can log into the HDFS master and use the commands summarized at https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/ClusterSetup.html#Hadoop_Shutdown to shut down the core Hadoop services. Each of the other major components has a similar CLI, which some diligent web searching on the Apache sites will turn up. These CLIs allow querying the current status of the services and provide reasonably clean shutdowns.

Option (b) is rude, crude, and depends overmuch on the famed crash-resilience of Hadoop components. It assumes you are a capable Linux administrator and understand how HDP was installed; if not, you shouldn't try it. Do NOT take this answer as Hortonworks advice -- on the contrary, it is the equivalent of pulling the plug on your servers, something you should only do on a production system if there are no other alternatives. But the fact is, if you have admin privileges, running `ps auxwww | grep java | grep -i hdp` (assuming all services have been installed on paths that include the word 'hdp' or 'HDP') on each host in the cluster should reveal all HDP-related processes still running (and maybe some others; do check the results carefully). If you see fit to kill them ... that's your responsibility. It is very important, if at all possible, to quiesce the cluster first, at least by stopping data inputs and waiting a few minutes.

Note that the Ambari agent is typically installed as a service with auto-restart; it is resilient and stateless, so it is not necessary to stop it before rebooting the server, but you could run `ambari-agent stop` (on each server) to make sure it stays out of the way while you work on the other services. Rebooting the server should restart it too. Hope this helps.
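For illustration, a hedged sketch of both options, assuming a standard HDP layout (the /usr/hdp/current paths and the hdfs service user are assumptions; adjust for your install):

# Option (a): clean shutdown of the core HDFS daemons via their CLI scripts
su - hdfs -c "/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh stop namenode"   # on the NameNode host
su - hdfs -c "/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh stop datanode"   # on each DataNode host

# Option (b): on each host, keep the agent out of the way, then inspect before killing anything
ambari-agent stop
ps auxwww | grep java | grep -i hdp | grep -v grep   # verify each line carefully before acting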
01-31-2018 10:37 AM
@Michael Bronson, you can use the curl call below to get the info:

curl -u {ambari-username}:{ambari-password} -H "X-Requested-By: ambari" -X GET http://{ambari-host}:{ambari-port}/api/v1/clusters/{clustername}/hosts/{host-name}/host_components?fields=HostRoles/state
Sample response:
{
  "href": "http://10.10.1.1:8080/api/v1/clusters/cl1/hosts/master03/host_components?fields=HostRoles/state",
  "items": [
    {
      "href": "http://10.10.1.1:8080/api/v1/clusters/cl1/hosts/master03/host_components/ACCUMULO_CLIENT",
      "HostRoles": {
        "cluster_name": "cl1",
        "component_name": "ACCUMULO_CLIENT",
        "host_name": "master03",
        "state": "INSTALLED"
      },
      "host": {
        "href": "http://10.10.1.1:8080/api/v1/clusters/cl1/hosts/master03"
      }
    },
    {
      "href": "http://10.10.1.1:8080/api/v1/clusters/cl1/hosts/master03/host_components/APP_TIMELINE_SERVER",
      "HostRoles": {
        "cluster_name": "cl1",
        "component_name": "APP_TIMELINE_SERVER",
        "host_name": "master03",
        "state": "STARTED"
      },
      "host": {
        "href": "http://10.10.1.1:8080/api/v1/clusters/cl1/hosts/master03"
      }
    }
  ]
}
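If you only need each component's name and state, a hedged follow-up using the same sample host and placeholder admin:admin credentials:

curl -s -u admin:admin -H "X-Requested-By: ambari" \
  "http://10.10.1.1:8080/api/v1/clusters/cl1/hosts/master03/host_components?fields=HostRoles/state" \
  | grep -E '"component_name"|"state"'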
Thanks, Aditya
01-24-2019 01:47 PM
Once the file is corrupted, you cannot recover it, even after setting dfs.client.block.write.replace-datanode-on-failure.policy=NEVER and restarting HDFS. As a workaround, I created a copy of the file and removed the old one.
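A minimal sketch of that workaround, with /data/myfile as a placeholder path:

hdfs dfs -cp /data/myfile /data/myfile.copy   # copy whatever is still readable
hdfs dfs -rm /data/myfile                     # remove the corrupted original
hdfs dfs -mv /data/myfile.copy /data/myfile   # put the copy back under the original name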
02-02-2018 03:46 PM
1. Block replication is for redundancy of data, which ensures data is not lost due to a bad disk or a node going down.
2. A replication factor of 1 is used in situations where the data can be recreated at any point in time and its loss is not crucial. For example, in a job chain the output of one job is consumed by the next, and eventually all intermediate data is deleted; such intermediate data can be marked with a replication factor of 1 (though it's still good to have 2), as in the sketch below.
3. A replication factor of 1 does not make the cluster fault tolerant. In your case you have 3 worker nodes; with an RF of 1, if a worker goes bad you lose data and it can't be processed. I suggest you use at least RF=2 if you are concerned about space utilization.
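A hedged one-liner for point 2, with /data/intermediate as a placeholder path for the intermediate output:

hdfs dfs -setrep -w 2 /data/intermediate   # lower the replication factor to 2 and wait for it to take effect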
01-29-2018 09:33 AM
@Michael Bronson, replace {host-name} with the host name where the App Timeline Server is installed. From your output in the question, it looks like the App Timeline Server is installed on "master01.sys67.com", so you can try this URL:

curl -u $USER:$PASSWORD -X GET -H "X-Requested-By: ambari" http://localhost:8080/api/v1/clusters/HDP/hosts/master01.sys67.com/host_components/APP_TIMELINE_SERVER?fields=HostRoles/state

Thanks, Aditya
01-27-2018 09:15 PM
@Jay, can we just do ambari-server restart and install the blueprint again (in case we have time for the installation)?
01-27-2018 09:13 PM
@Jay, if you think this isn't the right direction, then what should we capture from the log?
01-24-2018 10:43 PM
@Jay, what do we need to verify from the hdfs user if we want to find out who removed the files?
01-25-2018 04:45 PM
@Jay, as we agreed yesterday, the "no image found" message in the log indicates that the fsimage files are missing. Can you please approve this?