Member since: 02-18-2016
Posts: 141
Kudos Received: 19
Solutions: 18
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5124 | 12-18-2019 07:44 PM
 | 5154 | 12-15-2019 07:40 PM
 | 1817 | 12-03-2019 06:29 AM
 | 1836 | 12-02-2019 06:47 AM
 | 5849 | 11-28-2019 02:06 AM
11-12-2019
01:04 AM
Hi Mike, can you do a quick check on the points below?

**BP-484874736-172.2.45.23-8478399929292:blk_1081495827_7755233 does not exist or is not under Construction
>> 1. Are all DataNodes up and running fine within the cluster?
2. Check the NameNode UI and see whether any DataNode is NOT reporting blocks in the DataNodes tab, or whether any missing blocks are reported on the NameNode UI.
3. You can run fsck [unless the cluster is huge and loaded with data] to check whether the block exists and which nodes hold its replicas (a sketch of the command is below). It might help to drill down into the issue.
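A minimal sketch of the fsck step, assuming the HDFS path holding the affected file is /data/mydir (the path is a placeholder, not from the original thread):

```sh
# List files, their blocks, and the DataNodes holding each replica under the
# given path, then grep for the block id from the error message.
hdfs fsck /data/mydir -files -blocks -locations | grep blk_1081495827
```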
11-11-2019
11:56 PM
Hi Vinay, do you see any errors in the logs while running the "reassign partition tool"? That might help to debug the issue. Were all the brokers healthy and the ISRs good before you ran the tool?

***When I ran this tool, it was stuck with one partition and it hung there for more than a day. The Cluster performance was severely impacted, and we had to restart the entire cluster.
>> If there is a lot of data / there are many topics, I can suggest reassigning a subset of the topics at a time to avoid load on the cluster. You can provide a list of topics that should be moved to the new set of brokers and a target list of new brokers.

***I don't see a way even to stop the tool when it's taking a long time.
>> You can abort the reassignment by deleting the "/admin/reassign_partitions" znode on your ZooKeeper cluster using the zookeeper shell (a sketch is below), and then move the partitions that were assigned to the dead broker to new nodes. Thanks, Sagar S
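A minimal sketch of the abort step, assuming Kafka is installed under /usr/hdp/current/kafka-broker and the ZooKeeper ensemble is reachable at zk1:2181 (both are placeholders):

```sh
# Open the ZooKeeper shell that ships with Kafka and remove the in-flight
# reassignment znode; the stuck reassignment is then abandoned.
/usr/hdp/current/kafka-broker/bin/zookeeper-shell.sh zk1:2181
delete /admin/reassign_partitions
```

After that, the partitions that were on the dead broker can be moved again with a fresh, smaller reassignment plan.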
09-26-2018
04:21 PM
I tried the process below and it worked:
1. Stop AMS.
2. Move the contents of the AMS "tmp.dir" to a backup location.
3. Move the contents of the AMS "root.dir" to a backup location.
4. Remove the ams znode from ZooKeeper (a sketch is below).
5. Start AMS.
AMS is working fine now.
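A minimal sketch of step 4, assuming AMS HBase registers under /ams-hbase-unsecure and the ZooKeeper quorum is zk1:2181 (both are assumptions; the actual parent znode is whatever zookeeper.znode.parent is set to in ams-hbase-site):

```sh
# Connect with the ZooKeeper CLI and recursively remove the AMS znode so it
# is rebuilt cleanly when AMS starts again.
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server zk1:2181
rmr /ams-hbase-unsecure
```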
09-04-2018
03:24 PM
Problem Statement : We recently upgraded our AMbari and HDP to latest version. As pre-requisites while ambari upgrade we missed to upgrade ambari-infra rpm/package. We did HDP upgrade and realized as ambari-infra was not upgraded. So we upgraded ambari-infra package on respective node. When checking in Ranger UI, I am not able to see ranger audits and its giving error - 2018-09-03 12:47:06,891 [http-bio-6080-exec-18] ERROR org.apache.ranger.solr.SolrUtil (SolrUtil.java:161) - Error running solr query. Query = q=*:*&fq=evtTime:[2018-09-02T16:00:00Z+TO+NOW]&sort=evtT
ime+desc&start=0&rows=25&_stateVer_=ranger_audits:542, response = null
2018-09-03 12:47:06,892 [http-bio-6080-exec-18] INFO org.apache.ranger.common.RESTErrorUtil (RESTErrorUtil.java:63) - Request failed. loginId=admin, logMessage=Error running solr query, please chec
k solr configs. Could not find a healthy node to handle the request.
javax.ws.rs.WebApplicationException Can you help to resolve this issue. Attached xa_portal.logxa-portal.txt
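The "Could not find a healthy node" message points at the ranger_audits collection in the Infra Solr instance. A minimal sketch of a health check, assuming Infra Solr listens on its default port 8886 on infra-solr-host (the host name is a placeholder):

```sh
# Ask Solr for the state of the ranger_audits collection; every shard should
# have a leader and its replicas should report state "active".
curl "http://infra-solr-host:8886/solr/admin/collections?action=CLUSTERSTATUS&collection=ranger_audits"
```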
Labels:
- Apache Ranger
- Apache Solr
08-24-2018
01:22 PM
@pjoseph @Nanda Kumar please share your views.
08-24-2018
09:49 AM
Problem Statement: A few NodeManagers in the cluster are shutting down / crashing with the error below:

2018-08-24 09:37:31,583 INFO nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(537)) - Deleting absolute path : /data07/hadoop/yarn/local/usercache/XXX/appcache/application_1533656250055_31336
2018-08-24 09:37:31,583 INFO nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(537)) - Deleting absolute path : /data08/hadoop/yarn/local/usercache/XXX/appcache/application_1533656250055_31336
2018-08-24 09:37:31,583 INFO nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(537)) - Deleting absolute path : /data10/hadoop/yarn/local/usercache/XXX/appcache/application_1533656250055_31336
2018-08-24 09:37:31,583 INFO nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(537)) - Deleting absolute path : /data09/hadoop/yarn/local/usercache/XXX/appcache/application_1533656250055_31336
2018-08-24 09:37:33,138 FATAL yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread Thread[Container Monitor,5,main] threw an Error. Shutting down now
...
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.io.BufferedReader.<init>(BufferedReader.java:105)
at java.io.BufferedReader.<init>(BufferedReader.java:116)
at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:554)
at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.updateProcessTree(ProcfsBasedProcessTree.java:225)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:445)
2018-08-24 09:37:33,145 INFO nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(542)) - Deleting path : /data01/hadoop/yarn/log/application_1533656250055_31336/container_e92_1533656250055_31336_01_000001/directory.info

Ambari Version: 2.4.2.0
HDP Version: 2.5.3.0

Analysis: From the Ambari YARN configs I see that the NodeManager heap is set to 1 GB. I see a few links which say that increasing the heap to 2 GB resolves the issue, e.g. http://www-01.ibm.com/support/docview.wss?uid=swg22002422

Suggestions/help expected:
1. Can you guide me on how to debug this GC error further for RCA? Do you think that by enabling GC logging (a sketch is below) and using the "jconsole" tool we can debug why and where the jobs are using more heap/memory?
2. How can we confirm that a 1 GB heap is not the correct size for the cluster before I proceed with increasing it to 2 GB?
3. Also, how can I make sure that after increasing to 2 GB I am not going to hit the GC issue again? Is there any forecasting I can do here to prevent the issue from happening in the future?

Please do let me know if you need any more details.
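A minimal sketch of enabling GC logging for the NodeManager JVM, assuming a Java 8 runtime and that the flags are appended to YARN_NODEMANAGER_OPTS via the yarn-env template in Ambari (the log path is a placeholder):

```sh
# Java 8 GC-logging flags for the NodeManager process. The resulting log shows
# which collections ran and how full the heap was before each one; a live JVM
# can additionally be watched with jconsole over JMX.
export YARN_NODEMANAGER_OPTS="$YARN_NODEMANAGER_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -Xloggc:/var/log/hadoop-yarn/nodemanager-gc.log"
```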
Labels:
- Apache YARN
08-23-2018
12:56 PM
Nice and very useful article, @Rajkumar Singh.
03-01-2018
08:28 PM
1 Kudo
@Veerendra Nath Jasthi From the above error it seems the issue is with the "tag" value: tag 'version1519845495539' already exists for 'capacity-scheduler'. Just update the "tag" value "version1519845495539" in the curl command to some other unused number, e.g. version151984546666 (a sketch of generating a unique one is below). Please retry and let me know if there is still any issue.
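A minimal sketch, assuming you prefer to derive the tag from the current time rather than picking a random number (the variable name is illustrative):

```sh
# Build a tag from the current epoch time in milliseconds so it cannot
# collide with any existing capacity-scheduler tag, then use it as the
# "tag" value in the curl payload.
tag="version$(date +%s%3N)"
echo "$tag"
```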
03-01-2018
06:25 AM
@Tim Veil Please refer to the link below for a working command to add a new YARN queue using a script / in an automated way: https://community.hortonworks.com/questions/155903/how-to-add-new-yarn-queue-using-rest-api-ambari-co.html?childToView=174665#answer-174665
03-01-2018
06:22 AM
@pjoseph I was able to achieve this using the Ambari API by updating the service configs. Below is the working command; I have added the queue named "MaxiqQueue" (a short usage note follows the command).

curl -u $ambari_user:$ambari_password -H 'X-Requested-By:admin' -X PUT "http://$ambari_server_host:8080/api/v1/clusters/$CLUSTER_NAME" -d '{
"Clusters": {
"desired_config": {
"type": "capacity-scheduler",
"tag": "version'$date'",
"properties": {
"yarn.scheduler.capacity.maximum-am-resource-percent" : "0.2",
"yarn.scheduler.capacity.maximum-applications" : "10000",
"yarn.scheduler.capacity.node-locality-delay" : "40",
"yarn.scheduler.capacity.queue-mappings-override.enable" : "false",
"yarn.scheduler.capacity.resource-calculator" : "org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator",
"yarn.scheduler.capacity.root.MaxiqQueue.acl_administer_queue" : "*",
"yarn.scheduler.capacity.root.MaxiqQueue.acl_submit_applications" : "*",
"yarn.scheduler.capacity.root.MaxiqQueue.capacity" : "90",
"yarn.scheduler.capacity.root.MaxiqQueue.maximum-capacity" : "90",
"yarn.scheduler.capacity.root.MaxiqQueue.minimum-user-limit-percent" : "100",
"yarn.scheduler.capacity.root.MaxiqQueue.ordering-policy" : "fifo",
"yarn.scheduler.capacity.root.MaxiqQueue.state" : "RUNNING",
"yarn.scheduler.capacity.root.MaxiqQueue.user-limit-factor" : "1",
"yarn.scheduler.capacity.root.accessible-node-labels" : "*",
"yarn.scheduler.capacity.root.acl_administer_queue" : "yarn",
"yarn.scheduler.capacity.root.capacity" : "100",
"yarn.scheduler.capacity.root.default.acl_administer_queue" : "yarn",
"yarn.scheduler.capacity.root.default.acl_submit_applications" : "yarn",
"yarn.scheduler.capacity.root.default.capacity" : "10",
"yarn.scheduler.capacity.root.default.maximum-capacity" : "100",
"yarn.scheduler.capacity.root.default.state" : "RUNNING",
"yarn.scheduler.capacity.root.default.user-limit-factor" : "1",
"yarn.scheduler.capacity.root.queues" : "MaxiqQueue,default"
}
}
}
}'
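A brief usage note, sketched under the assumption that the shell variables above are set beforehand and that $date is derived from the epoch time so each run produces a unique config tag (the credential, host, and cluster values are placeholders):

```sh
# Variables the command above expects.
ambari_user=admin
ambari_password=admin
ambari_server_host=ambari.example.com
CLUSTER_NAME=mycluster
date=$(date +%s%3N)

# After running the PUT, confirm that the new capacity-scheduler tag is now
# among the cluster's desired configs.
curl -u "$ambari_user:$ambari_password" \
  "http://$ambari_server_host:8080/api/v1/clusters/$CLUSTER_NAME?fields=Clusters/desired_configs"
```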