Member since: 02-18-2016
Posts: 141
Kudos Received: 19
Solutions: 18
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5124 | 12-18-2019 07:44 PM
 | 5154 | 12-15-2019 07:40 PM
 | 1817 | 12-03-2019 06:29 AM
 | 1836 | 12-02-2019 06:47 AM
 | 5849 | 11-28-2019 02:06 AM
11-12-2019
01:04 AM
Hi Mike, can you do a quick check on the points below?

**BP-484874736-172.2.45.23-8478399929292:blk_1081495827_7755233 does not exist or is not under Construction
>> 1. Are all DataNodes up and running fine within the cluster?
2. Check the NameNode UI and see whether any DataNode is NOT reporting blocks in the DataNodes tab, or whether any missing blocks are reported on the NameNode UI.
3. You can run fsck [unless the cluster is huge and loaded with data] to check whether the block exists and which nodes hold its replicas (a sketch of the command is below). It might help to drill down into the issue.
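A minimal sketch of the fsck step, assuming the HDFS path holding the affected file is /data/mydir (the path is a placeholder, not from the original thread):

```sh
# List files, their blocks, and the DataNodes holding each replica under the
# given path, then grep for the block id from the error message.
hdfs fsck /data/mydir -files -blocks -locations | grep blk_1081495827
```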
11-11-2019
11:56 PM
Hi Vinay, do you see any errors in the logs while running the "reassign partition tool"? That might help to debug the issue. Were all the brokers healthy and the ISRs good before you ran the tool?

***When I ran this tool, it was stuck with one partition and it hung there for more than a day. The Cluster performance was severely impacted, and we had to restart the entire cluster.
>> If there is a lot of data / there are many topics, I can suggest reassigning a subset of the topics at a time to avoid load on the cluster. You can provide a list of topics that should be moved to the new set of brokers and a target list of new brokers.

***I don't see a way even to stop the tool when it's taking a long time.
>> You can abort the reassignment by deleting the "/admin/reassign_partitions" znode on your ZooKeeper cluster using the zookeeper shell (a sketch is below), and then move the partitions that were assigned to the dead broker to new nodes. Thanks, Sagar S
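A minimal sketch of the abort step, assuming Kafka is installed under /usr/hdp/current/kafka-broker and the ZooKeeper ensemble is reachable at zk1:2181 (both are placeholders):

```sh
# Open the ZooKeeper shell that ships with Kafka and remove the in-flight
# reassignment znode; the stuck reassignment is then abandoned.
/usr/hdp/current/kafka-broker/bin/zookeeper-shell.sh zk1:2181
delete /admin/reassign_partitions
```

After that, the partitions that were on the dead broker can be moved again with a fresh, smaller reassignment plan.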
09-26-2018
04:21 PM
I tried the process below and it worked:
1. Stop AMS.
2. Move the contents of the AMS "tmp.dir" to a backup location.
3. Move the contents of the AMS "root.dir" to a backup location.
4. Remove the ams znode from ZooKeeper (a sketch is below).
5. Start AMS.
AMS is working fine now.
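A minimal sketch of step 4, assuming AMS HBase registers under /ams-hbase-unsecure and the ZooKeeper quorum is zk1:2181 (both are assumptions; the actual parent znode is whatever zookeeper.znode.parent is set to in ams-hbase-site):

```sh
# Connect with the ZooKeeper CLI and recursively remove the AMS znode so it
# is rebuilt cleanly when AMS starts again.
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server zk1:2181
rmr /ams-hbase-unsecure
```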
09-04-2018
03:24 PM
Problem Statement : We recently upgraded our AMbari and HDP to latest version. As pre-requisites while ambari upgrade we missed to upgrade ambari-infra rpm/package. We did HDP upgrade and realized as ambari-infra was not upgraded. So we upgraded ambari-infra package on respective node. When checking in Ranger UI, I am not able to see ranger audits and its giving error - 2018-09-03 12:47:06,891 [http-bio-6080-exec-18] ERROR org.apache.ranger.solr.SolrUtil (SolrUtil.java:161) - Error running solr query. Query = q=*:*&fq=evtTime:[2018-09-02T16:00:00Z+TO+NOW]&sort=evtT
ime+desc&start=0&rows=25&_stateVer_=ranger_audits:542, response = null
2018-09-03 12:47:06,892 [http-bio-6080-exec-18] INFO org.apache.ranger.common.RESTErrorUtil (RESTErrorUtil.java:63) - Request failed. loginId=admin, logMessage=Error running solr query, please chec
k solr configs. Could not find a healthy node to handle the request.
javax.ws.rs.WebApplicationException Can you help to resolve this issue. Attached xa_portal.logxa-portal.txt
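The "Could not find a healthy node" message points at the ranger_audits collection in the Infra Solr instance. A minimal sketch of a health check, assuming Infra Solr listens on its default port 8886 on infra-solr-host (the host name is a placeholder):

```sh
# Ask Solr for the state of the ranger_audits collection; every shard should
# have a leader and its replicas should report state "active".
curl "http://infra-solr-host:8886/solr/admin/collections?action=CLUSTERSTATUS&collection=ranger_audits"
```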
Labels:
- Apache Ranger
- Apache Solr
08-24-2018
01:22 PM
@pjoseph @Nanda Kumar please share your views.
08-24-2018
09:49 AM
Problem Statement: A few NodeManagers in the cluster are shutting down / crashing with the error below:

2018-08-24 09:37:31,583 INFO nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(537)) - Deleting absolute path : /data07/hadoop/yarn/local/usercache/XXX/appcache/application_1533656250055_31336
2018-08-24 09:37:31,583 INFO nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(537)) - Deleting absolute path : /data08/hadoop/yarn/local/usercache/XXX/appcache/application_1533656250055_31336
2018-08-24 09:37:31,583 INFO nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(537)) - Deleting absolute path : /data10/hadoop/yarn/local/usercache/XXX/appcache/application_1533656250055_31336
2018-08-24 09:37:31,583 INFO nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(537)) - Deleting absolute path : /data09/hadoop/yarn/local/usercache/XXX/appcache/application_1533656250055_31336
2018-08-24 09:37:33,138 FATAL yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread Thread[Container Monitor,5,main] threw an Error. Shutting down now
...
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.io.BufferedReader.<init>(BufferedReader.java:105)
at java.io.BufferedReader.<init>(BufferedReader.java:116)
at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:554)
at org.apache.hadoop.yarn.util.ProcfsBasedProcessTree.updateProcessTree(ProcfsBasedProcessTree.java:225)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:445)
2018-08-24 09:37:33,145 INFO nodemanager.LinuxContainerExecutor (LinuxContainerExecutor.java:deleteAsUser(542)) - Deleting path : /data01/hadoop/yarn/log/application_1533656250055_31336/container_e92_1533656250055_31336_01_000001/directory.info

Ambari Version: 2.4.2.0
HDP Version: 2.5.3.0

Analysis: From the Ambari YARN configs I see that the NodeManager heap is set to 1 GB. I see a few links which say that increasing the heap to 2 GB resolves the issue, e.g. http://www-01.ibm.com/support/docview.wss?uid=swg22002422

Suggestions/help expected:
1. Can you guide me on how to debug this GC error further for RCA? Do you think that by enabling GC logging (a sketch is below) and using the "jconsole" tool we can debug why and where the jobs are using more heap/memory?
2. How can we confirm that a 1 GB heap is not the correct size for the cluster before I proceed with increasing it to 2 GB?
3. Also, how can I make sure that after increasing to 2 GB I am not going to hit the GC issue again? Is there any forecasting I can do here to prevent the issue from happening in the future?

Please do let me know if you need any more details.
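A minimal sketch of enabling GC logging for the NodeManager JVM, assuming a Java 8 runtime and that the flags are appended to YARN_NODEMANAGER_OPTS via the yarn-env template in Ambari (the log path is a placeholder):

```sh
# Java 8 GC-logging flags for the NodeManager process. The resulting log shows
# which collections ran and how full the heap was before each one; a live JVM
# can additionally be watched with jconsole over JMX.
export YARN_NODEMANAGER_OPTS="$YARN_NODEMANAGER_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -Xloggc:/var/log/hadoop-yarn/nodemanager-gc.log"
```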
Labels:
- Apache YARN
08-23-2018
12:56 PM
Nice and very useful article, @Rajkumar Singh.
03-01-2018
08:28 PM
1 Kudo
@Veerendra Nath Jasthi From the above error it seems the issue is with the "tag" value: tag 'version1519845495539' already exists for 'capacity-scheduler'. Just update the "tag" value "version1519845495539" in the curl command to some other unused number, e.g. version151984546666 (a sketch of generating a unique one is below). Please retry and let me know if there is still any issue.
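A minimal sketch, assuming you prefer to derive the tag from the current time rather than picking a random number (the variable name is illustrative):

```sh
# Build a tag from the current epoch time in milliseconds so it cannot
# collide with any existing capacity-scheduler tag, then use it as the
# "tag" value in the curl payload.
tag="version$(date +%s%3N)"
echo "$tag"
```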
03-01-2018
06:25 AM
@Tim Veil Please refer to the link below for a working command to add a new YARN queue using a script / in an automated way: https://community.hortonworks.com/questions/155903/how-to-add-new-yarn-queue-using-rest-api-ambari-co.html?childToView=174665#answer-174665
03-01-2018
06:22 AM
@pjoseph I was able to achieve this using the Ambari API by updating the service configs. Below is the working command; I have added the queue named "MaxiqQueue" (a short usage note follows the command).

curl -u $ambari_user:$ambari_password -H 'X-Requested-By:admin' -X PUT "http://$ambari_server_host:8080/api/v1/clusters/$CLUSTER_NAME" -d '{
"Clusters": {
"desired_config": {
"type": "capacity-scheduler",
"tag": "version'$date'",
"properties": {
"yarn.scheduler.capacity.maximum-am-resource-percent" : "0.2",
"yarn.scheduler.capacity.maximum-applications" : "10000",
"yarn.scheduler.capacity.node-locality-delay" : "40",
"yarn.scheduler.capacity.queue-mappings-override.enable" : "false",
"yarn.scheduler.capacity.resource-calculator" : "org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator",
"yarn.scheduler.capacity.root.MaxiqQueue.acl_administer_queue" : "*",
"yarn.scheduler.capacity.root.MaxiqQueue.acl_submit_applications" : "*",
"yarn.scheduler.capacity.root.MaxiqQueue.capacity" : "90",
"yarn.scheduler.capacity.root.MaxiqQueue.maximum-capacity" : "90",
"yarn.scheduler.capacity.root.MaxiqQueue.minimum-user-limit-percent" : "100",
"yarn.scheduler.capacity.root.MaxiqQueue.ordering-policy" : "fifo",
"yarn.scheduler.capacity.root.MaxiqQueue.state" : "RUNNING",
"yarn.scheduler.capacity.root.MaxiqQueue.user-limit-factor" : "1",
"yarn.scheduler.capacity.root.accessible-node-labels" : "*",
"yarn.scheduler.capacity.root.acl_administer_queue" : "yarn",
"yarn.scheduler.capacity.root.capacity" : "100",
"yarn.scheduler.capacity.root.default.acl_administer_queue" : "yarn",
"yarn.scheduler.capacity.root.default.acl_submit_applications" : "yarn",
"yarn.scheduler.capacity.root.default.capacity" : "10",
"yarn.scheduler.capacity.root.default.maximum-capacity" : "100",
"yarn.scheduler.capacity.root.default.state" : "RUNNING",
"yarn.scheduler.capacity.root.default.user-limit-factor" : "1",
"yarn.scheduler.capacity.root.queues" : "MaxiqQueue,default"
}
}
}
}'
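A brief usage note, sketched under the assumption that the shell variables above are set beforehand and that $date is derived from the epoch time so each run produces a unique config tag (the credential, host, and cluster values are placeholders):

```sh
# Variables the command above expects.
ambari_user=admin
ambari_password=admin
ambari_server_host=ambari.example.com
CLUSTER_NAME=mycluster
date=$(date +%s%3N)

# After running the PUT, confirm that the new capacity-scheduler tag is now
# among the cluster's desired configs.
curl -u "$ambari_user:$ambari_password" \
  "http://$ambari_server_host:8080/api/v1/clusters/$CLUSTER_NAME?fields=Clusters/desired_configs"
```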