Member since: 12-29-2020
Posts: 10
Kudos Received: 0
Solutions: 0
12-13-2022
05:26 AM
I'm using a CDP PvC Base cluster on AWS. I have distributed a properties file (abc.properties) to /opt/ on all the nodes of the cluster, and I have added the path to HBASE_CLASSPATH via the safety valve. I have also created a jar for HBase operations. When I try to read the properties file with the following code, the HMaster does not become healthy, and the input stream is null in the stderr file: inputStream = CLASS.class.getClassLoader().getResourceAsStream("abc.properties"); Please let me know if anything else is required to make the file available on the classpath.
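For reference, the reading code is roughly the following sketch (the class name is a placeholder, not the real jar code). One thing worth noting is that getResourceAsStream() resolves names against classpath entries, so the directory /opt/ itself, rather than the full file path, would need to be on HBASE_CLASSPATH:

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Minimal sketch; class name is a placeholder, not the real code.
public class AbcConfig {
    public static Properties load() throws IOException {
        Properties props = new Properties();
        // Resolves "abc.properties" against classpath roots, so /opt/ (the
        // directory) must be a classpath entry for this lookup to succeed.
        try (InputStream in = AbcConfig.class.getClassLoader()
                .getResourceAsStream("abc.properties")) {
            if (in == null) {
                throw new IOException("abc.properties not found on classpath");
            }
            props.load(in);
        }
        return props;
    }
}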
08-11-2022
05:03 AM
I have been trying to create a custom Ambari alert. I created a dedicated alert.json and its corresponding Python script, and I have successfully registered the alert, but the Python script is failing. Is there a specific format required for the script? I have defined an execute function, but I still get an error that says "'module' object has no attribute 'execute'". The following is the Python script:

from subprocess import check_output

def execute(configurations={}, parameters={}, host_name=None):
    try:
        # pidof prints the PIDs of the named process and exits non-zero
        # (raising CalledProcessError) when no such process is running.
        pid = check_output(["pidof", "servicename"])
        if pid.strip():  # note: the original "if pid > 0" was missing its colon, a syntax error
            return "OK"
    except Exception:
        return "CRITICAL"

if __name__ == '__main__':
    execute()

I have attached a screenshot of the alert. Please suggest if you think there is something wrong here; I have tried making changes to the Python script, but I keep getting the same error.
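As a sanity check, the script can be loaded by hand roughly the way I understand the Ambari agent loads it (a sketch; the cache path below is a placeholder, and the agent runs Python 2):

import imp

# Load the alert script as a standalone module, similar to what the agent does.
# The path is a placeholder for wherever the agent cached the script.
alert = imp.load_source("alert_check", "/var/lib/ambari-agent/cache/alerts/alert_check.py")
print alert.execute()

If the script fails to compile (for example, because of the missing colon noted above), the module load fails and the 'execute' attribute never becomes available, which would match the error I am seeing.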
02-27-2022
08:55 PM
Do you know how I can access hs_err_pid1.log? I tried looking for it on the compute nodes and on some service nodes, but I did not find it in the /tmp directory. If the files are present inside containers, how do I locate which containers to access?
02-23-2022
08:47 PM
I have a default Database Catalog in Cloudera Data Warehouse (CDW) for an environment, and I have created a Virtual Warehouse with that Database Catalog. I am trying to use a newly created UDF with this Virtual Warehouse, but whenever I execute the UDF I get an error.
When I look at the Virtual Warehouse, an alert symbol appears; hovering over it shows the message "Hive server service is not ready. Service endpoint may not be reachable!" I collected the diagnostic bundle and searched for the issue. In the HiveServer log I found a Java core dump:

A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000000000093c9e, pid=1, tid=0x00007f7bf5bc2700
#
# JRE version: OpenJDK Runtime Environment (8.0_312-b07) (build 1.8.0_312-b07)
# Java VM: OpenJDK 64-Bit Server VM (25.312-b07 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C 0x0000000000093c9e
#
# Core dump written. Default location: /usr/lib/core or core.1
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid1.log
So my question is: what could be causing the core dump here, or could there be a different reason why HiveServer is not starting? And is there a way to retrieve the error report at /tmp/hs_err_pid1.log so I can analyze it?
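For context, the UDF is roughly of the following shape (a hypothetical example using the classic Hive UDF API; the name and logic are placeholders, not my actual code):

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Placeholder UDF that upper-cases a string; the real UDF's logic differs.
public class MyUpperUdf extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().toUpperCase());
    }
}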
09-01-2021
09:56 PM
I have installed CDSW on my cluster and created a project, but when I launch a session I see an error. I'm adding the details of 'cdsw status' here:

[root@edge ~]# cdsw status
Sending detailed logs to [/tmp/cdsw_status_mrI87T.log] ...
CDSW Version: [1.9.1.10118148:81531c5]
Installed into namespace 'default'
OK: Application running as root check
OK: NFS service check
OK: System process check for CSD install
OK: Sysctl params check
OK: Kernel memory slabs check

| NAME | STATUS | CREATED-AT | VERSION | EXTERNAL-IP | OS-IMAGE | KERNEL-VERSION | GPU | STATEFUL |
| edge.localdomain.com | True | 2021-09-01 13:42:11+00:00 | v1.16.15 | None | Red Hat Enterprise Linux Server 7.9 (Maipo) | 3.10.0-1160.11.1.el7.x86_64 | 0 | True |

1/1 nodes are ready.

| NAME | READY | STATUS | RESTARTS | CREATED-AT | POD-IP | HOST-IP | ROLE |
| etcd-edge.localdomain.com | 1/1 | Running | 0 | 2021-09-01 13:43:07+00:00 | 2.10.1.74 | 2.10.1.74 | None |
| kube-apiserver-edge.localdomain.com | 1/1 | Running | 0 | 2021-09-01 13:43:05+00:00 | 2.10.1.74 | 2.10.1.74 | None |
| kube-controller-manager-edge.localdomain.com | 1/1 | Running | 1 | 2021-09-01 13:43:14+00:00 | 2.10.1.74 | 2.10.1.74 | None |
| kube-dns-78c484fc8c-zqdkh | 3/3 | Running | 0 | 2021-09-01 13:42:32+00:00 | 100.66.0.1 | 2.10.1.74 | None |
| kube-proxy-jggmt | 1/1 | Running | 0 | 2021-09-01 13:43:38+00:00 | 2.10.1.74 | 2.10.1.74 | None |
| kube-scheduler-edge.localdomain.com | 1/1 | Running | 2 | 2021-09-01 13:42:14+00:00 | 2.10.1.74 | 2.10.1.74 | None |
| weave-net-9v6h7 | 2/2 | Running | 0 | 2021-09-01 13:43:38+00:00 | 2.10.1.74 | 2.10.1.74 | None |

All required pods are ready in cluster kube-system.

| NAME | READY | STATUS | RESTARTS | CREATED-AT | POD-IP | HOST-IP | ROLE |
| archiver-f965d8cd7-8f4ld | 1/1 | Running | 0 | 2021-09-01 13:42:31+00:00 | 100.66.0.34 | 2.10.1.74 | archiver |
| cdsw-compute-pod-evaluator-849b98f9fd-nlwtc | 1/1 | Running | 0 | 2021-09-01 13:42:33+00:00 | 100.66.0.26 | 2.10.1.74 | None |
| cron-6c69457cc8-rx4xn | 1/1 | Running | 0 | 2021-09-01 13:42:30+00:00 | 100.66.0.10 | 2.10.1.74 | cron |
| db-69fcf7b6fc-dhmnm | 1/1 | Running | 0 | 2021-09-01 13:42:30+00:00 | 100.66.0.16 | 2.10.1.74 | db |
| db-migrate-81531c5-2rnrb | 0/1 | Succeeded | 0 | 2021-09-01 13:42:30+00:00 | 100.66.0.9 | 2.10.1.74 | db-migrate |
| ds-cdh-client-646754d4fc-wgcr7 | 1/1 | Running | 0 | 2021-09-01 13:42:31+00:00 | 100.66.0.27 | 2.10.1.74 | ds-cdh-client |
| ds-operator-7d66bcc8b6-db5cz | 1/1 | Running | 0 | 2021-09-01 13:42:33+00:00 | 100.66.0.9 | 2.10.1.74 | ds-operator |
| ds-reconciler-5b94488d96-9nslq | 1/1 | Running | 0 | 2021-09-01 13:42:34+00:00 | 100.66.0.30 | 2.10.1.74 | ds-reconciler |
| ds-vfs-f5868f5f4-2t4n8 | 1/1 | Running | 0 | 2021-09-01 13:42:30+00:00 | 100.66.0.23 | 2.10.1.74 | ds-vfs |
| feature-flags-6bf8d694d9-tzpvs | 1/1 | Running | 0 | 2021-09-01 13:42:32+00:00 | 100.66.0.8 | 2.10.1.74 | feature-flags |
| grafana-cml-dashboards-81531c5-74v5v | 0/1 | Succeeded | 0 | 2021-09-01 13:42:30+00:00 | 100.66.0.3 | 2.10.1.74 | None |
| grafana-core-855774dbb5-np6nw | 1/1 | Running | 0 | 2021-09-01 13:42:33+00:00 | 100.66.0.17 | 2.10.1.74 | None |
| image-puller-6hznx | 1/1 | Running | 1 | 2021-09-01 13:42:31+00:00 | 100.66.0.25 | 2.10.1.74 | image-puller |
| ingress-controller-7c68cf9557-znvc8 | 1/1 | Running | 0 | 2021-09-01 13:42:31+00:00 | 100.66.0.7 | 2.10.1.74 | ingress-controller |
| kube-state-metrics-df5f5677f-mv5zh | 1/1 | Running | 0 | 2021-09-01 13:42:31+00:00 | 100.66.0.4 | 2.10.1.74 | None |
| livelog-5bc9f6c9c-qjpfq | 1/1 | Running | 0 | 2021-09-01 13:42:30+00:00 | 100.66.0.15 | 2.10.1.74 | livelog |
| livelog-cleaner-1630540800-t57fk | 0/1 | Succeeded | 0 | 2021-09-02 04:18:53+00:00 | 100.66.0.36 | 2.10.1.74 | None |
| livelog-publisher-5vhd4 | 1/1 | Running | 2 | 2021-09-01 13:42:31+00:00 | 100.66.0.5 | 2.10.1.74 | None |
| model-proxy-5d7d546fff-wpn8k | 1/1 | Running | 0 | 2021-09-01 13:42:34+00:00 | 100.66.0.6 | 2.10.1.74 | model-proxy |
| prometheus-core-86c8fdfc5b-6kbth | 1/1 | Running | 0 | 2021-09-01 13:42:32+00:00 | 100.66.0.11 | 2.10.1.74 | None |
| prometheus-node-exporter-kcttt | 1/1 | Running | 0 | 2021-09-01 13:42:31+00:00 | 2.10.1.74 | 2.10.1.74 | None |
| runtime-repo-puller-7ff4d5d8fb-fdw4g | 1/1 | Running | 0 | 2021-09-01 13:42:34+00:00 | 100.66.0.19 | 2.10.1.74 | runtime-repo-puller |
| s2i-builder-5b45456b6c-c4dkh | 1/1 | Running | 0 | 2021-09-01 13:42:31+00:00 | 100.66.0.29 | 2.10.1.74 | s2i-builder |
| s2i-builder-5b45456b6c-tgpgb | 1/1 | Running | 0 | 2021-09-01 13:42:31+00:00 | 100.66.0.28 | 2.10.1.74 | s2i-builder |
| s2i-builder-5b45456b6c-z8jqm | 1/1 | Running | 0 | 2021-09-01 13:42:31+00:00 | 100.66.0.24 | 2.10.1.74 | s2i-builder |
| s2i-client-7fc996df48-nd5wp | 1/1 | Running | 0 | 2021-09-01 13:42:32+00:00 | 100.66.0.22 | 2.10.1.74 | s2i-client |
| s2i-git-server-545d845cbd-grv2x | 1/1 | Running | 0 | 2021-09-01 13:42:31+00:00 | 100.66.0.13 | 2.10.1.74 | s2i-git-server |
| s2i-queue-59f86fd95c-c9nb9 | 1/1 | Running | 0 | 2021-09-01 13:42:33+00:00 | 100.66.0.14 | 2.10.1.74 | s2i-queue |
| s2i-registry-7fcd8fb64f-8948z | 1/1 | Running | 0 | 2021-09-01 13:42:32+00:00 | 100.66.0.20 | 2.10.1.74 | s2i-registry |
| s2i-registry-auth-5db74dc5c6-dssst | 1/1 | Running | 0 | 2021-09-01 13:42:30+00:00 | 100.66.0.18 | 2.10.1.74 | s2i-registry-auth |
| s2i-server-66f6cc5df7-sjz9x | 1/1 | Running | 0 | 2021-09-01 13:42:30+00:00 | 100.66.0.3 | 2.10.1.74 | s2i-server |
| secret-generator-687d4c8965-bd6qk | 1/1 | Running | 0 | 2021-09-01 13:42:30+00:00 | 100.66.0.12 | 2.10.1.74 | secret-generator |
| spark-port-forwarder-zmjrr | 1/1 | Running | 0 | 2021-09-01 13:42:31+00:00 | 2.10.1.74 | 2.10.1.74 | spark-port-forwarder |
| tcp-ingress-controller-75bfb55546-tgc27 | 1/1 | Running | 0 | 2021-09-01 13:42:34+00:00 | 100.66.0.33 | 2.10.1.74 | tcp-ingress-controller |
| usage-reporter-7567fff596-vtrp9 | 1/1 | Running | 0 | 2021-09-01 13:42:30+00:00 | 100.66.0.21 | 2.10.1.74 | usage-reporter |
| web-857b6596d6-sslrd | 1/1 | Running | 0 | 2021-09-01 13:42:30+00:00 | 100.66.0.31 | 2.10.1.74 | web |
| web-857b6596d6-t9dhc | 1/1 | Running | 0 | 2021-09-01 13:42:30+00:00 | 100.66.0.35 | 2.10.1.74 | web |
| web-857b6596d6-tjvdc | 1/1 | Running | 0 | 2021-09-01 13:42:30+00:00 | 100.66.0.32 | 2.10.1.74 | web |

All required pods are ready in cluster default.
All required Application services are configured.
All required secrets are available.
Persistent volumes are ready.
Persistent volume claims are ready.
Ingresses are ready.
Checking web at url: http://cdsw.localdomain.com
OK: HTTP port check
Cloudera Data Science Workbench is ready!

Is this an issue with the livelog-cleaner pod, or something else?
03-23-2021
09:23 PM
Edit: I've learned that decommissioning nodes might not be the preferred way. Otherwise, the problem statement remains the same: the job should not get scheduled on nodes where a particular service is down.
03-16-2021
01:10 AM
Hi Shelton, thanks for the reply. I'll try to reframe my question a bit. I want to decommission certain worker nodes based on a criterion without disturbing the service as a whole, and I should be able to recommission those nodes again later. FYI, the external health-check script I've used for YARN-based services (e.g. Hive) does not stop the Hive Metastore. The script is distributed across the nodes and executed by YARN periodically; when it fails on a node, that node is marked unhealthy and jobs are no longer scheduled there, and if the node becomes healthy again after some time, jobs can be scheduled there once more. I've added this example so that you can relate better to the use case in question.
03-15-2021
05:48 AM
I have a use case where, if a particular service is down on a node, no job should get scheduled there anymore. For Hive or Spark I can use the external health script mechanism in Hadoop: YARN periodically runs the script, and if the service is down the script marks the node unhealthy, so jobs are no longer scheduled there. But this relies on YARN, and Impala doesn't use YARN. I tried to find an alternative for Impala, but I couldn't find anything like a custom script. What could be the possible ways to tackle this scenario? Does Impala have a custom health-check script? Reference: https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/NodeManager.html
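For clarity, this is the YARN mechanism I'm referring to: a sketch of the NodeManager health-checker configuration in yarn-site.xml (the script path and interval here are example values):

<property>
  <name>yarn.nodemanager.health-checker.script.path</name>
  <value>/opt/health-check.sh</value> <!-- example path to the check script -->
</property>
<property>
  <name>yarn.nodemanager.health-checker.interval-ms</name>
  <value>60000</value> <!-- run the check every 60 seconds -->
</property>

The script itself can be anything, for example a pidof check for the service, as long as it prints a line starting with ERROR when the check fails; that output is what makes YARN mark the node unhealthy. I'm looking for an equivalent hook for Impala.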
Labels: Apache Impala