Member since
07-30-2020
219
Posts
45
Kudos Received
60
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 435 | 11-20-2024 11:11 PM
 | 488 | 09-26-2024 05:30 AM
 | 1084 | 10-26-2023 08:08 AM
 | 1852 | 09-13-2023 06:56 AM
 | 2129 | 08-25-2023 06:04 AM
12-29-2022
12:04 AM
1 Kudo
@mabilgen You have 143 million blocks in the cluster and the NN heap is 95GB; this is why the NN is not holding up. Going by roughly 1GB of heap per million blocks, 143 million blocks needs at least 150GB of heap to work smoothly, so with the current heap you would need to bring the total block count down to about 90 million for the NN to start working properly.
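If it helps to double-check, the live block count is exposed on the NameNode JMX endpoint; a minimal sketch, assuming the default HTTP port 9870 and a placeholder hostname:

```bash
# Query the active NameNode's JMX for the current block and file counts.
# <active-nn-host> and port 9870 are placeholders; older Hadoop 2 builds use 50070.
curl -s "http://<active-nn-host>:9870/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem" \
  | grep -E '"BlocksTotal"|"FilesTotal"'

# Rough sizing: ~1 GB of NN heap per 1 million blocks,
# so 143 million blocks needs roughly 143-150 GB of heap.
```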
12-28-2022
10:04 AM
1 Kudo
Hi @mabilgen, The main problem on this cluster is lack of RAM on the host, which is limited to 128GB. The Namenode on startup consumes its allocated heap of 98GB, which leaves about 30GB of memory for every other process running on this host. When those other processes use up the remaining 30GB, you see huge JVM pauses: the garbage collector tries to de-reference objects to free up memory, but this takes so long that the Namenode loses the Journalnode quorum and fails over.

As a rule of thumb, allocate 1GB of heap per 1 million blocks. So if there are more than 98 million blocks on this cluster, the current NN heap is not sufficient.

1) Lower the total block count on the cluster by deleting any unwanted files or old snapshots (see the sketch below).
2) If feasible, add more physical RAM to the host.

No amount of tuning will help in this situation, as the JVM pauses are too big to be managed by tuning. You would need to either clean up HDFS, add more RAM to the NN hosts, or move the Namenode to another node that has more RAM.
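For option 1, a rough sketch of the cleanup commands (all paths and snapshot names below are placeholders for illustration):

```bash
# List directories that have snapshots enabled, then the snapshots under one of them.
hdfs lsSnapshottableDir
hdfs dfs -ls /data/warehouse/.snapshot            # placeholder path

# Delete an old snapshot that is no longer needed (placeholder snapshot name).
hdfs dfs -deleteSnapshot /data/warehouse snap-2021-01-01

# Find the biggest space (and block) consumers to target unwanted files.
hdfs dfs -du -h / | sort -rh | head -20
```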
12-28-2022
06:17 AM
Hi @Sadique1 There is no specific option to decommission a single disk from a Datanode. The best option would be to decommission the Datanode, remove the disk, and then recommission it. https://community.cloudera.com/t5/Community-Articles/Decommission-and-Reconfigure-Data-Node-Disks/ta-p/248262 The downside of decommission and recommission is that all the blocks on the node need to be copied to other hosts, not just the blocks on the disk you are removing.

OR

Provided you have no files with replication factor 1 (fsck will tell you if any such files exist; see the sketch at the end of this post), you could remove the disk from one DN, restart it, and wait for the NN to recover the under-replicated blocks, then repeat for each node. Note: if there are files with replication factor 1 on those Datanode disks, you will end up with missing blocks.

Was your question answered? Please take some time to click on “Accept as Solution” below this post. If you find a reply useful, say thanks by clicking on the thumbs up button.
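In case it helps, a quick sketch for finding files with replication factor 1 before pulling any disk (the root path / is just an example; scope it to the directories you care about):

```bash
# The second column of 'hdfs dfs -ls' is the replication factor for files,
# so print any file that only has a single replica.
hdfs dfs -ls -R / | awk '$1 ~ /^-/ && $2 == 1 {print}'

# fsck also reports per-block replication; depending on the Hadoop version
# the token is "repl=" or "Live_repl=", so match both. No output means none at repl=1.
hdfs fsck / -files -blocks | grep -E '(Live_)?repl=1( |$)'
```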
12-27-2022
11:19 AM
Hi @younes You will need to use the CM API to get this done. Check if the below helps: https://community.cloudera.com/t5/Support-Questions/hadoop-amp-amp-eco-system-cmdline-start-stop/m-p/359732#M238177
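As a sketch of what a cluster-level start/stop looks like over the CM API (host, credentials, cluster name and the API version prefix below are placeholders; adjust them to your CM release):

```bash
# Stop the whole cluster through Cloudera Manager.
curl -u admin:admin -X POST \
  "http://<cm-host>:7180/api/v41/clusters/<clusterName>/commands/stop"

# Start it back up.
curl -u admin:admin -X POST \
  "http://<cm-host>:7180/api/v41/clusters/<clusterName>/commands/start"
```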
12-16-2022
11:58 AM
Hi @ditto You can use the CM API to start the services. To stop and start a CDH service, you could make use of CM API calls such as the below:

Start the service - POST /clusters/{clusterName}/services/{serviceName}/commands/start
Stop the service - POST /clusters/{clusterName}/services/{serviceName}/commands/stop

https://cloudera.github.io/cm_api/apidocs/v6/path__clusters_-clusterName-_commands_stop.html
https://cloudera.github.io/cm_api/apidocs/v6/path__clusters_-clusterName-_commands_start.html
https://cloudera.github.io/cm_api/apidocs/v6/index.html
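For example, invoked with curl (host, credentials, cluster/service names and the API version prefix are placeholders for your environment):

```bash
# Stop a single service, e.g. the HDFS service instance.
curl -u admin:admin -X POST \
  "http://<cm-host>:7180/api/v41/clusters/<clusterName>/services/<serviceName>/commands/stop"

# Start it again.
curl -u admin:admin -X POST \
  "http://<cm-host>:7180/api/v41/clusters/<clusterName>/services/<serviceName>/commands/start"

# Each POST returns a command id; poll it to see when the action finishes.
curl -u admin:admin "http://<cm-host>:7180/api/v41/commands/<commandId>"
```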
12-07-2022
10:31 PM
1 Kudo
Hi @jgabrey-1216863216, This has been fixed in CDP 7.1.7 SP1 CHF20 (p1063). You can refer to the doc below: https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/runtime-release-notes/topics/chf-pvcb-sp1-overview.html#ariaid-title2
12-06-2022
12:49 AM
Hi @hanumanth, As you can see, the spark job is trying to reach the zookeeper on localhost:

22/12/05 22:30:12 WARN zookeeper.ZKUtil: hconnection-0x5e29988e0x0, quorum=localhost:2181

We expect a zookeeper quorum of 3 or more ZKs under quorum=. This indicates that the node on which the spark job is running doesn't have an hbase-site.xml to point the job at hbase.zookeeper.quorum. Make sure an HBase Gateway role is deployed on the node from which you run the spark job, and also try running the job with spark-submit passing the config explicitly, e.g. "--files /etc/spark/conf/yarn-conf/hbase-site.xml".
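A minimal sketch of shipping the HBase client config with the job (the script name is a placeholder; the hbase-site.xml path is whichever one the HBase Gateway deployed on that node):

```bash
# Ship hbase-site.xml with the job so the executors resolve hbase.zookeeper.quorum
# from the real ZK quorum instead of falling back to localhost:2181.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --files /etc/spark/conf/yarn-conf/hbase-site.xml \
  your_hbase_job.py          # placeholder application
```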
11-23-2022
10:50 PM
1 Kudo
Hi @bgkim, HBase won't work well with its data folders set to an EC policy. EC does not implement hflush, which HBase requires to avoid losing data. Additionally, data locality is lost with EC, which makes HBase slower. https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/cdppvc-data-migration-opdb/topics/cdppvc-data-migration-hbase-unsupported-interfaces-features.html -- Was your question answered? Please take some time to click on “Accept as Solution” below this post. If you find a reply useful, say thanks by clicking on the thumbs up button.
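In case it helps, a quick way to check whether an EC policy is set on the HBase data directory (assuming the default /hbase root dir; adjust the path to your setup):

```bash
# Show the erasure coding policy (if any) on the HBase root directory.
hdfs ec -getPolicy -path /hbase

# Unsetting the policy only affects newly written data; existing EC files
# would still need to be rewritten/copied to become replicated.
hdfs ec -unsetPolicy -path /hbase
```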
11-23-2022
08:25 AM
1 Kudo
Hi @bhushan04ec041, Can you try the below and check:
1) Stop the Oozie service.
2) Select the "Delete" option from the drop-down menu. Under "Remove Service Dependencies", select "Configure Service Dependency" and uncheck "oozie".
3) Save the changes.
4) Delete Oozie.
11-17-2022
01:19 AM
@mike_bronson7 -- Was your question answered? Please take some time to click on “Accept as Solution” below this post.