Member since: 03-22-2017
Posts: 54
Kudos Received: 12
Solutions: 11
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1589 | 10-21-2021 12:49 AM |
 | 927 | 04-01-2021 05:31 AM |
 | 952 | 03-30-2021 04:23 AM |
 | 2002 | 03-23-2021 04:30 AM |
 | 2682 | 03-05-2021 04:33 AM |
10-21-2021
12:49 AM
@DA-Ka You can use the HDFS Find tool "org.apache.solr.hadoop.HdfsFindTool" for that purpose. Refer to the link below, which describes a method for finding old files: - http://35.204.180.114/static/help/topics/search_hdfsfindtool.html However, note that the search-based HDFS Find tool has been removed in CDH 6 and is superseded by the native "hdfs dfs -find" command, documented here: https://hadoop.apache.org/docs/r3.1.2/hadoop-project-dist/hadoop-common/FileSystemShell.html#find
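As a minimal sketch on CDH 6 / Hadoop 3 (the /data path, the "*.log" pattern and the 2021-01-01 cut-off date are only examples): the native find command matches by name, so filtering by age is usually done by post-processing a recursive listing.
$ hdfs dfs -find /data -name "*.log" -print
$ hdfs dfs -ls -R /data | grep -v "^d" | awk '$6 < "2021-01-01" {print $8}'
The first command lists files matching a name pattern; the second prints the paths of files whose modification date (column 6 of the listing) is older than the chosen date.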
10-21-2021
12:38 AM
@PrernaU Can you provide more details about this: "The objective it to share the data between tow CDP clusters." Are you trying to copy data between two distinct clusters? Are you looking for a solution such as an HDFS replication task? If yes, please have a look at the Replication Manager tool in CDP for that purpose: - https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/replication-manager/topics/rm-dc-configuring-replication-of-hdfs-data.html
10-21-2021
12:24 AM
@PrernaU Just wanted to check whether you got a chance to review our blog post on ViewFS here - https://blog.cloudera.com/global-view-distributed-file-system-with-mount-points/ You may also refer to the community article describing the configuration steps - https://community.cloudera.com/t5/Community-Articles/Enabling-and-configuring-the-ViewHDFS-client-side-mounts-in/ta-p/306752 If you have reviewed the above pages and are still running into issues, let us know.
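For reference, a minimal client-side ViewFS mount-table sketch in core-site.xml. The mount-table name "globalcluster", the /data and /archive paths, and the nameservice URIs are illustrative placeholders, not values from your clusters:
<property>
  <name>fs.defaultFS</name>
  <value>viewfs://globalcluster</value>
</property>
<property>
  <name>fs.viewfs.mounttable.globalcluster.link./data</name>
  <value>hdfs://nameservice1/data</value>
</property>
<property>
  <name>fs.viewfs.mounttable.globalcluster.link./archive</name>
  <value>hdfs://nameservice2/archive</value>
</property>
Each fs.viewfs.mounttable.*.link.* entry maps a path under the unified viewfs:// namespace to a directory on one of the underlying HDFS nameservices.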
06-17-2021
11:45 AM
Hello @sipocootap2 The failover controller log snippet you shared indicates that the HealthMonitor thread on the Active NameNode could not fetch the state of the local NameNode (via the health-check RPC) within the "ha.health-monitor.rpc-timeout.ms" timeout period of 45 sec (45000 ms). Since there was no response from the local NN within the timeout period, the NN service entered the "SERVICE_NOT_RESPONDING" state.
NOTE: "The HealthMonitor is a thread which is responsible for monitoring the local NameNode. It operates in a simple loop, calling the monitorHealth RPC. The HealthMonitor maintains a view of the current state of the NameNode based on the responses to these RPCs. When it transitions between states, it sends a message via a callback interface to the ZKFC."
The condition you cited suggests the local NN (the Active NameNode here) went unresponsive, hung, or was too busy to respond. Hence the local failover controller (active NN ZKFC) triggered a NameNode failover after the monitorHealth RPC timed out, and asked the failover controller on the Standby NameNode host (standby NN ZKFC) to promote/transition the local standby NN to the Active state.
Answers to your queries:
Q) I have no idea why SocketTimeoutException was raised while doing doHealthChecks.
A) It looks like the Active NN was unresponsive or busy, so the RPC call timed out (surfacing as a socket timeout exception).
Q) "java.net.SocketTimeoutException: Call From NAMENODE/NAMENODE to NAMENODE:PORT failed on socket timeout exception: java.net.SocketTimeoutException: 45000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/NAMENODE:PORT2 remote=NAMENODE/NAMENODE:PORT]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout" - when I look for PORT2 on the namenode, that port doesn't seem to be used.
A) The PORT2 (local=/NAMENODE:PORT2) you see is an ephemeral (random) port used by the HealthMonitor RPC client to communicate with the local NN service port 8022 (remote=NAMENODE/NAMENODE:PORT). Since the health monitor thread is local to the NN (it runs on the same node as the NN), you see the NN hostname appearing as both the local and the remote endpoint.
Ref: https://community.cloudera.com/t5/Support-Questions/Namenode-failover-frequently/td-p/41122
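If the NameNode is merely busy (for example, long GC pauses or heavy RPC load) rather than truly dead, one common mitigation is to raise the health-monitor timeout. A minimal sketch, assuming the property is added to core-site.xml (for example via the CM safety valve); the 90000 ms value is only an example, not a recommendation for your cluster:
<property>
  <name>ha.health-monitor.rpc-timeout.ms</name>
  <value>90000</value>
</property>
This only widens the window before the ZKFC declares the NN unhealthy; the root cause of the unresponsive NameNode should still be investigated.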
04-01-2021
05:31 AM
Hello @Amn_468, As you explained, the /data mount point is used for YARN, Kudu and Impala in addition to the DN storage volumes. HDFS counts the disk usage of /data/dfs/dn as HDFS/DFS used, and all remaining disk usage as non-HDFS usage. If the "/data" mount point is used for the YARN local directory (/data/yarn/nm), the Kudu data/WAL directories (/data/kudu/*) or the Impala scratch directory (/data/impala/*), then the usage of those directories is counted as non-DFS usage. In general, the YARN local directory and the Impala scratch directory are emptied after a successful job run. If files remain from a previous job run that was killed/aborted, you need to remove them manually to recover the disk space. Kudu space, however, will remain utilised as long as the mount point is used by the Kudu service. You can calculate the disk usage of each service, and from that how much space you would recover if the YARN local directory and Impala scratch directory data were removed entirely. If you are running on an ext4 file system and are low on available space, also consider lowering the superuser block reservation from 5% to 1% (using "tune2fs -m 1") on the file system, which will give you some more free space on the mount point.
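A quick sketch of how to size this up (directory paths are taken from the example above; the device name /dev/sdb1 is only a placeholder for whatever device actually backs /data):
# Per-directory usage under the shared /data mount
du -sh /data/dfs/dn /data/yarn/nm /data/kudu /data/impala
# Lower the ext4 reserved-block percentage from 5% to 1% on the underlying device
tune2fs -m 1 /dev/sdb1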
03-30-2021
05:35 AM
1 Kudo
Hello @wert_1311 You can balance the disk usage across a DataNode's storage volumes using the intra-DataNode disk balancer feature, available in CDH 5.8.2 and later. You need to enable the feature by adding the "dfs.disk.balancer.enabled" configuration to HDFS via the HDFS safety valve snippet in Cloudera Manager, following the blog here - https://blog.cloudera.com/how-to-use-the-new-hdfs-intra-datanode-disk-balancer-in-apache-hadoop/ A typical disk-balancer task involves three phases (implemented via the "hdfs diskbalancer" command): plan, execute, and query. The overall workflow is: 1. Enable the intra-disk balancer config in HDFS 2. "Plan" the intra-disk balancer 3. Execute the created plan 4. Query the running/executed plan 5. Verify the balancer report (a command sketch follows below). For more info, refer to the Apache doc here - https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSDiskbalancer.html Thanks and Regards, Pabitra Das
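To illustrate the plan/execute/query phases, a minimal command sketch; the DataNode hostname dn01.example.com and the generated plan-file path are placeholders (the actual path is printed by the -plan step):
# Generate a balancing plan for one DataNode
hdfs diskbalancer -plan dn01.example.com
# Execute the plan file produced by the previous step
hdfs diskbalancer -execute /system/diskbalancer/2021-Mar-30/dn01.example.com.plan.json
# Check the status of the running or completed plan
hdfs diskbalancer -query dn01.example.com
# Print a volume-usage report to verify the result
hdfs diskbalancer -report -node dn01.example.com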
03-30-2021
04:23 AM
1 Kudo
Hello @Amn_468 Please note that you get the block count alert after hitting the warning/critical threshold value set in the HDFS configuration. It is a monitoring alert and does not impact HDFS operations as such. You may increase the monitoring threshold value in CM (CM > HDFS > Configuration > DataNode Block Count Thresholds). However, CM monitors the block counts on the DataNodes to ensure you are not writing many small files into HDFS; an increase in block counts on DNs is an early warning of small-file accumulation. The simplest way to check whether you are hitting a small-files issue is to check the average block size of HDFS files. Fsck shows the average block size. If it is too low a value (e.g. ~1 MB), you might be hitting the small-files problem and it would be worth looking into; otherwise, there is no need to review the number of blocks.
$ hdfs fsck /
...
 Total blocks (validated): 2899 (avg. block size 11475601 B) <<<<<
Similarly, you can get the average file size in HDFS with a one-liner such as:
$ hdfs dfs -ls -R / | grep -v "^d" | awk '{OFMT="%f"; sum+=$5} END {print "AVG File Size =", sum/NR/1024/1024 " MB"}'
The file size reported by Reports Manager under "HDFS Reports" in Cloudera Manager can differ, as that report is extracted from an FSImage that is more than an hour old (not the latest one). Hope this helps. If you have further questions, feel free to update the thread; otherwise please mark it as solved. Regards, Pabitra Das
03-23-2021
04:30 AM
1 Kudo
Hello @meenzoon It seems the Cloudera Manager service itself is not running. Could you please check the CM Server status (# service cloudera-scm-server status) on the host? If it is not running, please restart the CM service (cloudera-scm-server) and then check the role status. If it still reports unknown health for the management host, check the health alert and share the message here. In case of a CM Server startup failure, please check the CM Server log on the host; it should provide insight into the cause of the failure.
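A minimal sketch of those checks (the log path is the default install location, adjust if yours differs; on systemd-based hosts you can use systemctl instead of service):
# Check and, if needed, restart the Cloudera Manager Server
service cloudera-scm-server status
service cloudera-scm-server restart
# Inspect the most recent server log entries for startup errors
tail -n 200 /var/log/cloudera-scm-server/cloudera-scm-server.log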
03-22-2021
04:50 AM
Hello @pauljoshiva You need to add the new nodes with a new config group: one set of DNs in the default config group (where the storage directories span /hdp/hdfs01 - /hdp/hdfs09) and another set of DNs in the new config group (with directories /hdp/hdfs01, /hdp/hdfs02, /hdp/hdfs03). That way you can have all DNs in the cluster under 2 separate config groups.
03-16-2021
03:08 AM
Hello @Monds You can recover the lease on the file by running the command below: # hdfs debug recoverLease -path <path-of-the-file> [-retries <retry-times>] This command asks the NameNode to try to recover the lease for the file (and successfully close the file if there are still healthy replicas). Ref: https://blog.cloudera.com/understanding-hdfs-recovery-processes-part-1/
03-15-2021
04:40 AM
Hello @Babar Thank you for resolving the issue and marking the thread as solved. Glad to know that you identified the problem and resolved it. Please note that HDFS-14383 (Compute datanode load based on StoragePolicy) has been included in the recent CDP 7.1.5 and 7.2.x releases.
03-13-2021
04:53 AM
1 Kudo
Yes, it is applicable to the CDP 7.x releases, @novice_tester
03-12-2021
11:00 AM
2 Kudos
Hello @novice_tester Cloudera validates and tests against the latest browsers such as Google Chrome, Firefox, Safari and MS Edge. Please refer to the supported-browsers pages here - https://my.cloudera.com/supported-browsers.html and - https://docs.cloudera.com/management-console/cloud/requirements-aws/topics/mc-supported-browsers.html
03-12-2021
10:47 AM
Hello @Babar, It seems the DN disk configuration (dfs.datanode.data.dir) is not appropriate. Could you please configure the disks as described here - https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_heterogeneous_storage_oview.html#admin_heterogeneous_storage_config If your SSD disks are mounted as below:
/dn_vg1/vol1_ssd -----> mounted as ----> /data/1
/dn_vg2/vol2_ssd -----> mounted as ----> /data/2
/dn_vg3/vol3_ssd -----> mounted as ----> /data/3
and the SCSI/SATA disks are mounted as below:
/dn_vg1/vol1_disk -----> mounted as ----> /data/4
/dn_vg2/vol2_disk -----> mounted as ----> /data/5
then configure the DN data directories (dfs.datanode.data.dir) as follows:
- dn-1: "[SSD]/data/1/dfs/dn"
- dn-2: "[SSD]/data/1/dfs/dn,[SSD]/data/2/dfs/dn"
- dn-3: "[DISK]/data/4/dfs/dn,[SSD]/data/3/dfs/dn,[DISK]/data/5/dfs/dn"
You need to create the /dfs/dn directories with ownership hdfs:hadoop and permission 700 on each mount point so that the volume can be used to store blocks. Please check the mount points and reconfigure the data directories.
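A quick sketch of the directory-creation step on one DataNode (mount points taken from the example above; run it only for the mounts that node actually has):
# Create the block directories under each mount point and set ownership/permissions
mkdir -p /data/{1,2,3,4,5}/dfs/dn
chown -R hdfs:hadoop /data/{1,2,3,4,5}/dfs/dn
chmod 700 /data/{1,2,3,4,5}/dfs/dn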
03-05-2021
04:33 AM
1 Kudo
Hello @uxadmin, Thank you for asking a follow-up question. Please note that the NameNode is responsible for keeping the metadata of the files/blocks written into HDFS, so an increase in block count means the NameNode has to keep more metadata and may need more heap memory. As a rule of thumb, we suggest 1 GB of NameNode heap memory for every 1 million blocks in HDFS. Similarly, every 1 million blocks on a DN requires roughly 1 GB of heap memory to operate smoothly. As I said earlier, there is no hard limit on the number of blocks a DN can store, but having too many blocks is an indication of small-file accumulation in HDFS. You need to check the average block size in HDFS to understand whether you are hitting a small-files issue. Fsck shows the average block size. If it is too low a value (e.g. ~1 MB), you might be hitting the small-files problem and it would be worth looking into; otherwise, there is no need to review the number of blocks.
$ hdfs fsck /
...
 Total blocks (validated): 2899 (avg. block size 11475601 B) <<<<<
In short, there is no fixed block count threshold for a DN, but an increase in DN block counts is an early indicator of a small-files issue in the cluster. Of course, more small files means more heap memory for both the NN and the DN. In a perfect world where all files are created with a 128 MiB block size (the HDFS default), 1 TB of DN storage can hold 8192 blocks (1024*1024/128). By that calculation, a DN with 23 TB can hold 188,416 blocks; but realistically not all files are created with 128 MiB blocks, and not all files occupy an entire block. So in a normal CDH cluster installation we keep a minimal value of 500,000 as the warning threshold for DN block counts, although depending on your use case and file-write pattern that threshold may still be hit over time. A more tailored threshold can be derived from the DataNode disk size used for storing blocks: say you have allocated 10 disks of 2 TB each (/data/1/dfs/dn through /data/10/dfs/dn) for block writes on a DataNode, giving 20 TB for blocks; if you are writing files with an average block size of 10 MB, you can accommodate at most 2,097,152 blocks (20 TB / 10 MB) on that DN, so a warning threshold of 1,000,000 would be reasonable (see the short worked calculation below). Hope this helps. If you have further questions, feel free to reply. Cheers! In case your question has been answered, make sure to mark the answer as the accepted solution. If you find a reply useful, say thanks by clicking on the thumbs up button.
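A minimal sketch of that capacity arithmetic (the 20 TB capacity and 10 MB average block size are the assumed numbers from the example above):
# blocks = capacity_in_MB / avg_block_size_in_MB
echo $(( 20 * 1024 * 1024 / 10 ))   # 20 TB / 10 MB = 2097152 blocks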
03-04-2021
10:38 AM
Hi @dv_conan, A similar issue is addressed here - https://community.cloudera.com/t5/Support-Questions/failed-to-execute-command-install-yarn-mapreduce-framework/td-p/301804 Please review it, make the necessary changes to the directory permissions, and let us know if that helped.
03-04-2021
10:30 AM
Hello @samglo, Please note Solr CDCR is not yet supported in CDP. Refer to the Cloudera blog on Solr CDCR (Cross Data Center Replication) support: - https://blog.cloudera.com/backup-and-disaster-recovery-for-cloudera-search/ Quoting the "Solr CDCR" section of that blog: "The future holds the promise of a Solr to Solr replication feature as well, a.k.a. CDCR. This is still maturing upstream and will need some time to further progress before it can be considered for mission critical production environments. Once it matures we will evaluate its value in addition to all our existing options of recovery for Search. The above solutions, presented in this blog, are production-proven and provides a very good coverage along with flexibility for today’s workloads." However, you can refer to the Apache document on Solr CDCR below for information about the setup: - https://solr.apache.org/guide/6_6/cross-data-center-replication-cdcr.html or the Cloudera Community article - https://community.cloudera.com/t5/Community-Articles/How-to-setup-cross-data-center-replication-in-SolrCloud-6/ta-p/247945
03-04-2021
10:06 AM
Hello @nj20200 It seems an already-installed openssl package (openssl-libs-1.0.2k-19.el7.x86_64) conflicts with the openssl-devel package you are trying to install (openssl-devel-1.0.1e-60.el7.x86_64), which is causing the installation failure. So instead of installing that package directly, either update openssl-devel so that it matches the installed openssl-libs version (e.g. "yum update openssl-devel"), or remove the conflicting package first and then install the required version of openssl-devel.
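A minimal sketch of how to verify the conflict (standard RHEL/CentOS commands; the package names come from the error above):
# See which openssl packages and versions are currently installed
rpm -qa | grep -i openssl
# Bring openssl-devel in line with the installed openssl-libs version
yum update openssl-devel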
03-04-2021
07:25 AM
Hello @uxadmin Please note that the block count threshold configuration is intended for DataNodes only. It is a DataNode health test that checks whether the DataNode has too many blocks, because having too many blocks on a DataNode may affect its performance. There is no hard limit on the number of blocks writable to a DN, as block size is merely a logical concept, not a physical layout. However, the block count alert serves as an early warning of a growing small-files issue. While a DN can handle a lot of blocks in general, going too high will cause performance issues; your processing speeds may drop if you keep a lot of tiny files on HDFS (depending on your use case, of course), so it is worth looking into. You can find the block count threshold in the HDFS config by navigating to CM > HDFS > Configuration > DataNode Block Count Thresholds. When the block count on a DN goes above the threshold, CM triggers an alert, so you need to adjust the threshold value based on the block counts on each DN. You can determine the block count on each DN by navigating to CM > HDFS > WebUI > Active NN > DataNodes tab > Block counts column under the Datanode section. Hope this helps.
03-02-2021
09:09 AM
Hello @kolli_sandeep, It seems the failover controllers are down in the cluster. Please follow the steps here [1] and start the Failover Controller roles, which will transition the NameNodes to the Active/Standby states. You need to follow the steps below:
1. Stop the Failover Controller roles under the HDFS > Instances page.
2. Remove the HA state from ZK. On a ZooKeeper server host, run zookeeper-client and execute the following to remove the configured nameservice (this example assumes the nameservice is named nameservice1; you can identify the nameservice from the Federation and High Availability section on the HDFS Instances tab): rmr /hadoop-ha/nameservice1 (If you don't see a /hadoop-ha znode in the ZK znode list, skip this step.)
3. After removing the HA znode in ZK, go to CM and click HDFS > Instances > Federation and High Availability > Actions. Under the Actions menu, select Actions > Initialize High Availability State in ZooKeeper.
4. Start the Failover Controller roles (CM > Instances > select Failover Controllers > Actions for Selected > Start).
5. Verify the NameNode states. If you don't see the Active/Standby states of the NNs, or if anything fails, just restart the HDFS service.
[1] https://docs.cloudera.com/documentation/enterprise/latest/topics/cdh_hag_hdfs_ha_enabling.html
03-02-2021
12:06 AM
Hello @raghurok, Could you please check now and see whether you are still getting the timeout error? I believe the timeout was due to a network glitch or a maintenance activity. I hope you will be able to access it now.
03-01-2021
09:06 AM
1 Kudo
Hello @muslihuddin, Please note that while enabling HA, CM puts all 3 JournalNodes into a single group called "Default Group" by default, assuming you are going to use the same config value for the 3 JN directories. Since you are using /app/jn on one node and /data/jn on the other 2 JN nodes, it created two separate JN config groups. However, to prevent the CM alert, you can set /data/jn in the JN default group config so that those 2 JNs are part of the Default config group rather than a separate one, while the 3rd JN continues to operate in a separate config group until you switch it to use the /data/jn directory as its edits directory. Just in case you need to change the JN directory on any JN, refer to the steps here - https://docs.cloudera.com/documentation/enterprise/latest/topics/cm_mc_jn.html
02-01-2021
10:58 AM
1 Kudo
Hi @pauljoshiva Though it is expected to have a uniform disk configuration across the DataNodes in a cluster, you can have two different sets of disk configurations on the DNs. You can have one 2 TB partition on each disk (3 * 2 TB = 6 TB on each new DN) even though the existing DNs have a 1 TB partition on each of their 9 disks (9 * 1 TB = 9 TB per DN). There will be no issue running DNs with such a configuration, but you may see the 6 TB DNs filling up faster than the 9 TB DNs, because the NN does not consider the available free space on a DN before writing blocks to it; the NameNode picks the DN essentially at random after evaluating the network distance of the DN from the client. Hope this helps. Thank you
02-01-2021
10:35 AM
Hello @vvk Please note that while adding/removing JournalNodes in a running cluster, you need to ensure a quorum of JournalNodes remains available for the NameNodes. (As cited in the shared document: NameNode high availability requires that you maintain at least three active JournalNodes in your cluster.) The NameNode requires at least a quorum of JournalNodes (2 of 3) to be available for edit-log writes at any given point; if it fails to write edits to a quorum of JournalNodes, the NameNode is expected to crash (shut itself down). I believe this could be the scenario in your case. So you need to add the new JournalNodes to the cluster first, before removing the old JournalNodes one by one, ensuring a quorum of JournalNodes remains available throughout. If you see the NN crash even though the edit-log write was successful on a quorum of JNs, then we need to check the NN log for other issues. Thank you
11-11-2020
02:06 AM
Hello @Amn_468 Since you reported the DN pause time, I referred to the DN heap only. The block count on most of the DNs appears to be >6 million, hence I would suggest increasing the DN heap to 8 GB (from the current value of 6 GB) and performing a rolling restart to bring the new heap size into effect. There is no straightforward way to say you have hit the small-files problem, but if your average block size is a few MB or less than a MB, it is an indication that you are storing/accumulating small files in HDFS. The simplest way to determine whether there are small files in the cluster is to run fsck. Fsck shows the average block size. If it is too low a value (e.g. ~1 MB), you might be hitting the small-files problem and it would be worth looking into; otherwise, there is no need to review the number of blocks.
$ hdfs fsck /
...
 Total blocks (validated): 2899 (avg. block size 11475601 B) <<<<<
You may refer to the links below for help with dealing with small files: - https://blog.cloudera.com/small-files-big-foils-addressing-the-associated-metadata-and-application-challenges/ - https://community.cloudera.com/t5/Community-Articles/Identify-where-most-of-the-small-file-are-located-in-a-large/ta-p/247253
11-09-2020
09:42 AM
Hello @Masood, I believe you are asking for the commands to run to determine the active NN, apart from the CM UI (CM > HDFS > Instances > NameNode). From the CLI you have to run a couple of commands to determine the Active/Standby NNs.
List the namenode hostnames:
# hdfs getconf -namenodes
c2301-node2.coelab.cloudera.com c2301-node3.coelab.cloudera.com
Get the nameservice name:
# hdfs getconf -confKey dfs.nameservices
nameservice1
Get the namenode IDs of the nameservice:
# hdfs getconf -confKey dfs.ha.namenodes.nameservice1
namenode11,namenode20
Get the active and standby namenodes:
# su - hdfs
$ hdfs haadmin -getServiceState namenode11
active
$ hdfs haadmin -getServiceState namenode20
standby
Get the active and standby namenode hostnames:
$ hdfs getconf -confKey dfs.namenode.rpc-address.nameservice1.namenode11
c2301-node2.coelab.cloudera.com:8020
$ hdfs getconf -confKey dfs.namenode.rpc-address.nameservice1.namenode20
c2301-node3.coelab.cloudera.com:8020
If you want to get the active namenode hostname from the hdfs-site.xml file, you can go through the following python script on GitHub – https://github.com/grakala/getActiveNN . Thank you
11-09-2020
09:22 AM
Hello @sace17 It seems your problem is related to the credential cache. Per "https://bugzilla.redhat.com/show_bug.cgi?id=1029110", if the keyring ccache is changed from UID to username as below, it is not possible to get a ticket as a non-root user:
default_ccache_name = KEYRING:persistent:%{username}
We have a KB article that discusses the problem - https://community.cloudera.com/t5/board/article/ta-p/74262 Per the KB article, CDH/Hadoop components do not fully support the advanced Linux KEYRING feature for storing Kerberos credentials. Remove any global profile setting for the environment variable KRB5CCNAME; if no type prefix is present, the FILE type is assumed, which is supported by CDH/Hadoop components. Please remove/comment out the default_ccache_name setting in the /etc/krb5.conf file on all cluster nodes, and that should solve your problem. Ref: a community post on the same problem here - https://community.cloudera.com/t5/Support-Questions/Kerberos-Cache-in-IPA-RedHat-IDM-KEYRING-SOLVED/td-p/108373 Additional reference: - https://web.mit.edu/kerberos/krb5-1.12/doc/basic/ccache_def.html Thank you
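As a small illustration of that change in /etc/krb5.conf (only the default_ccache_name line comes from your setup; the section header is the standard location for it):
[libdefaults]
  # Comment out the keyring ccache so the default FILE-based cache is used instead
  # default_ccache_name = KEYRING:persistent:%{username}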
11-09-2020
09:06 AM
Hello @AlexP Ref: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#setrep Referring to the HDFS document, answers to your questions are inline.
[Q1.] How to estimate how much time would this command take for a single directory (without -w)?
[A1.] It depends on the number of files in the directory. If you run setrep against a path which is a directory, the command recursively changes the replication factor of all files under the directory tree rooted at that path, so the time varies with the file count under the path/directory.
[Q2.] Will it trigger a replication job even if I don't use the '-w' flag?
[A2.] Yes, replication will be triggered without the -w flag. However, it is good practice to use -w to ensure all files have the required replication factor set before the command exits. Please note, the -w flag requests that the command wait for the replication to complete. Using -w can make the command take a long time to finish, but it guarantees that the replication factor has been changed to the specified value.
[Q3.] If yes, does it mean that the NameNode will actually start deleting 'over-replicated' blocks of all existing files under a particular directory?
[A3.] Yes, your understanding is correct. The additional replica of each block marks the block as over-replicated, and it will be deleted from the cluster. This is done for each file under the directory path, keeping only 2 replicas of the file blocks.
Hope this helps.
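A minimal usage sketch for reference (the /data/archive path and the target replication factor of 2 are examples matching the discussion above):
# Recursively set replication factor 2 on everything under /data/archive and wait for completion
hdfs dfs -setrep -w 2 /data/archive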
11-09-2020
08:29 AM
Hello @Amn_468 The DN pause alert you see on 1 of the 9 DataNodes is an indication of a growing block count on it; compared to the other DNs, this DN has possibly stored more blocks than the other nodes. You can compare the block counts of the DNs via CM > HDFS > WebUI > Active NN Web UI > DataNodes > check the Blocks column under the "In operation" section. The log snippet you shared indicates a pause of only 2 sec, which is not a sign of worry. However, with a proper JVM heap size allocated for the DN, you may avoid these frequent pause alerts. As a rule of thumb you need 1 GB of heap for 1 million blocks, and since you have 6 GB allocated for the DN heap, please verify the block counts on the DNs and ensure they are not too high (>6 million), which could explain why there are so many pause alerts. If the block count is much higher than expected, you need to increase the heap size to accommodate the block objects in the JVM heap memory. On a side note, a growing block count is also an early warning/indication of a small-files problem in the cluster; you need to be vigilant about that. Verify the average block size, and that will help you understand whether you have a small-files problem in your cluster. Regards, Pabitra Das
09-30-2020
10:14 PM
Hello @vincentD Please review the stdout and stderr of the DN that is going down frequently. You can navigate to CM > HDFS > Instances > select the DN which went down > Processes > click on stdout/stderr at the bottom of the page. I am asking you to verify stdout/stderr because I suspect an OOM error (the Java heap running out of memory) is causing the DN to exit/shut down abruptly. If the DN exit is due to an OOM error, please increase the DN heap size to an adequate value to get rid of the issue. The DN heap sizing rule of thumb is: 1 GB of heap memory for 1 million blocks. You can verify the block counts on each DN by navigating to CM > HDFS > NN Web UI > Active NN > DataNodes, where the page shows DN stats such as block counts and disk usage.