Member since: 03-22-2017
Posts: 63
Kudos Received: 18
Solutions: 12

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2266 | 07-08-2023 03:09 AM |
| | 4767 | 10-21-2021 12:49 AM |
| | 2202 | 04-01-2021 05:31 AM |
| | 2822 | 03-30-2021 04:23 AM |
| | 5196 | 03-23-2021 04:30 AM |
11-09-2020
09:42 AM
Hello @Masood, I believe you are asking for the commands to run in order to determine the active NameNode, apart from the CM UI (CM > HDFS > Instances > NameNode). From the CLI you have to run a couple of commands to determine the Active/Standby NN.

List the NameNode hostnames:

```
# hdfs getconf -namenodes
c2301-node2.coelab.cloudera.com c2301-node3.coelab.cloudera.com
```

Get the nameservice name:

```
# hdfs getconf -confKey dfs.nameservices
nameservice1
```

Get the active and standby NameNodes:

```
# hdfs getconf -confKey dfs.ha.namenodes.nameservice1
namenode11,namenode20
# su - hdfs
$ hdfs haadmin -getServiceState namenode11
active
$ hdfs haadmin -getServiceState namenode20
standby
```

Get the active and standby NameNode hostnames:

```
$ hdfs getconf -confKey dfs.namenode.rpc-address.nameservice1.namenode11
c2301-node2.coelab.cloudera.com:8020
$ hdfs getconf -confKey dfs.namenode.rpc-address.nameservice1.namenode20
c2301-node3.coelab.cloudera.com:8020
```

If you want to get the active NameNode hostname from the hdfs-site.xml file, you can go through the following Python script on GitHub: https://github.com/grakala/getActiveNN. Thank you
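The lookups above can also be chained into one small shell loop. This is a minimal sketch, assuming the nameservice is named nameservice1 as in the output above and that it runs as a user allowed to call haadmin (e.g. hdfs); substitute your own nameservice name.

```
#!/bin/bash
# Print the RPC address of whichever NameNode currently reports "active".
NS=nameservice1
for nn in $(hdfs getconf -confKey dfs.ha.namenodes.$NS | tr ',' ' '); do
  if [ "$(hdfs haadmin -getServiceState "$nn" 2>/dev/null)" = "active" ]; then
    hdfs getconf -confKey dfs.namenode.rpc-address.$NS.$nn
  fi
done
```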
11-09-2020
09:22 AM
Hello @sace17, it seems your problem is related to the credential cache. Per https://bugzilla.redhat.com/show_bug.cgi?id=1029110, if the keyring ccache is changed from UID to username as below, it is not possible to get a ticket as a non-root user:

```
default_ccache_name = KEYRING:persistent:%{username}
```

We have a KB article that talks about this problem: https://community.cloudera.com/t5/board/article/ta-p/74262. Per the KB article, CDH/Hadoop components do not fully support the advanced Linux KEYRING feature for storing Kerberos credentials. Remove any global profile setting for the environment variable KRB5CCNAME. If no type prefix is present, the FILE type is assumed, which is supported by CDH/Hadoop components. Please remove or comment out that setting in the /etc/krb5.conf file on all cluster nodes, and that should solve your problem. See a community post on the same problem here: https://community.cloudera.com/t5/Support-Questions/Kerberos-Cache-in-IPA-RedHat-IDM-KEYRING-SOLVED/td-p/108373 Additional reference: https://web.mit.edu/kerberos/krb5-1.12/doc/basic/ccache_def.html Thank you
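After commenting out the setting, you can confirm as a non-root user that the ticket cache falls back to the FILE type. A quick check, with a hypothetical principal (klist prints the cache type on its first line):

```
$ kdestroy
$ kinit user@EXAMPLE.COM
$ klist | head -1
Ticket cache: FILE:/tmp/krb5cc_1000
```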
11-09-2020
09:06 AM
Hello @AlexP Ref: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#setrep Referring to the HDFS documentation, the answers to your questions are inline.

[Q1.] How to estimate how much time would this command take for a single directory (without -w)?
[A1.] It depends upon the number of files in the directory. If you run setrep against a path which is a directory, the command recursively changes the replication factor of all files under the directory tree rooted at that path. The time varies depending on the file count under the path/directory.

[Q2.] Will it trigger a replication job even if I don't use the '-w' flag?
[A2.] Yes, replication will be triggered without the -w flag. However, it is good practice to use -w to ensure all files have the required replication factor before the command exits. Please note, the -w flag requests that the command wait for the replication to complete. Using -w can make the command take a long time, but it guarantees that the replication factor has been changed to the specified value.

[Q3.] If yes, does it mean that the NameNode will actually start deleting 'over-replicated' blocks of all existing files under a particular directory?
[A3.] Yes, your understanding is correct. The additional replica of each block marks the block as over-replicated, and it will be deleted from the cluster. This is done for each file under the directory path, keeping only 2 replicas of each file's blocks.

Hope this helps.
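For example (the path here is hypothetical), this sets the replication factor to 2 for every file under /data/mydir and waits until the change is complete:

```
$ hdfs dfs -setrep -w 2 /data/mydir
```

Dropping -w makes the command return immediately while the NameNode adjusts the replicas in the background.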
11-09-2020
08:29 AM
Hello @Amn_468 The DN Pause alert you see for 1 of the 9 DataNodes is an indication of a growing block count on it. Compared to the other DNs, the DN in question has possibly stored more blocks. You can compare the block counts of the DNs via CM > HDFS > Web UI > Active NN Web UI > DataNodes > check the Blocks column under the "In Operation" section. The log snippet you shared indicates a pause of only 2 seconds, which is not a sign of worry. However, with a properly sized JVM heap allocated for the DN, you can avoid these frequent pause alerts. As a rule of thumb you need 1 GB of heap per 1 million blocks; since you have 6 GB allocated for the DN heap, please verify the block counts on the DNs and ensure they are not too high (> 6 million), which would explain why there are so many pause alerts. If the block count is much higher than expected, you need to increase the heap size to accommodate the block objects in JVM heap memory. On a side note, a growing block count is also an early warning of a small-files problem in the cluster; you need to be vigilant about that. Verify the average block size, and that will help you understand whether you have a small-files problem in your cluster. Regards, Pabitra Das
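One quick way to see the average block size cluster-wide is the fsck summary; its closing lines include "Total blocks (validated)" together with the average block size, and an average far below the default 128 MB block size points to a small-files problem. Run it as the hdfs user; on a large cluster this can take a while:

```
$ sudo -u hdfs hdfs fsck / | tail -n 30
```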
09-30-2020
10:14 PM
Hello @vincentD Please review the stdout and stderr of the DN which is going down frequently. You can navigate to CM > HDFS > Instances > select the DN which went down > Processes > click on stdout/stderr at the bottom of the page. I am asking you to verify stdout/stderr because I suspect an OOM error (the Java heap running out of memory) is causing the DN to exit/shut down abruptly. If the DN exit is due to an OOM error, please increase the DN heap size to an adequate value to prevent the issue from recurring. The DN heap sizing rule of thumb says: 1 GB of heap memory per 1 million blocks. You can verify the block count on each DN by navigating to CM > HDFS > NN Web UI > Active NN > DataNodes; the DN stats on that page show block counts, disk usage, etc.
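As a quick first check you can also grep the DataNode logs for OOM errors. The path below is the default CDH log directory and may differ in your installation:

```
$ grep -ri "java.lang.OutOfMemoryError" /var/log/hadoop-hdfs/ | tail -n 5
```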
09-30-2020
03:04 AM
Hello @tuk you can read about the Apache HBase HBCK2 tool (part of hbase-operator-tools) here: https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2 That page explains how to obtain hbase-operator-tools, which contains the "hbase-hbck2" jar (hbase-hbck2-1.0.0-SNAPSHOT.jar) used to fix inconsistencies in CDH 6.x (HBase 2.x). See the section that discusses fixing problems using HBCK2: https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2#fixing-problems
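If you prefer to build the jar yourself rather than download a release, here is a minimal sketch (assuming Maven and a JDK are installed; the jar ends up under hbase-hbck2/target/):

```
$ git clone https://github.com/apache/hbase-operator-tools.git
$ cd hbase-operator-tools
$ mvn clean install -DskipTests
```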
09-30-2020
02:53 AM
Hello @tuk, you can get the HBCK2 tool for CDH 6.3.3 from Cloudera via a support ticket. Alternatively, you can obtain the HBCK2 tool from the HBase distribution mirrors; see the HBase downloads page:

- https://downloads.apache.org/hbase/hbase-operator-tools-1.0.0/hbase-operator-tools-1.0.0-bin.tar.gz
- http://apachemirror.wuchna.com/hbase/hbase-operator-tools-1.0.0/hbase-operator-tools-1.0.0-bin.tar.gz
- https://mirrors.estointernet.in/apache/hbase/hbase-operator-tools-1.0.0/hbase-operator-tools-1.0.0-bin.tar.gz

By default, running bin/hbase hbck invokes the built-in HBCK1 tooling. To run HBCK2, you need to point at a built HBCK2 jar using the -j option, as in:

```
$ ${HBASE_HOME}/bin/hbase --config /etc/hbase-conf hbck -j ~/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar
```

In the above, /etc/hbase-conf is where the HBase deployment's configuration files reside, and the HBCK2 jar is at ~/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar. The above command with no options or arguments passed will dump out the HBCK2 help:

```
usage: HBCK2 [OPTIONS] COMMAND <ARGS>
Options:
 -d,--debug                                       run with debug output
 -h,--help                                        output this help message
 -p,--hbase.zookeeper.property.clientPort <arg>   port of hbase ensemble
 -q,--hbase.zookeeper.quorum <arg>                hbase ensemble
 -s,--skip                                        skip hbase version check
                                                  (PleaseHoldException)
 -v,--version                                     this hbck2 version
 -z,--zookeeper.znode.parent <arg>                parent znode of hbase
                                                  ensemble
Command:
 addFsRegionsMissingInMeta <NAMESPACE|NAMESPACE:TABLENAME>...
   Options:
   ...
```
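As a usage illustration, a common repair is re-assigning a stuck region by its encoded name with the assigns command. The region below is the well-known encoded name of the hbase:meta region; any region's encoded name from the Master UI works the same way:

```
$ ${HBASE_HOME}/bin/hbase hbck -j ~/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar assigns 1588230740
```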
09-29-2020
02:01 PM
Hello @marianaduran, I am not sure what you are looking for exactly. If you want to use the HBase REST interface, please refer to the Cloudera blog series on using the Apache HBase REST interface:
- https://blog.cloudera.com/how-to-use-the-apache-hbase-rest-interface-part-1/
- https://clouderatemp.wpengine.com/blog/2013/04/how-to-use-the-apache-hbase-rest-interface-part-2/
And the Cloudera documentation page on configuring and using the HBase REST API:
- https://docs.cloudera.com/documentation/enterprise/5-13-x/topics/admin_hbase_rest_api.html
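For a quick illustration (the hostname and table/row names are hypothetical; 20550 is the REST server port commonly used in CDH deployments, so adjust to yours):

```
$ curl -s "http://rest-host.example.com:20550/version"
$ curl -s -H "Accept: application/json" "http://rest-host.example.com:20550/mytable/row1"
```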
09-29-2020
01:54 PM
Hello @JB0000000000001, your assessment of ADLS support in CDH 6.x is correct. The doc page is no longer available for CDH 6.x because ADLS Gen1 or ADLS Gen2 storage cannot be used as the HBase root directory in CDH 6.x; this is a limitation of CDH 6.x. You can use ADLS with Hive, Impala, Oozie, Spark, YARN, MapReduce, Sqoop, and DistCp, but not with HBase [1]. Per the Cloudera 6.x documentation [2], ADLS is not supported as the default filesystem, so do not set the default filesystem property (fs.defaultFS) to an adl:// URI in HDFS, nor the HBase root directory (hbase.rootdir) to one (adl://<adls_account_name>.azuredatalakestore.net/<hbase_directory>). You can still use ADLS as a secondary filesystem while HDFS remains the primary filesystem [2]. [1] https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_adls_docs_ref.html [2] https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_adls2_config.html#abfs_gen2_limitations
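As an example of the secondary-filesystem usage (the account name is a placeholder, and ADLS credentials must already be configured per the Cloudera ADLS docs):

```
$ hadoop fs -ls adl://youraccount.azuredatalakestore.net/data/
$ hadoop distcp /user/alice/dataset adl://youraccount.azuredatalakestore.net/backup/
```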
09-29-2020
01:28 PM
Hello @hammer75, currently no document suggests the use of BYOK as a backing keystore. Cloudera offers the following two options for enterprise-grade key management:

- Cloudera Navigator Key Trustee Server is a key store for managing encryption keys. To integrate with the Navigator Key Trustee Server, Cloudera provides a custom KMS service, Key Trustee KMS.
- Hardware security modules (HSMs) are third-party appliances that provide the highest level of security for keys. To integrate with the supported HSMs, Cloudera provides a custom KMS service, Navigator HSM KMS (see Installing Navigator HSM KMS Backed by Thales HSM and Installing Navigator HSM KMS Backed by Luna HSM).

Ref: https://docs.cloudera.com/documentation/enterprise/latest/topics/cdh_sg_hdfs_encryption.html#concept_hsm_kms_solution

So the HDFS Data At Rest Encryption wizard in Cloudera Manager offers the following four roots of trust for encryption keys:

- Cloudera Navigator Key Trustee Server
- Navigator HSM KMS backed by Thales HSM
- Navigator HSM KMS backed by Luna HSM
- A file-based, password-protected Java KeyStore (not for production environments)