Member since: 11-21-2018
Posts: 33
Kudos Received: 3
Solutions: 3

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1070 | 12-27-2019 04:20 AM |
| | 1841 | 12-17-2019 10:06 PM |
| | 7583 | 04-08-2019 08:28 PM |
04-17-2023
08:30 AM
Hi @saivenkatg55 Could you please check where the datalakedev hostname is configured in your Hadoop/Hive configuration files? Also check that you can ping the datalakedev hostname from the host where you are running the spark-sql command. A quick sketch of both checks is below.
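A minimal sketch, assuming the default client configuration directories /etc/hadoop/conf and /etc/hive/conf (adjust the paths and hostname for your deployment):

# Search the client configuration files for the datalakedev hostname
grep -ril "datalakedev" /etc/hadoop/conf /etc/hive/conf 2>/dev/null

# Verify that the hostname resolves and is reachable from this host
getent hosts datalakedev
ping -c 3 datalakedev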
01-01-2020
11:04 AM
@saivenkatg55 You haven't responded to this answer. Do you still need help, or was the issue resolved? If it was resolved, please accept the solution and close the thread.
12-28-2019
06:04 PM
Hello, You can try setting 'spark.scheduler.mode' to 'FAIR', i.e. conf.set("spark.scheduler.mode", "FAIR"), so that multiple jobs will be executed in parallel. A short sketch is shown below. Please refer to the job scheduling documentation [1]. [1] https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
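A minimal PySpark sketch of scheduling within one application, assuming jobs are submitted from separate threads (the FAIR scheduler only helps when more than one job is running at the same time; the job logic here is just an illustration):

from threading import Thread
from pyspark import SparkConf
from pyspark.sql import SparkSession

# Enable the FAIR scheduler before the SparkSession is created
conf = SparkConf().set("spark.scheduler.mode", "FAIR")
spark = SparkSession.builder.config(conf=conf).getOrCreate()

def run_job(label):
    # Each thread submits its own job; with FAIR scheduling they share executors
    result = spark.range(0, 10_000_000).selectExpr("sum(id)").collect()
    print(label, result)

threads = [Thread(target=run_job, args=(f"job-{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Note that jobs submitted from the same thread still run one after another; FAIR scheduling only shares resources between jobs that are submitted concurrently.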
12-28-2019
02:43 AM
@sheelstera There is a great YARN tuning spreadsheet here that will help you calculate your YARN settings correctly. It applies to YARN clusters only and describes how to tune and optimize YARN for your cluster. A rough illustration of the kind of arithmetic involved is shown below. Please revert if you need further help.
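A rough sketch of the calculation such a spreadsheet performs; the node size, reserved memory, and container size below are illustrative assumptions, not recommendations:

# Illustrative YARN memory arithmetic for a single worker node (assumed values)
node_memory_gb = 128          # total physical RAM on the node
reserved_for_os_gb = 16       # memory kept back for the OS and other daemons
container_memory_gb = 4       # chosen size of one YARN container

# Memory YARN is allowed to hand out on this node
nodemanager_memory_gb = node_memory_gb - reserved_for_os_gb

# How many containers of that size fit on one node
containers_per_node = nodemanager_memory_gb // container_memory_gb

print(f"yarn.nodemanager.resource.memory-mb = {nodemanager_memory_gb * 1024}")
print(f"containers per node                 = {containers_per_node}")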
12-27-2019
10:24 AM
Kindly correct me if I am wrong.

1) "If Active NameNode1 crashes, then after ha.health-monitor.rpc-timeout.ms seconds, NameNode2 will try to become Active."

No. The ZooKeeper "tickTime" setting (2000 ms) drives heartbeats, and the minimum session timeout is twice the tickTime. The entry initLimit is the timeout ZooKeeper uses to limit the length of time the ZooKeeper servers in the quorum have to connect to a leader, and the entry syncLimit limits how far out of date a server can be from a leader. The failover controllers (ZKFC) running on namenode1 and namenode2 do the health monitoring, and the timeout for the actual monitorHealth() RPC calls is set by the parameter ha.health-monitor.rpc-timeout.ms. Kindly remember that this setting is only the timeout for the health monitor calls.

2) "If the active node crashes, then after dfs.ha.fencing.ssh.connect-timeout seconds NameNode2 will try to become Active."

Answer - The above statement is incorrect. dfs.ha.fencing.ssh.connect-timeout is only applicable when dfs.ha.fencing.methods is set to "sshfence", but in Cloudera the default dfs.ha.fencing.methods is shell(true).

The transition from the active namenode to the standby is managed by a new entity in the system called the failover controller. Failover controllers are pluggable, but the first implementation uses ZooKeeper to ensure that only one namenode is active. Each namenode runs a lightweight failover controller process whose job is to monitor its namenode for failures (using a simple heartbeating mechanism) and trigger a failover on namenode failure. Failover may also be initiated manually by an administrator, for example for routine maintenance. This is known as a graceful failover, since the failover controller arranges an orderly transition for both namenodes to switch roles.

In the case of an ungraceful failover, however, it is impossible to be sure that the failed namenode has stopped running. For example, a slow network or a network partition can trigger a failover transition even though the previously active namenode is still running and thinks it is still the active namenode. The HA implementation goes to great lengths to ensure that the previously active namenode is prevented from doing any damage and causing corruption, a method known as fencing. The system employs a range of fencing mechanisms, including killing the namenode's process, revoking its access to the shared storage directory (typically by using a vendor-specific NFS command), and disabling its network port via a remote management command. As a last resort, the previously active namenode can be fenced with a technique rather graphically known as STONITH, or "shoot the other node in the head", which uses a specialized power distribution unit to forcibly power down the host machine.

Client failover is handled transparently by the client library. The simplest implementation uses client-side configuration to control failover. The HDFS URI uses a logical hostname which is mapped to a pair of namenode addresses (in the configuration file), and the client library tries each namenode address until the operation succeeds.

Let's take an example. I have configured a Hadoop HA cluster. If I kill the NameNode process with the command "kill -9 NameNodeProcessId", my standby node changes its state to active. But if I power off the active node, then the standby node cannot change its state to active, because it keeps trying to connect to the crashed node over SSH. The parameter dfs.ha.fencing.ssh.connect-timeout (set to 3000, and 5 seconds by default) does not appear to take effect: even after 5 minutes the standby node keeps trying to connect to the crashed node, and setting it manually to 3 seconds did not help. So, if we just kill the namenode process the cluster keeps working, but if we crash the active node the cluster becomes unavailable.

Since you powered off the Active NameNode machine, during failover the SNN (Standby NameNode) timed out connecting to that machine and fencing failed. Typically fencing methods should be configured so that multiple writers to the same shared storage are not allowed. It looks like you are using QJM, and it supports fencing on its own, i.e. it will not allow multiple writers at a time, so I think external fencing methods can be skipped. AFAIK, to improve the availability of the system in the event the fencing mechanisms fail, it is advisable to configure a fencing method which is guaranteed to return success. You can remove the SSH fencing method from both machines' configuration. Please try the shell-based fence method below, just to skip SSH fencing, and restart the cluster. Then failover will happen successfully.

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>
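For reference, a minimal configuration sketch pulling together the properties discussed above (ha.health-monitor.rpc-timeout.ms normally lives in core-site.xml, while the dfs.ha.* properties live in hdfs-site.xml; the values shown are illustrative, not recommendations):

<!-- Timeout, in milliseconds, for the ZKFC's monitorHealth() RPC calls -->
<property>
  <name>ha.health-monitor.rpc-timeout.ms</name>
  <value>45000</value>
</property>

<!-- Only consulted when "sshfence" is listed in dfs.ha.fencing.methods -->
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>

<!-- A fence method that always succeeds, relying on QJM's built-in fencing -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>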
12-27-2019
04:20 AM
Hi, To get newer Hive features with Cloudera support, you generally have to move to a newer CDH release rather than upgrade Hive on its own. You can try upgrading to CDH 6.3.2, but it will still ship Hive 2.1.1; however, you will get a lot of Hive issue fixes in CDH 6.3.2. You can refer to a similar question thread [1]. [1] https://community.cloudera.com/t5/Support-Questions/Upgrading-CDH-to-use-Hive-1-2-0-or-higher/td-p/62936
12-26-2019
11:05 PM
Hi All, The issue has been fixed. It was due to the Spark executor JVM options being set incorrectly. Thanks and Regards, Sudhindra
12-25-2019
02:44 AM
Hi, You can configure a proxy user using the property hadoop.proxyuser.$superuser.hosts along with either or both of hadoop.proxyuser.$superuser.groups and hadoop.proxyuser.$superuser.users. A small example is shown below. Refer: [1] https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html
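A minimal core-site.xml sketch, using "hue" as a placeholder superuser; substitute your own service user, hosts, and groups:

<!-- Hosts from which the superuser "hue" is allowed to impersonate other users -->
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>host1.example.com,host2.example.com</value>
</property>

<!-- Groups whose members "hue" is allowed to impersonate -->
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>group1,group2</value>
</property>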
12-17-2019
10:06 PM
Please check whether the user 'solr' is a member of "supergroup"; if not, add solr to the supergroup. A quick sketch is below.
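A quick sketch of the check and the fix on the NameNode host, assuming "supergroup" is the value of dfs.permissions.superusergroup and that local OS groups are used for group mapping:

# Show the groups HDFS resolves for the solr user
hdfs groups solr

# If supergroup is missing, add solr to it at the OS level
usermod -a -G supergroup solr

# Verify the OS group membership
id solr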
04-25-2019
10:26 AM
This is a Hive Metastore health test that checks that a client can connect and perform basic operations. The operations include: (1) creating a database, (2) creating a table within that database with several types of columns and two partition keys, (3) creating a number of partitions, and (4) dropping both the table and the database. The database is created under /user/hue/.cloudera_manager_hive_metastore_canary/<Hive Metastore role name>/ and is named "cloudera_manager_metastore_canary_test_db".

The test returns "Bad" health if any of these operations fail, and "Concerning" health if an unknown failure happens. The canary publishes a metric, 'canary_duration', for the time it took the canary to complete. Here is an example of a trigger, defined for the Hive Metastore role configuration group, that changes the health to "Bad" when the duration of the canary is longer than 5 seconds:

"IF (SELECT canary_duration WHERE entityName=$ROLENAME AND category = ROLE and last(canary_duration) > 5s) DO health:bad"

A failure of this health test may indicate that the Hive Metastore is failing basic operations. Check the logs of the Hive Metastore and the Cloudera Manager Service Monitor for more details. This test can be enabled or disabled using the "Hive Metastore Canary Health Test" Hive Metastore monitoring setting.

Ref: https://www.cloudera.com/documentation/enterprise/5-7-x/topics/cm_ht_hive_metastore_server.html#concept_p03_hon_yk