Member since
12-07-2018
29
Posts
0
Kudos Received
0
Solutions
01-01-2020
11:51 AM
@pra_big The hbase user is the HBase admin user. You connect to a running instance of HBase with the hbase shell command, located in the bin/ directory of your HBase install. The version information printed when the HBase shell starts is omitted here. The HBase shell prompt ends with a > character.

As the hbase user:
$ ./bin/hbase shell
hbase(main):001:0>

Either of the methods below gives you access to the HBase shell as the admin user [hbase].

If you have root access:
# su - hbase
This gets you to the same place as above.

If you have sudo privileges:
# sudo su hbase -l

I don't see a reason to change to bash, or did I misunderstand your question?
12-27-2019
10:24 AM
Kindly correct me if I am wrong.

1) If the active NameNode1 crashes, then after the ha.health-monitor.rpc-timeout.ms interval, NameNode2 will try to become active.

No. ZooKeeper sends heartbeats according to its tickTime setting (every 2000 ms), and the minimum session timeout is twice the tickTime. The initLimit entry limits how long the ZooKeeper servers in the quorum have to connect to a leader. The syncLimit entry limits how far out of date a server can be from a leader. The failover controllers (ZKFC) running on namenode1 and namenode2 do the health monitoring, and ha.health-monitor.rpc-timeout.ms is the timeout for the actual monitorHealth() RPC calls. Keep in mind that this setting only covers the timeout of health monitor calls.

2) If the active node crashes, then after the dfs.ha.fencing.ssh.connect-timeout interval NameNode2 will try to become active.

Answer: this statement is incorrect. dfs.ha.fencing.ssh.connect-timeout applies only when dfs.ha.fencing.methods is set to "sshfence", but the Cloudera default for dfs.ha.fencing.methods is shell(true).

The transition from the active namenode to the standby is managed by a new entity in the system called the failover controller. Failover controllers are pluggable, but the first implementation uses ZooKeeper to ensure that only one namenode is active. Each namenode runs a lightweight failover controller process whose job is to monitor its namenode for failures (using a simple heartbeating mechanism) and trigger a failover on namenode failure. Failover may also be initiated manually by an administrator, for example for routine maintenance. This is known as a graceful failover, since the failover controller arranges an orderly transition for both namenodes to switch roles.
In the case of an ungraceful failover, however, it is impossible to be sure that the failed namenode has stopped running. For example, a slow network or a network partition can trigger a failover transition even though the previously active namenode is still running and thinks it is still the active namenode. The HA implementation goes to great lengths to ensure that the previously active namenode is prevented from doing any damage and causing corruption, a method known as fencing. The system employs a range of fencing mechanisms, including killing the namenode's process, revoking its access to the shared storage directory (typically by using a vendor-specific NFS command), and disabling its network port via a remote management command. As a last resort, the previously active namenode can be fenced with a technique rather graphically known as STONITH, or "shoot the other node in the head", which uses a specialized power distribution unit to forcibly power down the host machine.

Client failover is handled transparently by the client library. The simplest implementation uses client-side configuration to control failover. The HDFS URI uses a logical hostname that is mapped to a pair of namenode addresses (in the configuration file), and the client library tries each namenode address until the operation succeeds.

Let's take an example: I have configured a Hadoop HA cluster. If I kill the NameNode process with "kill -9 NameNodeProcessId", my standby node changes its state to active. But if I power off the active node, the standby node can't change its state to active because it tries to connect to the crashed node over SSH. This parameter doesn't work: dfs.ha.fencing.ssh.connect-timeout 3000. It is 5 seconds by default, but even after 5 minutes the standby node keeps trying to connect to the crashed node. I set it manually to 3 seconds and it still doesn't work. So if we just kill the namenode process our cluster works, but if the active node crashes our cluster becomes unavailable.
Since you powered off the active NN machine, the standby NameNode (SNN) timed out trying to connect to it during failover, so fencing failed. Fencing methods are typically configured to prevent multiple writers to the same shared storage. It looks like you are using QJM, which supports fencing on its own, i.e. it won't allow multiple writers at a time, so I think external fencing methods can be skipped. AFAIK, to improve the availability of the system in the event that the fencing mechanisms fail, it is advisable to configure a fencing method that is guaranteed to return success. You can remove the SSH fencing method from the configuration on both machines. Please try the shell-based fencing method below, just to skip SSH fencing, and restart the cluster. Failover should then happen successfully.

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>
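
To illustrate why a method that always returns success unblocks failover, here is a toy model of trying fencing methods in order until one reports success. This is only a sketch, not Hadoop code: the function names and the simulated SSH failure are assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

# A fencing method is modelled as a name plus a callable that returns True on success.
FencingMethod = Tuple[str, Callable[[], bool]]


def ssh_fence_unreachable_host() -> bool:
    """Hypothetical stand-in for sshfence when the old active host is powered off:
    the SSH connection never succeeds, so fencing reports failure."""
    return False


def always_true() -> bool:
    """Hypothetical stand-in for shell(/bin/true): it exits 0, so fencing reports success."""
    return True


def first_successful_fence(methods: List[FencingMethod]) -> Optional[str]:
    """Try each method in order and return the name of the first one that succeeds,
    or None if every method fails (failover cannot proceed in that case)."""
    for name, attempt in methods:
        if attempt():
            return name
    return None


ssh_only = [("sshfence", ssh_fence_unreachable_host)]
with_fallback = ssh_only + [("shell(/bin/true)", always_true)]

print(first_successful_fence(ssh_only))       # None
print(first_successful_fence(with_fallback))  # shell(/bin/true)
```

With only the SSH method configured, nothing ever succeeds against a powered-off host, which matches the stuck failover described above. A method that always succeeds is only reasonable when the shared storage itself prevents multiple writers, as QJM does.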
11-22-2019
04:44 AM
I have the same set of questions:
1. How do I take znode backups? Is there a way?
2. Running rmr /hbase-secure from zkcli and then restarting the HBase services should essentially rebuild the whole znode tree structure. Is my assumption right?
10-16-2019
08:18 AM
1 Kudo
Hello, Oracle Data Integrator connects to Hive by using JDBC and uses Hive and the Hive Query Language (HiveQL), a SQL-like language for implementing MapReduce jobs. Source - HERE

The points you mention from the documentation are about blocking external applications and non-service users from accessing the Hive metastore. Since ODI connects to Hive using JDBC, it should connect to HiveServer2 as described in this documentation. Once connected, queries executed from ODI go to HiveServer2. HiveServer2 then connects to the Hive Metastore to get the metadata of the table you are querying and proceeds with the execution. ODI does not need to connect to the Hive Metastore directly. For details about Hive Metastore HA, please read HERE
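
As a small illustration of what the JDBC connection targets, the sketch below builds a HiveServer2 JDBC URL. The helper name and the host are hypothetical. Port 10000 is the usual HiveServer2 default, and a secured cluster would typically need additional URL parameters that are not shown here.

```python
def hiveserver2_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """Build a HiveServer2 JDBC URL (hypothetical helper, for illustration only).

    The URL points at HiveServer2, not at the Hive Metastore.
    """
    return f"jdbc:hive2://{host}:{port}/{database}"


# hs2.example.com is a placeholder host name.
print(hiveserver2_jdbc_url("hs2.example.com"))  # jdbc:hive2://hs2.example.com:10000/default
```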
04-21-2019
11:42 AM
Hi, when the NameNode flushes edits to the JournalNodes, it waits up to 20 seconds for a quorum of them to respond. You are seeing this error message because it took more than 20 seconds for the NN to send the edits. This can have various causes, e.g. NN GC or JVM pauses, a JN sharing its disks with other roles, network communication issues, slow group lookups, etc. A good starting point is to check the NameNode logs for warning messages just before the FATAL error message.
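
One way to pull out those warnings is sketched below. It assumes log4j-style lines in which the level appears as a space-delimited token (WARN, FATAL). The function name and the window size are arbitrary choices for illustration.

```python
from typing import Iterable, List


def warnings_before_fatal(lines: Iterable[str], window: int = 50) -> List[str]:
    """Return the WARN lines found among the `window` lines preceding the first FATAL line."""
    recent: List[str] = []
    for line in lines:
        if " FATAL " in line:
            return [entry for entry in recent if " WARN " in entry]
        recent.append(line.rstrip("\n"))
        recent = recent[-window:]  # keep only the most recent `window` lines
    return []  # no FATAL line was found
```

It accepts any iterable of lines, so an open NameNode log file can be passed in directly and the returned warnings printed one per line.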
04-09-2019
05:54 AM
1 Kudo
You can try increasing the weight of DRP3 so that it gets higher priority. Jobs submitted to this pool will then get more resources than other pools, based on the configured weights.
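
As a rough illustration of how weights translate into shares, the sketch below splits resources in proportion to weight. It is a simplification that assumes all pools are busy and ignores minimum and maximum limits. DRP1, DRP2, and the weight values are hypothetical.

```python
from typing import Dict


def weighted_shares(weights: Dict[str, float]) -> Dict[str, float]:
    """Return each pool's fraction of the cluster, proportional to its weight."""
    total = sum(weights.values())
    return {pool: weight / total for pool, weight in weights.items()}


# DRP1 and DRP2 are hypothetical sibling pools; DRP3 has had its weight doubled.
pools = {"DRP1": 1.0, "DRP2": 1.0, "DRP3": 2.0}
for pool, share in weighted_shares(pools).items():
    print(f"{pool}: {share:.0%}")
# DRP1: 25%
# DRP2: 25%
# DRP3: 50%
```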
04-05-2019
05:55 AM
@sbommaraju Please try starting the CM server again, then look at /var/log/cloudera-scm-server/cloudera-scm-server.log, which should indicate the likely reason for the startup failure. We also suggest starting your own thread in the forums if you need further help with this startup error.
03-07-2019
01:03 AM
Thank you very much Harsh
02-13-2019
06:27 AM
Can the issue below happen if a certificate has expired? I see in some logs that certificates are expired. Please send documentation for certificate renewal.

2019-02-13 23:31:58,038 WARN 1168879507@agentServer-54778:org.mortbay.log: javax.net.ssl.SSLException: Received fatal alert: certificate_expired
2019-02-13 23:31:58,703 WARN 1168879507@agentServer-54778:org.mortbay.log: javax.net.ssl.SSLException: Received fatal alert: certificate_expired
2019-02-13 23:32:01,494 INFO 1645307921@scm-web-99151:com.cloudera.server.web.cmf.AuthenticationSuccessEventListener: Authentication success for user: 'admin' from 192.168.10.51
2019-02-13 23:32:03,490 WARN 1168879507@agentServer-54778:org.mortbay.log: javax.net.ssl.SSLException: Received fatal alert: certificate_expired
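
To see how often the alert occurs across a larger log, a minimal sketch like the one below counts TLS fatal alerts by name. It assumes one log entry per line and the "Received fatal alert: <name>" wording shown above. The function name is made up for illustration.

```python
from collections import Counter
from typing import Iterable


def count_tls_alerts(lines: Iterable[str]) -> Counter:
    """Count 'Received fatal alert: <name>' occurrences, keyed by alert name."""
    marker = "Received fatal alert: "
    counts: Counter = Counter()
    for line in lines:
        if marker in line:
            # Take the first token after the marker as the alert name.
            tail = line.split(marker, 1)[1].split()
            if tail:
                counts[tail[0]] += 1
    return counts
```

Passing the open server log file to this function returns a counter keyed by alert name, for example certificate_expired.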