05-22-2019
08:41 PM
@Sami Ahmad When you have set up your ambari.repo correctly on Linux, you need to do the following:

# yum repolist
# yum install -y ambari-server
# yum install -y mysql-connector-java
# ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar

That should pick up the correct version of the MySQL driver for your Ambari if you indeed intend to run on MySQL or MariaDB:

# yum install -y mariadb-server

To get the mysql-connector version, here are the steps:

# zipgrep 'Bundle-Version' mysql-connector-java.jar

Output:

META-INF/MANIFEST.MF:Bundle-Version: 5.1.25

HTH
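A common failure mode here is pointing --jdbc-driver at a jar that is not actually installed. This is a minimal sketch (not Ambari tooling) that checks the jar exists before running setup; the /usr/share/java path is an assumption based on where yum normally places mysql-connector-java.

```shell
#!/bin/sh
# Sketch: verify the JDBC connector jar is present before pointing
# ambari-server setup at it. The jar path below is an assumption;
# adjust it for your distribution.
check_connector() {
  jar="$1"
  if [ -f "$jar" ]; then
    echo "found: $jar"
    return 0
  else
    echo "missing: $jar (install mysql-connector-java first)" >&2
    return 1
  fi
}

# Typical use (needs the connector installed and ambari-server on PATH):
#   check_connector /usr/share/java/mysql-connector-java.jar \
#     && ambari-server setup --jdbc-db=mysql \
#          --jdbc-driver=/usr/share/java/mysql-connector-java.jar
```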
05-22-2019
08:16 PM
@ajay vembu Here is the cause of the problem:

Caused by: java.net.BindException: Address already in use

This means some process is already using the default NodeManager port. You will need to kill that process, and then your NodeManager will start successfully. Run one of the following to find it:

sudo lsof -i -P -n | grep LISTEN
or
sudo netstat -ltup

It is a simple port collision. Can you first stop the Ambari Metrics HMaster process? At times it grabs port 45454, and this blocks the NodeManager from starting. Please do that and revert.
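The lsof/netstat commands above dump every listener; when you only care about one port, a small filter helps. A sketch, assuming 45454 is the port in contention (the NodeManager default mentioned above):

```shell
#!/bin/sh
# Sketch: filter listener output (from ss/netstat/lsof) down to the
# lines holding a given port in the local-address column.
port_listeners() {
  port="$1"
  # Match ":PORT" followed by a non-digit or end of line, so 45454
  # does not also match 454540 etc.
  grep -E "[:.]${port}([^0-9]|\$)"
}

# Typical use on a live host (needs root to see process names):
#   sudo ss -ltnp | port_listeners 45454
```

Once the owning PID is visible, stop that service (or kill the process) and restart the NodeManager.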
05-22-2019
07:30 PM
1 Kudo
@Shesh Kumar Read carefully the bold text on the Quorum Journal nodes: it explains why the standby NameNode reads the edits from the journal nodes and does not read block reports from the active NameNode. The journal nodes have nothing to do with block reports; they carry the FSImage and edits files.

Namenode

The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. It does not store the data of these files itself. The inodes and the list of blocks that define the metadata of the name system are called the image. The NameNode keeps the entire namespace image in RAM. The persistent record of the image, stored in the NameNode's local native filesystem, is called a checkpoint. The NameNode records changes to HDFS in a write-ahead log called the journal, also in its local native filesystem. The location of block replicas is not part of the persistent checkpoint.

Each client-initiated transaction is recorded in the journal, and the journal file is flushed and synced before the acknowledgment is sent to the client. The checkpoint file is never changed by the NameNode; a new file is written when a checkpoint is created during a restart, when requested by the administrator, or by the CheckpointNode. During startup, the NameNode initializes the namespace image from the checkpoint and then replays changes from the journal. A new checkpoint and an empty journal are written back to the storage directories before the NameNode starts serving clients. For improved durability, redundant copies of the checkpoint and journal are typically stored on multiple independent local volumes and at remote NFS servers: the first choice prevents loss from a single volume failure, and the second protects against failure of the entire node.

If the NameNode encounters an error writing the journal to one of the storage directories, it automatically excludes that directory from the list of storage directories. The NameNode automatically shuts itself down if no storage directory is available.

The NameNode is a multithreaded system and processes requests simultaneously from multiple clients. Saving a transaction to disk becomes a bottleneck, since all other threads must wait until the synchronous flush-and-sync procedure initiated by one of them is complete. To optimize this, the NameNode batches multiple transactions: when one of the NameNode's threads initiates a flush-and-sync operation, all the transactions batched at that time are committed together. The remaining threads only need to check that their transactions have been saved and do not need to initiate a flush-and-sync of their own.

DataNodes

During startup, each DataNode connects to the NameNode and performs a handshake to verify the namespace ID and the software version of the DataNode. If either does not match that of the NameNode, the DataNode automatically shuts down. The namespace ID is assigned to the filesystem instance when it is formatted and is persistently stored on all nodes of the cluster. Nodes with a different namespace ID cannot join the cluster, which protects the integrity of the filesystem. A newly initialized DataNode without any namespace ID is permitted to join the cluster and receives the cluster's namespace ID. After the handshake, the DataNode registers with the NameNode. DataNodes persistently store their unique storage IDs. The storage ID is an internal identifier of the DataNode that makes it recognizable even if it is restarted with a different IP address or port; it is assigned when the DataNode first registers with the NameNode and never changes after that.

A DataNode identifies the block replicas in its possession to the NameNode by sending a block report, which contains the block ID, the generation stamp, and the length of each block replica the server hosts. The first block report is sent immediately after DataNode registration; subsequent block reports are sent every hour and give the NameNode an up-to-date view of where block replicas are located on the cluster.

During normal operation, DataNodes send heartbeats to the NameNode to confirm that the DataNode is operating and that the block replicas it hosts are available. The default heartbeat interval is three seconds. If the NameNode does not receive a heartbeat from a DataNode for ten minutes, it considers the DataNode out of service and the block replicas hosted by that DataNode unavailable, and it then schedules the creation of new replicas of those blocks on other DataNodes. Heartbeats also carry information about total storage capacity, the fraction of storage in use, and the number of data transfers currently in progress; these statistics feed the NameNode's block allocation and load-balancing decisions. The NameNode does not directly send requests to DataNodes; it uses replies to heartbeats to send instructions, which include commands to replicate blocks to other nodes, remove local block replicas, re-register and send an immediate block report, or shut down the node.

High Availability

The HDFS NameNode High Availability feature lets you run redundant NameNodes in the same cluster in an active/passive configuration with a hot standby. This eliminates the NameNode as a potential single point of failure (SPOF) in an HDFS cluster.

Standby Namenode

It does three things:
- Merges the fsimage and edits-log files.
- Receives online updates of the file system metadata (via the journal nodes), applies them to its memory state, and persists them on disk just as the NameNode does.
- Performs checkpoints of the namespace state.

Thus at any time, the standby NameNode contains an up-to-date image of the namespace, both in memory and on local disk(s). The cluster will fail over to this standby node if the active NameNode dies.

Quorum Journal Nodes

The Quorum Journal Manager is the HDFS implementation that shares the edit logs between the active and standby NameNodes. The standby NameNode communicates and synchronizes with the active NameNode through a group of daemons called journal nodes. The journal nodes run as a group, and there should be at least three of them. For N journal nodes, the system can tolerate at most (N-1)/2 failures and continue to work; so for three journal nodes, the system can tolerate the failure of one {(3-1)/2} of them. Whenever the active node performs any modification, it logs the modification to all journal nodes. The standby node reads the edits from the journal nodes and continuously applies them to its own namespace. In the case of failover, the standby ensures that it has read all the edits from the journal nodes before promoting itself to the active state; this guarantees the namespace state is completely synchronized before a failure occurs.

To provide fast failover, the standby node must also have up-to-date information about the location of data blocks in the cluster. For this to happen, the IP addresses of both NameNodes are made available to all the DataNodes, and the DataNodes send block location information and heartbeats to both.

HTH
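The (N-1)/2 tolerance rule above is just integer arithmetic over a majority quorum, and can be checked with a quick sketch (an illustration only, not part of any HDFS tooling):

```shell
#!/bin/sh
# Minimal sketch of the quorum tolerance rule: edit-log writes need a
# majority of journal nodes, so with N nodes at most (N-1)/2 may fail.
qjm_tolerance() {
  n="$1"
  echo $(( (n - 1) / 2 ))
}

qjm_tolerance 3   # 3 journal nodes -> 1 failure tolerated
qjm_tolerance 5   # 5 journal nodes -> 2 failures tolerated
```

This is also why journal node counts are always odd: going from 3 to 4 nodes adds cost but tolerates no additional failures.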
05-22-2019
12:33 PM
@Akash S If you have kerberized your cluster using AD, a local user cannot obtain a valid Kerberos ticket unless that user exists in AD. The reason for using AD is to delegate and centralize user creation, authentication, and management to Active Directory. One option is to configure a System Security Services Daemon (SSSD) client to use Active Directory as an identity provider. But the best solution is to create your HIVEUSER in AD, which will give you the correct keytabs/permissions for that user to access Hive. HTH
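For the SSSD route, the client-side configuration looks roughly like the fragment below. This is a hedged sketch only: example.com, EXAMPLE.COM, and ad1.example.com are placeholders, not values from this thread, and the host must already be joined to the domain (e.g. via realm join).

```ini
# /etc/sssd/sssd.conf -- illustrative fragment, adjust for your realm
[sssd]
domains = example.com
services = nss, pam

[domain/example.com]
# Use Active Directory as the identity and access provider
id_provider = ad
access_provider = ad
ad_domain = example.com
krb5_realm = EXAMPLE.COM
ad_server = ad1.example.com
```

With this in place, AD users resolve as local identities on the cluster nodes, which is what the Hadoop services expect when mapping Kerberos principals to users.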
05-22-2019
08:51 AM
@Mazen Elshayeb Unbelievable! Ping me on LinkedIn; I could help with a remote session.
05-21-2019
08:09 PM
@Farhana Khan Which is the active node? Can you check again the last epoch on all 3 journal nodes? Are you still experiencing the same problem? Please revert so I can analyze your problem again.
05-21-2019
12:39 PM
@Farhana Khan Any updates?
05-20-2019
04:31 PM
1 Kudo
@Farhana Khan How do I fix one corrupted JN's edits? Here are the instructions to fix that one journal node.

1) Put both NNs in safe mode from the active NameNode (NN HA):

$ hdfs dfsadmin -safemode enter

2) Save the namespace:

$ hdfs dfsadmin -saveNamespace

3) Back up the edits_* files in /hadoop/hdfs/journal/{cluster_name}/current/ on node02 and node04, taking note of the file permissions (a screenshot would be important).

On node02:

# cd /hadoop/hdfs/journal/{cluster_name}/current/
# tar -czvf node2.tar.gz *
# rm -rf *

On node04:

# cd /hadoop/hdfs/journal/{cluster_name}/current/
# tar -czvf node4.tar.gz *
# rm -rf *

On the good node03, tar the journal dir from the working JN so it can be copied to the non-working JNs node02 and node04:

# cd /hadoop/hdfs/journal/{cluster_name}/current/
# tar -czvf node03.tar.gz *

Hoping you have the root password for the cluster, from node03 in the /hadoop/hdfs/journal/{cluster_name}/current/ directory, run the below commands to copy the good edits_* to node02 and node04:

# scp node03.tar.gz root@node02:/hadoop/hdfs/journal/{cluster_name}/current/
# scp node03.tar.gz root@node04:/hadoop/hdfs/journal/{cluster_name}/current/

Having copied the zipped edits_* files, connect as root to node02 and node04 and run the below steps.

On node02:

# cd /hadoop/hdfs/journal/{cluster_name}/current/
# tar -xzvf node03.tar.gz

On node04:

# cd /hadoop/hdfs/journal/{cluster_name}/current/
# tar -xzvf node03.tar.gz

Check that the file permissions are okay.

Stop the journal nodes on node02 and node04. Open 2 windows and run the below on both:

# su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-journalnode/../hadoop/sbin/hadoop-daemon.sh stop journalnode"

Then restart the journal nodes. Open 2 windows and run the below on node02 and node04:

# su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-journalnode/../hadoop/sbin/hadoop-daemon.sh start journalnode"

All these commands should run successfully.

4) Restart HDFS. To validate, after everything starts well you can restart all HDFS components.

Your issue should now be resolved!!! Please revert. HTH
05-18-2019
08:20 AM
@Farhana Khan Perfect, that's an issue I've resolved before, so let me document the process for you. Can you share a screenshot of the path to your edits_000000* files?

/hadoop/hdfs/journal/{cluster_name}/current/

On the 3 journal nodes, count the number of files in each journal node and note the count on the healthy one. On all the journal nodes, in the current/ directory, run:

$ cat last-promised-epoch

After I get the above output I will show you the steps.
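The two checks above (edits file count and promised epoch) can be wrapped in a small helper to run on each journal node. A sketch; the journal path is the one assumed in this thread, and the function takes the directory as an argument:

```shell
#!/bin/sh
# Sketch: summarize a journal node's current/ dir -- how many edits_*
# files it holds and which epoch it last promised. Pass the directory
# (e.g. /hadoop/hdfs/journal/{cluster_name}/current) as the argument.
jn_summary() {
  dir="$1"
  count=$(ls "$dir" | grep -c '^edits_')
  epoch=$(cat "$dir/last-promised-epoch" 2>/dev/null || echo "unknown")
  echo "edits files: $count, last promised epoch: $epoch"
}

# Typical use on each journal node:
#   jn_summary /hadoop/hdfs/journal/{cluster_name}/current
```

A node whose file count lags the others, or whose promised epoch differs, is the likely corrupted one.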