Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1001 | 06-04-2025 11:36 PM |
| | 1568 | 03-23-2025 05:23 AM |
| | 784 | 03-17-2025 10:18 AM |
| | 2822 | 03-05-2025 01:34 PM |
| | 1862 | 03-03-2025 01:09 PM |
11-14-2019
10:09 AM
1 Kudo
@mike_bronson7 In an HA cluster, the Standby and Active NameNodes have shared storage managed by the JournalNode service. HA relies on a failover mechanism to swap from Standby to Active NameNode, and like other Hadoop services it uses ZooKeeper. So first of all, the 3 ZooKeepers MUST be online to avoid a split-brain decision. Below are the steps to follow.

On the Active NameNode, run cat against last-promised-epoch, found in the same directory as edits_inprogress_000....

# cat last-promised-epoch
31 [example output]

On the Standby NameNode:

# cat last-promised-epoch
23 [example output]

From the above you can see the standby was lagging when the power went off. In your case, you should overwrite the lagging copy on the standby after backing it up, as you already did, provided the NameNode has not been put back online; if it has, take a fresh backup before you proceed.

SOLUTION: fix the corrupted JournalNode's edits

Instructions to fix that one JournalNode:

1) Put both NameNodes in safe mode (NN HA):

$ hdfs dfsadmin -safemode enter
Safe mode is ON in Namenode1:8020
Safe mode is ON in Namenode2:8020

2) Save the namespace:

$ hdfs dfsadmin -saveNamespace
Save namespace successful for Namenode1:8020
Save namespace successful for Namenode2:8020

3) Zip/tar the journal directory from a working JournalNode and copy it to the failed JournalNode, in the same path as on the active one, making sure the file permissions are correct (see the sketch after this post):

/hadoop/hdfs/journal/<cluster_name>/current

4) Restart HDFS. In your case you can start only one NameNode first; it will automatically be designated the active NameNode. Once it is up and running, the NameNode failover should occur transparently and the alerts below should gradually disappear.

Stop and restart the JournalNodes. This triggers the syncing of the JournalNodes; if you wait a while you should see your NameNodes up and running, all "green".

# su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-journalnode/../hadoop/sbin/hadoop-daemon.sh stop journalnode"
# su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-journalnode/../hadoop/sbin/hadoop-daemon.sh start journalnode"

Start the standby NameNode. After a while, things should be in order. Please let me know
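For step 3, a minimal sketch of copying the journal directory from a healthy JournalNode to the repaired one. The hostnames, the hdfs:hadoop ownership, and <cluster_name> are assumptions; check the actual owner and path on the working node before running anything like this.

```bash
# On the healthy JournalNode: archive the current/ directory and copy it over
# (bad-jn and <cluster_name> are placeholders for your environment)
tar czf /tmp/journal_current.tar.gz -C /hadoop/hdfs/journal/<cluster_name> current
scp /tmp/journal_current.tar.gz bad-jn:/tmp/

# On the failed JournalNode: move the corrupt copy aside, restore, fix ownership
mv /hadoop/hdfs/journal/<cluster_name>/current /hadoop/hdfs/journal/<cluster_name>/current.corrupt
tar xzf /tmp/journal_current.tar.gz -C /hadoop/hdfs/journal/<cluster_name>
chown -R hdfs:hadoop /hadoop/hdfs/journal/<cluster_name>/current
```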
11-13-2019
01:59 PM
@mike_bronson7 Guess what, the copies on the working node should do! Remember, the files in both nodes' directories are identical in an HA setup 🙂 Cheers
11-13-2019
01:23 PM
@mike_bronson7 Yeah, but once you bootstrap, the ZooKeeper election will kick in and one will become the active NameNode. It's late here and I need to document the process; I have uploaded it once on HCC but need to redact some information, so I could do that tomorrow. Meanwhile, on both the dead and the working NameNode, can you back up the following directory, zip all its content, and copy it to some safe location (a sketch follows below): /hadoop/hdfs/journal/<cluster_name>/current Please revert
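A minimal backup sketch, assuming the default HDP journal path and root access; run it on both NameNode hosts and substitute your own cluster name and backup directory.

```bash
# Archive the JournalNode edits directory before touching anything
# (<cluster_name> and /root/jn_backup are placeholders)
mkdir -p /root/jn_backup
tar czf /root/jn_backup/journal_current_$(hostname -s).tar.gz \
  -C /hadoop/hdfs/journal/<cluster_name> current
```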
11-13-2019
12:08 PM
@mike_bronson7 Are all the NameNodes non-functional? If you have one healthy NameNode then there is a way out. Please revert.
11-12-2019
01:44 AM
@saivenkatg55 The example I gave purges history records created before the first of April 2016 for the cluster named [PROD]. You could safely delete data older than last month if your cluster is named PROD:

# ambari-server db-purge-history --cluster-name [PROD] --from-date 2019-10-31
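A sketch of the full sequence on the Ambari host; as far as I recall the Ambari Server should be stopped while the purge runs, but verify against the documentation for your Ambari version.

```bash
# Stop Ambari Server, purge operational history older than the cut-off date, restart
ambari-server stop
ambari-server db-purge-history --cluster-name PROD --from-date 2019-10-31
ambari-server start
```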
11-11-2019
12:48 PM
@mikelok Before you embark on that, a couple of questions: have you moved or recreated the ZooKeeper, YARN, and DataNode services on another node? You should have at least 3 ZooKeepers, and how about your YARN server? Usually, when a host crashes and stops sending heartbeats, after a period of time it is excluded from the healthy nodes.

Procedure

1. Decommission the DataNodes: from the node hosting the NameNode, edit the $HADOOP_CONF_DIR/dfs.exclude file by adding the list of DataNode hostnames, separated by a newline character.

2. Update the NameNode with the new set of excluded DataNodes. Run the following from the NameNode machine ($HDFS_USER is the user that owns the HDFS services, usually hdfs):

# su $HDFS_USER
$ hdfs dfsadmin -refreshNodes

3. Open the NameNode web interface and go to the DataNodes page: http://<abc.my_namenode.com>:50070. Verify that the state has changed to "Decommission In Progress" for the DataNodes being decommissioned.

4. Shut down the decommissioned nodes once all of the DataNodes are decommissioned; all of their blocks will already have been replicated.

5. If you use a dfs.include file on your cluster, remove the decommissioned nodes from that file on the NameNode host machine, then refresh the nodes again:

# su $HDFS_USER
$ hdfs dfsadmin -refreshNodes

If no dfs.include is used, all DataNodes are considered included in the cluster, unless a node is listed in $HADOOP_CONF_DIR/dfs.exclude.

You can also use the Ambari REST API to achieve this; here is a reference: https://cwiki.apache.org/confluence/display/AMBARI/Using+APIs+to+delete+a+service+or+all+host+components+on+a+host

Hope that helps
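If you go the REST route from the wiki page above, here is a minimal sketch of removing the dead host from Ambari. The Ambari URL, admin credentials, cluster name, and dead hostname are all placeholders, and the host's components should be stopped/decommissioned first.

```bash
# Placeholders for your environment
AMBARI=http://ambari-host.example.com:8080
CLUSTER=MYCLUSTER
HOST=dead-node.example.com

# Delete all host components on the dead host, then remove the host itself
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE \
  "$AMBARI/api/v1/clusters/$CLUSTER/hosts/$HOST/host_components"
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE \
  "$AMBARI/api/v1/clusters/$CLUSTER/hosts/$HOST"
```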
11-11-2019
12:22 PM
@Rak You have a couple of errors in your sqoop syntax, but you are almost there. Please have a look at the hints below and retry after understanding and correcting them.

1. "sqoop import--connect" is wrong; you need a space between import and --, i.e. sqoop import --connect
2. 'jdbc:sqlserver'--username is also not correct; you need a host, port number, and database name, i.e. "jdbc:sqlserver://<Server_Host>:<Server_Port>;databaseName=<DB_Name>"
3. The quoting around '2019-11-08'" is wrong too.
4. All of your -- options need a space before them. This is what you ran:

sqoop import--connect 'jdbc:sqlserver'--username 'sa' -P--query "select * from dlyprice where $CONDITIONS AND `date`= '2019-11-08'"--split-by `date--target-dir /home/hduser2 -m 2

Try something like this (it isn't tested, but I wanted to highlight some of your mistakes; making it work yourself makes you a better hadooper!):

sqoop import --connect "jdbc:sqlserver://<Server_Host>:<Server_Port>;databaseName=<DB_Name>" \
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
--username XXXX -P \
--query "select * from dlyprice where \$CONDITIONS AND date = '2019-11-08'" \
--split-by date --target-dir /home/hduser2 -m 2

Here is a link to the Sqoop User Guide.

You could also try this syntax, remembering to replace the values with those of your environment (see the sketch below):

sqoop import --connect "jdbc:sqlserver://hostname;user=sa;password=sa_password;databaseName=yourDB" --driver com.microsoft.sqlserver.jdbc.SQLServerDriver --query "select * from dlyprice where \$CONDITIONS" --split-by date -m 2 --target-dir /home/hduser2

Please revert
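For readability, the same untested sketch with one option per line plus a quick check that the part files landed in HDFS; host, port, database, credentials, and target directory are placeholders for your environment.

```bash
# Hedged sketch of the corrected import; replace the <...> placeholders
sqoop import \
  --connect "jdbc:sqlserver://<Server_Host>:<Server_Port>;databaseName=<DB_Name>" \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
  --username sa -P \
  --query "select * from dlyprice where \$CONDITIONS and date = '2019-11-08'" \
  --split-by date \
  --target-dir /home/hduser2 \
  -m 2

# Confirm the part files were written to the target directory
hdfs dfs -ls /home/hduser2
```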
11-11-2019
11:46 AM
@svasi Sorry that nothing is working out for you; my guess is it's the GCP platform. Have you seen this link on using bdutil? Installing HDP on GCP. I am wondering what documentation you are following, can you share the link? I have some free credit, so I could try that out this weekend.
11-10-2019
11:47 PM
@wimster @Zeba Surely that's great, but during the webinar neither Lakshmi Randall, Wim Stoop nor Matthew could commit to Cloudera's release date, hence the "before the end of the year" wording 🙂 something understandable in case the release date is missed, as that wouldn't be a good look. As an insider I am sure you have more information; can you share the link to that release info? I would like to test drive it this week.
11-08-2019
11:02 AM
@Zeba I was privileged to attend the "Accelerate your time-to-insight with CDP Data Center" Cloudera webinar last month, presented by Lakshmi Randall, Wim Stoop, and Matthew Schumpert, where unfortunately they confirmed that CDP is only available on AWS for now; the Azure release should be coming before the end of the year, maybe as a Christmas gift 🙂 So you won't find any download repo as yet, but if you are a Cloudera customer then contact your Cloudera sales rep.