Member since: 02-08-2016
Posts: 793
Kudos Received: 669
Solutions: 85
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3141 | 06-30-2017 05:30 PM |
|  | 4099 | 06-30-2017 02:57 PM |
|  | 3404 | 05-30-2017 07:00 AM |
|  | 3982 | 01-20-2017 10:18 AM |
|  | 8628 | 01-11-2017 02:11 PM |
11-16-2016
11:33 AM
7 Kudos
ISSUE: Hive view is not working. ERROR: H100 Unable to submit statement show databases like '*': org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
ROOT CAUSE: The MySQL connection pool limit (max_connections) was exceeded. Check it using - mysql> SHOW VARIABLES LIKE "max_connections";
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 100 |
+-----------------+-------+
1 row in set (0.00 sec)
RESOLUTION: Increased the MySQL connection limit from 100 to 500 and restarted MySQL, which resolved the issue. mysql> SET GLOBAL max_connections = 500;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE "max_connections";
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 500 |
+-----------------+-------+
1 row in set (0.00 sec)
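Note that SET GLOBAL does not survive a MySQL restart. A minimal sketch of making the change permanent, assuming the server config file is /etc/my.cnf (the path can differ per distribution):
# /etc/my.cnf - raise the connection limit persistently
[mysqld]
max_connections = 500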
11-16-2016
11:33 AM
7 Kudos
SYMPTOM: The standby NameNode crashes due to edit log corruption, complaining that OP_CLOSE cannot be applied because the file is not under construction.
ERROR: 2016-09-30T06:23:25.126-0400 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/appdata/148973_perfengp/TARGET/092016/tempdb.TARGET.092016.hdfs, replication=3, mtime=1475223680193, atime=1472804384143, blockSize=134217728, blocks=[blk_1243879398_198862467], permissions=gsspe:148973_psdbpe:rwxrwxr-x, aclEntries=null, clientName=, clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, txid=1585682886]
java.io.IOException: File is not under construction: /appdata/148973_perfengp/TARGET/092016/tempdb.TARGET.092016.hdfs
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:436)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:679)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1022)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:741)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:536)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:595)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:762)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:746)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
ROOT CAUSE: Edit log corruption can happen if an append fails with a quota violation. This is a known bug:
https://issues.apache.org/jira/browse/HDFS-7587
https://hortonworks.jira.com/browse/BUG-56811
https://hortonworks.jira.com/browse/EAR-1248
RESOLUTION: 1. Stop everything
2. Back up the "current" folder of every journalnode in the cluster
3. Back up the "current" folder of every namenode in the cluster
4. Use the oev command to convert the binary edit log file into XML (see the sketch after this list)
5. Remove the record corresponding to the TXID mentioned in the error
6. Use the oev command to convert the XML edit log file back into binary
7. Restart the active namenode
8. I got an error saying there was a gap in the edit logs
9. Take the keytab for the service principal nn/<host>@<REALM>
10. Execute the command hadoop namenode -recover
11. Answer "c" when the gap problem occurred
12. Then I saw other errors similar to the one I encountered at the beginning (the file-not-under-construction issue)
13. I had to run the command hadoop namenode -recover twice in order to get rid of these errors
14. Zookeeper servers were already started, so I started the journalnodes, the datanodes, the zkfc controllers and finally the active namenode
15. Some datanodes were identified as dead. After some investigation, I figured out that the information in zookeeper was empty, so I restarted the zookeeper servers, after which the active namenode was back.
16. I started the standby namenode, but it raised the same errors concerning the gap in the edit logs.
17. As the hdfs user, I executed on the standby namenode the command hadoop namenode -bootstrapStandby -force
18. The new FSImage was good and identical to the one on the active namenode
19. I started the standby namenode successfully
20. I launched the rest of the cluster
Also check the recovery option described in this link - Namenode-Recovery
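A minimal sketch of the oev conversion in steps 4-6. The segment file name below is illustrative; use the edit log segment that contains the txid from the error, and work on a copy rather than on the backups taken in steps 2-3:
# Convert the binary edit log segment to XML
hdfs oev -i edits_0000000001585682880-0000000001585682890 -o edits.xml
# Remove the <RECORD> whose <TXID> matches the failing transaction from edits.xml,
# then convert the XML back to binary
hdfs oev -p binary -i edits.xml -o edits_0000000001585682880-0000000001585682890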
11-16-2016
09:31 AM
Please try clearing the browser cache and retry. Check whether you see any errors in /var/log/ambari-server/ambari-server.log
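A quick way to scan that log for recent problems (a sketch; adjust the path if Ambari logs elsewhere):
# show the last 50 error/exception lines from the Ambari Server log
grep -iE 'error|exception' /var/log/ambari-server/ambari-server.log | tail -n 50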
11-16-2016
09:29 AM
@yankai wang Which command did you use for the password reset? Was it "ambari-admin-password-reset"?
11-16-2016
04:53 AM
Please check whether the database has exceeded its maximum number of pool connections. Increasing the value from 100 to 500 resolved this issue for me. Log in to MySQL as the superuser and check the pool connections as below - mysql> SHOW VARIABLES LIKE "max_connections"; Modify the pool connections with - mysql> SET GLOBAL max_connections = 500;
11-15-2016
06:44 PM
2 Kudos
@Kate Shaw It normally takes about 30 seconds to refresh policies. You can check the "Plugin" option in the Ranger UI to see whether the policy is getting synced or not. There is no option to force a policy push, but there is an option in each service plugin in Ambari to define the poll interval. An example for the HDFS service: in Ambari UI -> HDFS -> Configs -> "Advanced ranger-hdfs-security" you can change the poll interval (refresh time); see the property sketch below. Here is a link which can help to understand this better - https://community.hortonworks.com/questions/13070/ranger-policy-is-not-applied.html
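A sketch of the underlying setting, assuming the usual Ranger HDFS plugin property name ranger.plugin.hdfs.policy.pollIntervalMs in ranger-hdfs-security (value in milliseconds; 30000 is the common default):
<!-- how often the HDFS plugin polls Ranger Admin for policy changes -->
<property>
  <name>ranger.plugin.hdfs.policy.pollIntervalMs</name>
  <value>30000</value>
</property>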
11-15-2016
05:52 PM
@Anindya Chattopadhyay In the above image, just enter the value /root/devph/labs/Lab3.3 in front of "Location" and press Enter. This will take you to the path.
11-15-2016
05:29 PM
1 Kudo
@Zeeshan Ahmed You can install HDP 2.3.2, which comes with Apache Spark 1.4.1. Here are the release notes - https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_HDP_RelNotes/content/ch_relnotes_v232.html Ambari version - Ambari-2.1.2.1. Here is the Ambari repo - https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Installing_HDP_AMB/content/_download_the_ambari_repo_lnx6.html Download it with: wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.1.2.1/ambari.repo -O /etc/yum.repos.d/ambari.repo
$ cat ambari.repo
#VERSION_NUMBER=2.1.2.1-418
[Updates-ambari-2.1.2.1]
name=ambari-2.1.2.1 - Updates
baseurl=http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.1.2.1
gpgcheck=1
gpgkey=http://public-repo-1.hortonworks.com/ambari/centos6/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1
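A minimal sketch of the next steps once the repo file is in place (standard Ambari Server install flow; adjust for your environment):
# verify the repo is visible, then install, set up, and start Ambari Server
yum repolist
yum install -y ambari-server
ambari-server setup
ambari-server start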
11-15-2016
02:09 PM
6 Kudos
ISSUE: While disabling Kerberos on the cluster, all services went down and nothing was coming back up. The disable-Kerberos step itself failed, as did the subsequent start of services. Manually starting the NameNodes brought them up, but their status was not displayed correctly in the Ambari UI. The journalnodes were not able to start and were failing with the error below.
ERROR: Journalnode error (screenshot not reproduced here): "missing spnego keytab"
ROOT CAUSE: There were multiple issues - 1. The journalnode error "missing spnego keytab" indicates that Kerberos was not properly disabled on the cluster. 2. In hdfs-site.xml the property "hadoop.http.authentication.type" was still set to kerberos. 3. Oozie was not able to detect the active namenode, since the property "hadoop.http.authentication.simple.anonymous.allowed" was set to false.
RESOLUTION: 1. After setting hadoop.http.authentication.type to simple in hdfs-site.xml, HDFS was able to restart. 2. After setting hadoop.http.authentication.simple.anonymous.allowed=true in hdfs-site.xml, Oozie was able to detect the active namenode and the namenode status was correctly displayed in the Ambari UI. The two properties are sketched below.
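A sketch of the two properties as they should look after Kerberos is disabled (shown in hdfs-site.xml as described above; in some configurations they live in core-site.xml):
<!-- HTTP authentication settings after disabling Kerberos -->
<property>
  <name>hadoop.http.authentication.type</name>
  <value>simple</value>
</property>
<property>
  <name>hadoop.http.authentication.simple.anonymous.allowed</name>
  <value>true</value>
</property>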