Member since: 02-08-2016
Posts: 793
Kudos Received: 669
Solutions: 85
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3141 | 06-30-2017 05:30 PM |
|  | 4099 | 06-30-2017 02:57 PM |
|  | 3404 | 05-30-2017 07:00 AM |
|  | 3982 | 01-20-2017 10:18 AM |
|  | 8628 | 01-11-2017 02:11 PM |
11-16-2016
11:33 AM
7 Kudos
ISSUE: Hive view is not working. ERROR: H100 Unable to submit statement show databases like '*': org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
ROOT CAUSE: The MySQL connection pool limit (max_connections) was exceeded. Check it using - mysql> SHOW VARIABLES LIKE "max_connections";
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 100 |
+-----------------+-------+
1 row in set (0.00 sec)
RESOLUTION: Increased the MySQL connection limit from 100 to 500 and restarted MySQL, which resolved the issue. mysql> SET GLOBAL max_connections = 500;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE "max_connections";
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 500 |
+-----------------+-------+
1 row in set (0.00 sec)
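Note that SET GLOBAL does not survive a MySQL restart. A minimal sketch of making the change permanent, assuming the server config file is /etc/my.cnf (the path can differ per distribution):
# /etc/my.cnf - raise the connection limit persistently
[mysqld]
max_connections = 500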
11-16-2016
11:33 AM
7 Kudos
SYMPTOM: The standby NameNode crashes due to edit log corruption, complaining that OP_CLOSE cannot be applied because the file is not under construction.
ERROR: 2016-09-30T06:23:25.126-0400 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/appdata/148973_perfengp/TARGET/092016/tempdb.TARGET.092016.hdfs, replication=3, mtime=1475223680193, atime=1472804384143, blockSize=134217728, blocks=[blk_1243879398_198862467], permissions=gsspe:148973_psdbpe:rwxrwxr-x, aclEntries=null, clientName=, clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, txid=1585682886]
java.io.IOException: File is not under construction: /appdata/148973_perfengp/TARGET/092016/tempdb.TARGET.092016.hdfs
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:436)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:679)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1022)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:741)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:536)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:595)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:762)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:746)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
ROOT CAUSE: Edit log corruption can happen if an append fails with a quota violation. This is a known bug:
https://issues.apache.org/jira/browse/HDFS-7587
https://hortonworks.jira.com/browse/BUG-56811
https://hortonworks.jira.com/browse/EAR-1248
RESOLUTION: 1. Stop everything
2. Back up the "current" folder of every journalnode in the cluster
3. Back up the "current" folder of every namenode in the cluster
4. Use the oev command to convert the binary edit log file into XML (see the sketch after this list)
5. Remove the record corresponding to the TXID mentioned in the error
6. Use the oev command to convert the XML edit log file back into binary
7. Restart the active namenode
8. I got an error saying there was a gap in the edit logs
9. Take the keytab for the service principal nn/<host>@<REALM>
10. Execute the command hadoop namenode -recover
11. Answer "c" when the gap problem occurred
12. Then I saw other errors similar to the one I encountered at the beginning (the file-not-under-construction issue)
13. I had to run the command hadoop namenode -recover twice in order to get rid of these errors
14. Zookeeper servers were already started, so I started the journalnodes, the datanodes, the zkfc controllers and finally the active namenode
15. Some datanodes were identified as dead. After some investigation, I figured out that the information in zookeeper was empty, so I restarted the zookeeper servers, after which the active namenode was back.
16. I started the standby namenode, but it raised the same errors concerning the gap in the edit logs.
17. As the hdfs user, I executed on the standby namenode the command hadoop namenode -bootstrapStandby -force
18. The new FSImage was good and identical to the one on the active namenode
19. I started the standby namenode successfully
20. I launched the rest of the cluster
Also check the recovery option described in this link - Namenode-Recovery
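A minimal sketch of the oev conversion in steps 4-6. The segment file name below is illustrative; use the edit log segment that contains the txid from the error, and work on a copy rather than on the backups taken in steps 2-3:
# Convert the binary edit log segment to XML
hdfs oev -i edits_0000000001585682880-0000000001585682890 -o edits.xml
# Remove the <RECORD> whose <TXID> matches the failing transaction from edits.xml,
# then convert the XML back to binary
hdfs oev -p binary -i edits.xml -o edits_0000000001585682880-0000000001585682890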
11-16-2016
09:31 AM
Please try clearing the browser cache and retry. Check whether you see any errors in /var/log/ambari-server/ambari-server.log
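A quick way to scan that log for recent problems (a sketch; adjust the path if Ambari logs elsewhere):
# show the last 50 error/exception lines from the Ambari Server log
grep -iE 'error|exception' /var/log/ambari-server/ambari-server.log | tail -n 50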
11-16-2016
09:29 AM
@yankai wang Which command did you use for the password reset? Was it "ambari-admin-password-reset"?
11-16-2016
04:53 AM
Please check whether the database has exceeded its maximum number of pool connections. Increasing the value from 100 to 500 resolved this issue for me. Log in to MySQL as the superuser and check the pool connections as below - mysql> SHOW VARIABLES LIKE "max_connections"; Modify the pool connections with - mysql> SET GLOBAL max_connections = 500;
11-15-2016
06:44 PM
2 Kudos
@Kate Shaw It normally takes about 30 seconds to refresh policies. You can check the "Plugin" option in the Ranger UI to see whether the policy is getting synced or not. There is no option to force a policy push, but there is an option in each service plugin in Ambari to define the poll interval. An example for the HDFS service: in Ambari UI -> HDFS -> Configs -> "Advanced ranger-hdfs-security" you can change the poll interval (refresh time); see the property sketch below. Here is a link which can help to understand this better - https://community.hortonworks.com/questions/13070/ranger-policy-is-not-applied.html
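A sketch of the underlying setting, assuming the usual Ranger HDFS plugin property name ranger.plugin.hdfs.policy.pollIntervalMs in ranger-hdfs-security (value in milliseconds; 30000 is the common default):
<!-- how often the HDFS plugin polls Ranger Admin for policy changes -->
<property>
  <name>ranger.plugin.hdfs.policy.pollIntervalMs</name>
  <value>30000</value>
</property>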
11-15-2016
05:52 PM
@Anindya Chattopadhyay In the above image, just enter the value /root/devph/labs/Lab3.3 in front of "Location" and press Enter. This will take you to the path.
11-15-2016
05:29 PM
1 Kudo
@Zeeshan Ahmed You can install HDP 2.3.2, which comes with Apache Spark 1.4.1. Here are the release notes - https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_HDP_RelNotes/content/ch_relnotes_v232.html Ambari version - Ambari-2.1.2.1. Here is the Ambari repo - https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Installing_HDP_AMB/content/_download_the_ambari_repo_lnx6.html Download it with: wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.1.2.1/ambari.repo -O /etc/yum.repos.d/ambari.repo
$ cat ambari.repo
#VERSION_NUMBER=2.1.2.1-418
[Updates-ambari-2.1.2.1]
name=ambari-2.1.2.1 - Updates
baseurl=http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.1.2.1
gpgcheck=1
gpgkey=http://public-repo-1.hortonworks.com/ambari/centos6/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins
enabled=1
priority=1
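A minimal sketch of the next steps once the repo file is in place (standard Ambari Server install flow; adjust for your environment):
# verify the repo is visible, then install, set up, and start Ambari Server
yum repolist
yum install -y ambari-server
ambari-server setup
ambari-server start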
11-15-2016
02:09 PM
6 Kudos
ISSUE: While disabling Kerberos on the cluster, all services went down and nothing was coming back up. The disable-Kerberos step itself failed, as did the subsequent start of services. Manually starting the NameNodes brought them up, but their status was not displayed correctly in the Ambari UI. The journalnodes were not able to start and were failing with the error below.
ERROR: Journalnode error (screenshot not reproduced here): "missing spnego keytab"
ROOT CAUSE: There were multiple issues - 1. The journalnode error "missing spnego keytab" indicates that Kerberos was not properly disabled on the cluster. 2. In hdfs-site.xml the property "hadoop.http.authentication.type" was still set to kerberos. 3. Oozie was not able to detect the active namenode, since the property "hadoop.http.authentication.simple.anonymous.allowed" was set to false.
RESOLUTION: 1. After setting hadoop.http.authentication.type to simple in hdfs-site.xml, HDFS was able to restart. 2. After setting hadoop.http.authentication.simple.anonymous.allowed=true in hdfs-site.xml, Oozie was able to detect the active namenode and the namenode status was correctly displayed in the Ambari UI. The two properties are sketched below.
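A sketch of the two properties as they should look after Kerberos is disabled (shown in hdfs-site.xml as described above; in some configurations they live in core-site.xml):
<!-- HTTP authentication settings after disabling Kerberos -->
<property>
  <name>hadoop.http.authentication.type</name>
  <value>simple</value>
</property>
<property>
  <name>hadoop.http.authentication.simple.anonymous.allowed</name>
  <value>true</value>
</property>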