11-16-2016
11:33 AM
7 Kudos
ISSUE: Hive View is not working. ERROR: H100 Unable to submit statement show databases like '*': org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
ROOT CAUSE: The MySQL connection pool size limit (max_connections) was exceeded. Check the current limit using -
mysql> SHOW VARIABLES LIKE "max_connections";
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 100 |
+-----------------+-------+
1 row in set (0.00 sec)
RESOLUTION: Increased the MySQL max_connections limit from 100 to 500 and restarted MySQL, which resolved the issue.
mysql> SET GLOBAL max_connections = 500;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE "max_connections";
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 500 |
+-----------------+-------+
1 row in set (0.00 sec)
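Note that SET GLOBAL only changes the value for the running server and does not survive a MySQL restart. To make the new limit permanent it also needs to go into the MySQL configuration file - a minimal sketch, assuming the common /etc/my.cnf location (the path varies by OS and packaging):
# /etc/my.cnf (location is an assumption)
[mysqld]
# raise the connection limit so Hive/Ambari view connections do not exhaust the pool
max_connections = 500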
11-16-2016
11:33 AM
7 Kudos
SYMPTOM: The standby NameNode is crashing due to edit log corruption, complaining that OP_CLOSE cannot be applied because the file is not under construction.
ERROR: 2016-09-30T06:23:25.126-0400 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/appdata/148973_perfengp/TARGET/092016/tempdb.TARGET.092016.hdfs, replication=3, mtime=1475223680193, atime=1472804384143, blockSize=134217728, blocks=[blk_1243879398_198862467], permissions=gsspe:148973_psdbpe:rwxrwxr-x, aclEntries=null, clientName=, clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, txid=1585682886]
java.io.IOException: File is not under construction: /appdata/148973_perfengp/TARGET/092016/tempdb.TARGET.092016.hdfs
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:436)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:679)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1022)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:741)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:536)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:595)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:762)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:746)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
ROOT CAUSE: Edit log corruption can happen if an append fails with a quota violation. This is a known bug:
https://issues.apache.org/jira/browse/HDFS-7587
https://hortonworks.jira.com/browse/BUG-56811
https://hortonworks.jira.com/browse/EAR-1248
RESOLUTION: 1. Stop everything.
2. Back up the "current" folder of every journalnode in the cluster.
3. Back up the "current" folder of every namenode in the cluster.
4. Use the oev command to convert the binary edit log file into XML (see the example oev commands after this list).
5. Remove the record corresponding to the TXID mentioned in the error.
6. Use the oev command to convert the XML edit log file back into binary.
7. Restart the active namenode.
8. I got an error saying there was a gap in the edit logs.
9. Take the keytab for the service nn/<host>@<REALM>.
10. Execute the command hadoop namenode -recover.
11. Answer "c" when the gap problem occurred.
12. I then saw other errors similar to the one I encountered at the beginning (the "file is not under construction" issue).
13. I had to run the hadoop namenode -recover command twice in order to get rid of these errors.
14. The ZooKeeper servers were already started, so I started the journalnodes, the datanodes, the ZKFC controllers and finally the active namenode.
15. Some datanodes were identified as dead. After some investigation, I figured out that the information in ZooKeeper was empty, so I restarted the ZooKeeper servers, after which the active namenode was fine.
16. I started the standby namenode, but it raised the same errors concerning the gap in the edit logs.
17. As the hdfs user, I executed the command hadoop namenode -bootstrapStandby -force on the standby namenode.
18. The new FSImage was good and identical to the one on the active namenode.
19. I started the standby namenode successfully.
20. I launched the rest of the cluster.
Also check the recovery option given in the link - Namenode-Recovery
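For steps 4 and 6, the offline edits viewer does the binary-to-XML and XML-to-binary conversion. A minimal sketch, with placeholder file names (the edits segment name below is made up; use the segment that contains the failing TXID):
# convert the binary edit log segment to XML so the bad record can be removed
hdfs oev -i edits_0000000001585682880-0000000001585682890 -o edits.xml
# edit edits.xml and delete the <RECORD> whose <TXID> matches the one in the error, then convert back
hdfs oev -p binary -i edits.xml -o edits_0000000001585682880-0000000001585682890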
11-15-2016
02:09 PM
6 Kudos
ISSUE: While disabling Kerberos on the cluster, all services went down and nothing was coming up. The disable-Kerberos step itself failed, and the subsequent start of services failed. Manually starting the NameNodes brought them up, but their status was not displayed correctly in the Ambari UI. The journalnodes were not able to start and were failing with the error shown below.
ERROR: Journalnode error (screenshot attached in the original post).
ROOT CAUSE: There were multiple issues -
1. The journalnode error says "missing spnego keytab", which suggests Kerberos was not properly disabled on the cluster.
2. In hdfs-site.xml the property "hadoop.http.authentication.type" was still set to kerberos.
3. Oozie was not able to detect the active namenode, since the property "hadoop.http.authentication.simple.anonymous.allowed" was set to false.
RESOLUTION:
1. After setting hadoop.http.authentication.type to simple in hdfs-site.xml, HDFS was able to restart.
2. After setting hadoop.http.authentication.simple.anonymous.allowed=true in hdfs-site.xml, Oozie was able to detect the active namenode and the namenode status was correctly displayed in the NameNode UI.
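For reference, the two properties from the resolution look roughly like this in hdfs-site.xml (a sketch of the settings named above, not the customer's actual file):
<property>
  <name>hadoop.http.authentication.type</name>
  <value>simple</value>
</property>
<property>
  <name>hadoop.http.authentication.simple.anonymous.allowed</name>
  <value>true</value>
</property>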
11-15-2016
02:09 PM
7 Kudos
Attachment: updateddeleteuser.zip
ISSUE: Ranger LDAP integration was working fine. The customer deleted a user from the Ranger UI and was facing an issue while re-importing the user into Ranger.
ROOT CAUSE: The customer removed the user from the Ranger UI and expected it to be automatically re-imported by the Ranger usersync process. Sample screenshots are in the original post - the user named 'testuser' is deleted from the Ranger UI, yet the user is still present in the database.
RESOLUTION: There are multiple tables which have entries for the user. You need to run the delete script to remove the user entries from the database and restart the Ranger usersync process to re-import the user. Please find the attached delete script. Syntax to run the script - $ deleteUser.sh -f input.txt -u ranger_user -p password -db ranger [-r <replaceUser>]
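To confirm that the UI delete alone does not clean up the database, you can check whether the user is still present in the Ranger admin DB. A minimal sketch, assuming a MySQL-backed Ranger database and the usual x_portal_user/x_user tables (verify the table names against your Ranger version):
mysql> use ranger;
mysql> select id, login_id, status from x_portal_user where login_id = 'testuser';
mysql> select id, user_name, status from x_user where user_name = 'testuser';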
11-15-2016
05:29 AM
6 Kudos
ISSUE: After enabling SSL for Ambari, the Hive View stopped working. ERROR: 08 Nov 2016 11:32:23,330 WARN [qtp-ambari-client-263] nio:720 - javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
08 Nov 2016 11:32:23,331 ERROR [qtp-ambari-client-256] ServiceFormattedException:100 - org.apache.ambari.view.utils.ambari.AmbariApiException: RA040 I/O error while requesting Ambari
org.apache.ambari.view.utils.ambari.AmbariApiException: RA040 I/O error while requesting Ambari
at org.apache.ambari.view.utils.ambari.AmbariApi.requestClusterAPI(AmbariApi.java:176)
at org.apache.ambari.view.utils.ambari.AmbariApi.requestClusterAPI(AmbariApi.java:142)
at org.apache.ambari.view.utils.ambari.AmbariApi.getHostsWithComponent(AmbariApi.java:99)
at org.apache.ambari.view.hive.client.ConnectionFactory.getHiveHost(ConnectionFactory.java:79)
at org.apache.ambari.view.hive.client.ConnectionFactory.create(ConnectionFactory.java:68)
at org.apache.ambari.view.hive.client.UserLocalConnection.initialValue(UserLocalConnection.java:42)
at org.apache.ambari.view.hive.client.UserLocalConnection.initialValue(UserLocalConnection.java:26)
at org.apache.ambari.view.utils.UserLocal.get(UserLocal.java:66)
at org.apache.ambari.view.hive.resources.browser.HiveBrowserService.databases(HiveBrowserService.java:87)
at sun.reflect.GeneratedMethodAccessor186.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1509)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
ROOT CAUSE: The truststore configuration for the Ambari Server was missing.
RESOLUTION: Set up the truststore for the Ambari Server as per the link below, after which the above issue was resolved. https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Ambari_Security_Guide/content/_set_up_truststore_for_ambari_server.html
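At a high level, the truststore setup amounts to importing the certificate into a JKS truststore and then pointing Ambari at it through the interactive security setup. A sketch only - the file paths and alias below are placeholders; follow the linked doc for the exact prompts:
# import the certificate into a truststore (paths and alias are placeholders)
keytool -import -file /tmp/ambari.crt -alias ambari-server -keystore /etc/ambari-server/keys/truststore.jks
# run the interactive security setup and choose the "Setup truststore" option
ambari-server setup-security
ambari-server restart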
11-14-2016
05:30 PM
6 Kudos
SYMPTOM: During an HDP upgrade from 2.3 to 2.5, the YARN service check is failing due to NoSuchMethodError org.apache.hadoop.yarn.api.records.Resource.getMemorySize()J. ERROR: Below was the error in the application logs - 16/11/14 10:30:12 FATAL distributedshell.ApplicationMaster: Error running ApplicationMaster
java.lang.NoSuchMethodError: org.apache.hadoop.yarn.api.records.Resource.getMemorySize()J
at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:585)
at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:298)
ROOT CAUSE: There was an issue with the classpath - the NodeManager on which the job was running was pointing to the older version [i.e. 2.3] classpath. RESOLUTION: There are two solutions - 1. Skip this step in the Ambari upgrade UI and proceed; Ambari will take care of setting up the classpath. 2. Modify the classpath manually, confirm it using the "hadoop classpath" command, and re-run the service check.
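To confirm which HDP version the classpath on the affected NodeManager host actually resolves to, something like the following can help (a sketch; hdp-select is the HDP version selector utility and the grep pattern assumes the standard /usr/hdp layout):
# list the HDP install directories that appear on the Hadoop classpath
hadoop classpath | tr ':' '\n' | grep -o '/usr/hdp/[^/]*' | sort -u
# show which HDP versions are installed and which one is currently selected
hdp-select versions
hdp-select status hadoop-client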
04-20-2018
10:36 PM
Dear @Sagar Shimpi, the problem I encountered was: The following 6 host component(s) have not been upgraded to version 1.1.5.0-235. Please install and upgrade the Stack Version on those hosts and try again.
Host components:
GLOBALMASTER on host e19e07452.et15sqa
LDSERVER on host e19e07452.et15sqa
LOCALMASTER on host e19e07452.et15sqa
LDSERVER on host e19e07466.et15sqa
LDSERVER on host e19e10465.et15sqa
LOCALMASTER on host e19e10465.et15sqa
The "GLOBALMASTER" is my own service component. Can you please help? Many thanks in advance.
09-27-2017
02:16 AM
An easy way to detect the duplicate value is:
select component_name, service_name, host_id, cluster_id, count(*) as cnt from ambari.hostcomponentdesiredstate group by component_name, service_name, host_id, cluster_id order by cnt desc;
select component_name, service_name, host_id, cluster_id, count(*) as cnt from ambari.hostcomponentstate group by component_name, service_name, host_id, cluster_id order by cnt desc;
You will find that the count in one of the tables differs from the other. Just delete that row by id and you are good to go.
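Once the duplicate is identified, the extra row can be deleted by its primary key. A hypothetical sketch only - the id value is made up, the id column assumes a recent Ambari schema, and the Ambari database should be backed up (and ambari-server stopped) before any manual delete:
-- inspect the duplicate rows first, substituting the values reported by the queries above
select * from ambari.hostcomponentstate where component_name = 'NAMENODE' and host_id = 1;
-- then delete the surplus row by its id (1234 is a placeholder)
delete from ambari.hostcomponentstate where id = 1234;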
11-08-2016
07:08 PM
2 Kudos
1. Let's assume you have an HDP cluster installed and managed by Ambari. 2. When you want to delete a service [either a custom service or an HDP service] using the API, you generally use the command below - curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://<ambari-server>:8080/api/v1/clusters/c1/services/<SERVICENAME> 3. After executing the above command you might see the error below - $curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://<ambari-server>:8080/api/v1/clusters/c1/services/HBASE
{ "status" : 500, "message" : "org.apache.ambari.server.controller.spi.SystemException: An internal system exception occurred: Cannot remove HBASE. Desired state STARTED is not removable. Service must be stopped or disabled." } 4. If you see the above error while removing/stopping the service, use the steps below to resolve the issue. 5. Log in to the Ambari database [in my case it is PostgreSQL] and check the values for the service in the tables below - # psql -U ambari
[Default password is 'bigdata']
ambari=> select * from servicedesiredstate where service_name='HBASE';
ambari=> select * from servicecomponentdesiredstate where service_name='HBASE';
6. Make sure that in the above output the value of the column 'desired_state' is INSTALLED. 7. If the value of "desired_state" is set to STARTED, then update the column and set it to INSTALLED using the command below - ambari=> update servicedesiredstate set desired_state='INSTALLED' where service_name='HBASE';
8. Follow the same steps for the "servicecomponentdesiredstate" table - ambari=> update servicecomponentdesiredstate set desired_state='INSTALLED' where service_name='HBASE'; 9. Now try removing/deleting the service again. It should work. $curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://<ambari-server>:8080/api/v1/clusters/c1/services/HBASE
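As an aside, instead of editing the database directly, it is often enough to stop the service through the Ambari REST API first and then issue the DELETE. A sketch of that alternative (not part of the walkthrough above; same placeholders for host, cluster name and credentials):
# put the service into the INSTALLED (stopped) state via the API
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop HBASE"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://<ambari-server>:8080/api/v1/clusters/c1/services/HBASE
# once the stop request finishes, the DELETE call should succeed
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://<ambari-server>:8080/api/v1/clusters/c1/services/HBASE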
09-06-2016
05:17 PM
Also make sure that curl is installed on the target machine, and that the target machine can see the Oozie server and is able to run the curl command against it. In my case I got the timeout issue; installing curl did not help on its own, so I added the FQDN to /etc/hosts and then it worked perfectly well.
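A quick way to test this from the target machine is to call the Oozie admin status endpoint directly (a sketch; the host name, port 11000 and the /etc/hosts entry are examples, substitute your own values):
# verify the Oozie server is reachable from this machine
curl -s http://oozieserver.example.com:11000/oozie/v1/admin/status
# expected output is something like {"systemMode":"NORMAL"}
# if name resolution fails, add the Oozie server's FQDN to /etc/hosts, for example:
# 10.0.0.15   oozieserver.example.com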