Member since 01-07-2016
33 Posts
15 Kudos Received
3 Solutions
12-31-2016
11:00 PM
1 Kudo
Issue: In a heavily utilized Kafka cluster, AMS keeps crashing with the error:
ERROR org.apache.hadoop.hbase.client.AsyncProcess: Cannot get replica 0 location for {"totalColumns":5,"row":"kafka.server.FetcherLagMetrics."
Solution:
1. Run the following command to gauge the number of metrics being collected:
curl http://<Ambari-metrics-collector-host>:6188/ws/v1/timeline/metrics/metadata
2. From the Ambari UI, go to Kafka -> Configs and filter for: "external.kafka.metrics.exclude.prefix"
3. Add the following at the end of the value: kafka.log.Log
4. Restart Kafka.
This excludes the additional metrics from being captured and improves the stability of AMS.
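As a rough way to gauge the metric volume from step 1, the entries in the metadata JSON can be counted. This is only a sketch: the "metricname" field is an assumption about the AMS metadata endpoint's output and should be verified against your AMS version; the host in the usage comment is a placeholder.

```shell
# Count "metricname" occurrences in AMS metadata JSON read from stdin.
# The field name is assumed; check a sample of your endpoint's output first.
count_metric_entries() {
  grep -o '"metricname"' | wc -l
}
# Usage (placeholder host):
# curl -s "http://<Ambari-metrics-collector-host>:6188/ws/v1/timeline/metrics/metadata" | count_metric_entries
```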
12-31-2016
09:21 PM
Error:
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Duplicate entry '115' for key 'PRIMARY'
ROOT CAUSE: A corrupted Ambari database; the error above is seen in the Hive View.
SOLUTION: Truncate the following tables in the Ambari database:
DS_FILERESOURCEITEM_1
DS_JOBIMPL_3
DS_SAVEDQUERY_6
DS_STOREDOPERATIONHANDLE_5
DS_TESTBEAN_4
DS_UDF_2
Restart the Ambari server; the views are accessible again and queries run.
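The truncation can be scripted as below. This is a sketch only: the mysql connection details in the usage comment are placeholders, and the Ambari database should be backed up before truncating anything.

```shell
# Build the TRUNCATE statements for a list of tables (read-only helper;
# it only constructs the SQL string, it does not touch the database).
build_truncate_sql() {   # build_truncate_sql TABLE...
  sql=""
  for t in "$@"; do sql="${sql}TRUNCATE TABLE ${t}; "; done
  printf '%s' "$sql"
}
# Usage (placeholder credentials/database name):
# mysql -u ambari -p ambari -e "$(build_truncate_sql DS_FILERESOURCEITEM_1 DS_JOBIMPL_3 DS_SAVEDQUERY_6 DS_STOREDOPERATIONHANDLE_5 DS_TESTBEAN_4 DS_UDF_2)"
```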
12-31-2016
08:21 PM
1 Kudo
Issue: The Solr service check fails from Ambari when NameNode HA is enabled, with the error:
Unable to create core [collection1_shard2_replica1] Caused by: Connection refused
Solution 1: Add the line SOLR_HDFS_CONFIG=/etc/hadoop/conf at the end of the solr-config-env content, then restart Solr; the service checks should now pass.
Solution 2: Another workaround is to edit the following file on your Ambari server: /var/lib/ambari-server/resources/mpacks/solr-ambari-mpack-5.5.2.2.5/common-services/SOLR/5.5.2.2.5/package/scripts/solr.py, applying this change: https://github.com/lucidworks/solr-stack/commit/7b79894b37b862b86d80c64b34230bc9fed6e54a. Then restart the Ambari server. With this change in place, the Solr instances start and work correctly with NameNode HA. This will be fixed in HDP 2.6.
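Solution 1 amounts to a one-line append. The post makes this edit through the Ambari UI (solr-config-env); the direct file edit sketched here is illustrative only, and the helper is written to be idempotent so repeated runs do not duplicate the line.

```shell
# Append SOLR_HDFS_CONFIG to a solr-config-env file only if it is absent.
ensure_solr_hdfs_config() {   # ensure_solr_hdfs_config FILE
  grep -q '^SOLR_HDFS_CONFIG=' "$1" || echo 'SOLR_HDFS_CONFIG=/etc/hadoop/conf' >> "$1"
}
```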
12-31-2016
08:14 PM
1 Kudo
Issue: When a custom PID location is configured for services, along with a non-standard service account other than user "ranger", Ambari shows the Ranger service as stopped before the upgrade is finalized.
Solution: First confirm that the Ranger process is actually running:
ps -ef | grep ranger
The pidf and chown values are hard-coded in the /usr/bin/ranger* scripts (for example /usr/bin/ranger-admin), from which start and stop are invoked when triggered from the Ambari UI. After changing those values to the custom parameters, Ambari reports the service as running.
Previous value:
cd /usr/bin
cat ranger-admin | grep -i pid
pidf=/var/run/ranger/rangeradmin.pid
New value:
vi ranger-admin
pidf=<custom location>
Save and quit.
After changing these values, restart the Ranger service from Ambari.
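The manual vi edit above can be sketched as a sed one-liner. Treat it as a sketch: apply it to a copy of /usr/bin/ranger-admin first, and substitute your real custom PID path (the one in the usage comment is hypothetical).

```shell
# Rewrite the hard-coded pidf line in a Ranger start script.
set_ranger_pidf() {   # set_ranger_pidf SCRIPT NEW_PIDF
  sed -i "s|^pidf=.*|pidf=$2|" "$1"
}
# Usage (hypothetical custom location):
# set_ranger_pidf /usr/bin/ranger-admin /var/run/custom-ranger/rangeradmin.pid
```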
12-31-2016
07:34 PM
2 Kudos
Error: ERROR [2016-12-13 00:48:04,166] ({pool-2-thread-2} Job.java[run]:189) - Job failed
java.lang.NoClassDefFoundError: org/apache/hadoop/security/UserGroupInformation$AuthenticationMethod
at org.apache.zeppelin.jdbc.security.JDBCSecurityImpl.getAuthtype(JDBCSecurityImpl.java:66)
....
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.security.UserGroupInformation$AuthenticationMethod
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 15 more
Solution: On HDP cluster nodes without internet access, the local-repo folder that holds the jar files needed by the Zeppelin interpreters does not get created when the interpreters run. To work around this, copy the required jars to a local location and point the interpreter settings in the Zeppelin UI at them. Create the directory /usr/hdp/current/zeppelin-server/jarss/ and copy all of the required jars into it:
/usr/hdp/current/zeppelin-server/jarss/hive-jdbc-2.0.1-standalone.jar
/usr/hdp/current/zeppelin-server/jarss/hadoop-common-2.7.3.2.5.0.0-1245.jar
/usr/hdp/current/zeppelin-server/jarss/hive-shims-0.23-2.1.0.2.5.0.0-1245.jar
/usr/hdp/current/zeppelin-server/jarss/commons-configuration-1.10.jar
/usr/hdp/current/zeppelin-server/jarss/hadoop-auth-2.7.3.2.5.0.0-1245.jar
/usr/hdp/current/zeppelin-server/jarss/curator-client-2.7.1.jar
/usr/hdp/current/zeppelin-server/jarss/curator-framework-2.7.1.jar
/usr/hdp/current/zeppelin-server/jarss/zookeeper-3.4.6.2.5.0.0-1245.jar
/usr/hdp/current/zeppelin-server/jarss/commons-lang3-3.3.2.jar
Specify the complete path to each jar under Zeppelin UI --> Interpreter --> Jdbc.
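The staging step above can be sketched as a small helper. Source locations for the jars vary by cluster, so gather the files listed above first; the paths in the usage comment are examples only.

```shell
# Create the staging folder and copy the given jars into it.
stage_jars() {   # stage_jars DEST_DIR JAR...
  dest=$1; shift
  mkdir -p "$dest"
  cp "$@" "$dest"/
}
# Usage (example source path):
# stage_jars /usr/hdp/current/zeppelin-server/jarss /tmp/staged-jars/*.jar
```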
12-31-2016
07:17 PM
Change livy.spark.master to yarn-cluster and add the following environment variables to zeppelin-env from Ambari:
export PYSPARK_DRIVER_PYTHON=path_to_python2.7
export PYSPARK_PYTHON=path_to_python2.7
After this and a restart, the Livy Spark interpreter started to work.
12-31-2016
07:05 PM
1 Kudo
Error: %jdbc (Hive)
java.util.ServiceConfigurationError: javax.xml.parsers.DocumentBuilderFactory: Provider org.apache.xerces.jaxp.DocumentBuilderFactoryImpl not found
at java.util.ServiceLoader.fail(ServiceLoader.java:239)
at java.util.ServiceLoader.access$300(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:372)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404) at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at javax.xml.parsers.FactoryFinder$1.run(FactoryFinder.java:294)
at java.security.AccessController.doPrivileged(Native Method)
at javax.xml.parsers.FactoryFinder.findServiceProvider(FactoryFinder.java:289)
at javax.xml.parsers.FactoryFinder.find(FactoryFinder.java:267)
at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:120)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2549)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2526)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2418)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1143)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1115)
SOLUTION: Copy the xercesImpl*.jar and xml-apis* jars into the jdbc interpreter directory. They can be found by running locate xercesImpl* on any node in the cluster. For example, copy:
/usr/hdp/<hadoop-version>/hadoop/client/xercesImpl-2.9.1.jar
to:
/usr/hdp/current/zeppelin-server/interpreter/jdbc/
Then restart the interpreter.
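The find-and-copy step can be sketched as below. The search root and destination in the usage comment come from the post's example paths; adjust both per cluster (the post uses locate, find is used here as a self-contained equivalent).

```shell
# Find xerces jars under a directory tree and copy them into a destination.
copy_xerces_jars() {   # copy_xerces_jars SEARCH_ROOT DEST_DIR
  mkdir -p "$2"
  find "$1" -name 'xercesImpl*.jar' -o -name 'xml-apis*.jar' | while read -r j; do
    cp "$j" "$2"/
  done
}
# Usage (paths from the post's example):
# copy_xerces_jars /usr/hdp /usr/hdp/current/zeppelin-server/interpreter/jdbc
```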
12-31-2016
06:46 PM
1 Kudo
Error: at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:302)
at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:120)
at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:458)
at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:693)
... 43 more
Caused by: KrbException: Identifier doesn't match expected value (906)
at sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
at sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
... 49 more
16/11/29 13:13:12 WARN security.UserGroupInformation: Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
Cause: The cause for this issue was that there were multiple accounts in the Active Directory that had a servicePrincipalName value containing the Zookeeper principal names - "zookeeper/<hostname>". This was found by issuing an ldapsearch like: ldapsearch -h <host> -D <user principal> -W -b "<bind dn - something high in the tree>" '(servicePrincipalName=zookeeper/<zk server hostname>)' dn
This request found 2 accounts containing the requested SPN. One way to detect this issue: after authenticating (kinit-ing) as any valid user, issue a kvno command such as kvno zookeeper/abc.ambari.apache.org. If this fails while a different service principal (such as nn/abc.ambari.apache.org) succeeds, the cause above may be the problem. Solution: Find all duplicated SPN values and remove the non-Ambari-managed entries from the Active Directory. Then restart all of the services. Optionally, regenerate all of the keytab files to make sure everything is in a good state.
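The duplicate check from the ldapsearch above can be reduced to counting the returned DNs; more than one entry for the same SPN indicates the problem. The ldapsearch arguments in the usage comment are the post's placeholders.

```shell
# Count "dn:" lines in ldapsearch output read from stdin; a result greater
# than 1 means the SPN exists on multiple Active Directory accounts.
count_spn_entries() {
  grep -c '^dn:'
}
# Usage (placeholders as in the post):
# ldapsearch -h <host> -D <user principal> -W -b "<bind dn>" \
#   '(servicePrincipalName=zookeeper/<zk server hostname>)' dn | count_spn_entries
```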