11-16-2016
11:33 AM
7 Kudos
ISSUE: Hive View is not working. ERROR: H100 Unable to submit statement show databases like '*': org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
ROOT CAUSE: The MySQL connection pool size limit (max_connections) was exceeded. Check the current limit using -
mysql> SHOW VARIABLES LIKE "max_connections";
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 100 |
+-----------------+-------+
1 row in set (0.00 sec)
RESOLUTION: Increased the MySQL max_connections limit from 100 to 500 and restarted MySQL, which resolved the issue.
mysql> SET GLOBAL max_connections = 500;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE "max_connections";
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| max_connections | 500 |
+-----------------+-------+
1 row in set (0.00 sec)
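Note that SET GLOBAL only changes the value for the running server and does not survive a MySQL restart. To make the new limit permanent it also needs to go into the MySQL configuration file - a minimal sketch, assuming the common /etc/my.cnf location (the path varies by OS and packaging):
# /etc/my.cnf (location is an assumption)
[mysqld]
# raise the connection limit so Hive/Ambari view connections do not exhaust the pool
max_connections = 500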
11-16-2016
11:33 AM
7 Kudos
SYMPTOM: The standby NameNode is crashing due to edit log corruption, complaining that OP_CLOSE cannot be applied because the file is not under construction.
ERROR: 2016-09-30T06:23:25.126-0400 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/appdata/148973_perfengp/TARGET/092016/tempdb.TARGET.092016.hdfs, replication=3, mtime=1475223680193, atime=1472804384143, blockSize=134217728, blocks=[blk_1243879398_198862467], permissions=gsspe:148973_psdbpe:rwxrwxr-x, aclEntries=null, clientName=, clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, txid=1585682886]
java.io.IOException: File is not under construction: /appdata/148973_perfengp/TARGET/092016/tempdb.TARGET.092016.hdfs
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:436)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:679)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:281)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1022)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:741)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:536)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:595)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:762)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:746)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1438)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1504)
ROOT CAUSE: Edit log corruption can happen if an append fails with a quota violation. This is a known bug:
https://issues.apache.org/jira/browse/HDFS-7587
https://hortonworks.jira.com/browse/BUG-56811
https://hortonworks.jira.com/browse/EAR-1248
RESOLUTION: 1. Stop everything.
2. Back up the "current" folder of every journalnode in the cluster.
3. Back up the "current" folder of every namenode in the cluster.
4. Use the oev command to convert the binary edit log file into XML (see the example oev commands after this list).
5. Remove the record corresponding to the TXID mentioned in the error.
6. Use the oev command to convert the XML edit log file back into binary.
7. Restart the active namenode.
8. I got an error saying there was a gap in the edit logs.
9. Take the keytab for the service nn/<host>@<REALM>.
10. Execute the command hadoop namenode -recover.
11. Answer "c" when the gap problem occurred.
12. I then saw other errors similar to the one I encountered at the beginning (the "file is not under construction" issue).
13. I had to run the hadoop namenode -recover command twice in order to get rid of these errors.
14. The ZooKeeper servers were already started, so I started the journalnodes, the datanodes, the ZKFC controllers and finally the active namenode.
15. Some datanodes were identified as dead. After some investigation, I figured out that the information in ZooKeeper was empty, so I restarted the ZooKeeper servers, after which the active namenode was fine.
16. I started the standby namenode, but it raised the same errors concerning the gap in the edit logs.
17. As the hdfs user, I executed the command hadoop namenode -bootstrapStandby -force on the standby namenode.
18. The new FSImage was good and identical to the one on the active namenode.
19. I started the standby namenode successfully.
20. I launched the rest of the cluster.
Also check the recovery option given in the link - Namenode-Recovery
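For steps 4 and 6, the offline edits viewer does the binary-to-XML and XML-to-binary conversion. A minimal sketch, with placeholder file names (the edits segment name below is made up; use the segment that contains the failing TXID):
# convert the binary edit log segment to XML so the bad record can be removed
hdfs oev -i edits_0000000001585682880-0000000001585682890 -o edits.xml
# edit edits.xml and delete the <RECORD> whose <TXID> matches the one in the error, then convert back
hdfs oev -p binary -i edits.xml -o edits_0000000001585682880-0000000001585682890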
11-15-2016
02:09 PM
6 Kudos
ISSUE: While disabling Kerberos on the cluster, all services went down and nothing was coming up. The disable-Kerberos step itself failed, and the subsequent start of services failed. Manually starting the NameNodes brought them up, but their status was not displayed correctly in the Ambari UI. The journalnodes were not able to start and were failing with the error shown below.
ERROR: Journalnode error (screenshot attached in the original post).
ROOT CAUSE: There were multiple issues -
1. The journalnode error says "missing spnego keytab", which suggests Kerberos was not properly disabled on the cluster.
2. In hdfs-site.xml the property "hadoop.http.authentication.type" was still set to kerberos.
3. Oozie was not able to detect the active namenode, since the property "hadoop.http.authentication.simple.anonymous.allowed" was set to false.
RESOLUTION:
1. After setting hadoop.http.authentication.type to simple in hdfs-site.xml, HDFS was able to restart.
2. After setting hadoop.http.authentication.simple.anonymous.allowed=true in hdfs-site.xml, Oozie was able to detect the active namenode and the namenode status was correctly displayed in the NameNode UI.
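For reference, the two properties from the resolution look roughly like this in hdfs-site.xml (a sketch of the settings named above, not the customer's actual file):
<property>
  <name>hadoop.http.authentication.type</name>
  <value>simple</value>
</property>
<property>
  <name>hadoop.http.authentication.simple.anonymous.allowed</name>
  <value>true</value>
</property>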
11-15-2016
02:09 PM
7 Kudos
Attachment: updateddeleteuser.zip
ISSUE: Ranger LDAP integration was working fine. The customer deleted a user from the Ranger UI and was facing an issue while re-importing the user into Ranger.
ROOT CAUSE: The customer removed the user from the Ranger UI and expected it to be automatically re-imported by the Ranger usersync process. Sample screenshots are in the original post - the user named 'testuser' is deleted from the Ranger UI, yet the user is still present in the database.
RESOLUTION: There are multiple tables which have entries for the user. You need to run the delete script to remove the user entries from the database and restart the Ranger usersync process to re-import the user. Please find the attached delete script. Syntax to run the script - $ deleteUser.sh -f input.txt -u ranger_user -p password -db ranger [-r <replaceUser>]
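To confirm that the UI delete alone does not clean up the database, you can check whether the user is still present in the Ranger admin DB. A minimal sketch, assuming a MySQL-backed Ranger database and the usual x_portal_user/x_user tables (verify the table names against your Ranger version):
mysql> use ranger;
mysql> select id, login_id, status from x_portal_user where login_id = 'testuser';
mysql> select id, user_name, status from x_user where user_name = 'testuser';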
11-15-2016
05:29 AM
6 Kudos
ISSUE: After enabling SSL for Ambari, the Hive View stopped working. ERROR: 08 Nov 2016 11:32:23,330 WARN [qtp-ambari-client-263] nio:720 - javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
08 Nov 2016 11:32:23,331 ERROR [qtp-ambari-client-256] ServiceFormattedException:100 - org.apache.ambari.view.utils.ambari.AmbariApiException: RA040 I/O error while requesting Ambari
org.apache.ambari.view.utils.ambari.AmbariApiException: RA040 I/O error while requesting Ambari
at org.apache.ambari.view.utils.ambari.AmbariApi.requestClusterAPI(AmbariApi.java:176)
at org.apache.ambari.view.utils.ambari.AmbariApi.requestClusterAPI(AmbariApi.java:142)
at org.apache.ambari.view.utils.ambari.AmbariApi.getHostsWithComponent(AmbariApi.java:99)
at org.apache.ambari.view.hive.client.ConnectionFactory.getHiveHost(ConnectionFactory.java:79)
at org.apache.ambari.view.hive.client.ConnectionFactory.create(ConnectionFactory.java:68)
at org.apache.ambari.view.hive.client.UserLocalConnection.initialValue(UserLocalConnection.java:42)
at org.apache.ambari.view.hive.client.UserLocalConnection.initialValue(UserLocalConnection.java:26)
at org.apache.ambari.view.utils.UserLocal.get(UserLocal.java:66)
at org.apache.ambari.view.hive.resources.browser.HiveBrowserService.databases(HiveBrowserService.java:87)
at sun.reflect.GeneratedMethodAccessor186.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1509)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216)
ROOT CAUSE: The truststore configuration for the Ambari Server was missing.
RESOLUTION: Set up the truststore for the Ambari Server as per the link below, after which the above issue was resolved. https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Ambari_Security_Guide/content/_set_up_truststore_for_ambari_server.html
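At a high level, the truststore setup amounts to importing the certificate into a JKS truststore and then pointing Ambari at it through the interactive security setup. A sketch only - the file paths and alias below are placeholders; follow the linked doc for the exact prompts:
# import the certificate into a truststore (paths and alias are placeholders)
keytool -import -file /tmp/ambari.crt -alias ambari-server -keystore /etc/ambari-server/keys/truststore.jks
# run the interactive security setup and choose the "Setup truststore" option
ambari-server setup-security
ambari-server restart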
11-14-2016
05:30 PM
6 Kudos
SYMPTOM: During an HDP upgrade from 2.3 to 2.5, the YARN service check is failing due to NoSuchMethodError org.apache.hadoop.yarn.api.records.Resource.getMemorySize()J. ERROR: Below was the error in the application logs - 16/11/14 10:30:12 FATAL distributedshell.ApplicationMaster: Error running ApplicationMaster
java.lang.NoSuchMethodError: org.apache.hadoop.yarn.api.records.Resource.getMemorySize()J
at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:585)
at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:298)
ROOT CAUSE: There was an issue with the classpath - the NodeManager on which the job was running was pointing to the older version [i.e. 2.3] classpath. RESOLUTION: There are two solutions - 1. Skip this step in the Ambari upgrade UI and proceed; Ambari will take care of setting up the classpath. 2. Modify the classpath manually, confirm it using the "hadoop classpath" command, and re-run the service check.
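To confirm which HDP version the classpath on the affected NodeManager host actually resolves to, something like the following can help (a sketch; hdp-select is the HDP version selector utility and the grep pattern assumes the standard /usr/hdp layout):
# list the HDP install directories that appear on the Hadoop classpath
hadoop classpath | tr ':' '\n' | grep -o '/usr/hdp/[^/]*' | sort -u
# show which HDP versions are installed and which one is currently selected
hdp-select versions
hdp-select status hadoop-client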
04-20-2018
10:36 PM
Dear @Sagar Shimpi, the problem I encountered was: The following 6 host component(s) have not been upgraded to version 1.1.5.0-235. Please install and upgrade the Stack Version on those hosts and try again.
Host components:
GLOBALMASTER on host e19e07452.et15sqa
LDSERVER on host e19e07452.et15sqa
LOCALMASTER on host e19e07452.et15sqa
LDSERVER on host e19e07466.et15sqa
LDSERVER on host e19e10465.et15sqa
LOCALMASTER on host e19e10465.et15sqa
The "GLOBALMASTER" is my own service component. Can you please help? Many thanks in advance.
09-27-2017
02:16 AM
An easy way to detect the duplicate value is:
select component_name, service_name, host_id, cluster_id, count(*) as cnt from ambari.hostcomponentdesiredstate group by component_name, service_name, host_id, cluster_id order by cnt desc;
select component_name, service_name, host_id, cluster_id, count(*) as cnt from ambari.hostcomponentstate group by component_name, service_name, host_id, cluster_id order by cnt desc;
You will find that the count in one of the tables differs from the other. Just delete that row by id and you are good to go.
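Once the duplicate is identified, the extra row can be deleted by its primary key. A hypothetical sketch only - the id value is made up, the id column assumes a recent Ambari schema, and the Ambari database should be backed up (and ambari-server stopped) before any manual delete:
-- inspect the duplicate rows first, substituting the values reported by the queries above
select * from ambari.hostcomponentstate where component_name = 'NAMENODE' and host_id = 1;
-- then delete the surplus row by its id (1234 is a placeholder)
delete from ambari.hostcomponentstate where id = 1234;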
11-08-2016
07:08 PM
2 Kudos
1. Let's assume you have an HDP cluster installed and managed by Ambari. 2. When you want to delete a service [either a custom service or an HDP service] using the API, you generally use the command below - curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://<ambari-server>:8080/api/v1/clusters/c1/services/<SERVICENAME> 3. After executing the above command you might see the error below - $curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://<ambari-server>:8080/api/v1/clusters/c1/services/HBASE
{ "status" : 500, "message" : "org.apache.ambari.server.controller.spi.SystemException: An internal system exception occurred: Cannot remove HBASE. Desired state STARTED is not removable. Service must be stopped or disabled." } 4. If you see the above error while removing/stopping the service, use the steps below to resolve the issue. 5. Log in to the Ambari database [in my case it is PostgreSQL] and check the values for the service in the tables below - # psql -U ambari
[Default password is 'bigdata']
ambari=> select * from servicedesiredstate where service_name='HBASE';
ambari=> select * from servicecomponentdesiredstate where service_name='HBASE';
6. Make sure that in the above output the value of the column 'desired_state' is INSTALLED. 7. If the value of "desired_state" is set to STARTED, then update the column and set it to INSTALLED using the command below - ambari=> update servicedesiredstate set desired_state='INSTALLED' where service_name='HBASE';
8. Follow the same steps for the "servicecomponentdesiredstate" table - ambari=> update servicecomponentdesiredstate set desired_state='INSTALLED' where service_name='HBASE'; 9. Now try removing/deleting the service again. It should work. $curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://<ambari-server>:8080/api/v1/clusters/c1/services/HBASE
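As an aside, instead of editing the database directly, it is often enough to stop the service through the Ambari REST API first and then issue the DELETE. A sketch of that alternative (not part of the walkthrough above; same placeholders for host, cluster name and credentials):
# put the service into the INSTALLED (stopped) state via the API
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop HBASE"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' http://<ambari-server>:8080/api/v1/clusters/c1/services/HBASE
# once the stop request finishes, the DELETE call should succeed
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://<ambari-server>:8080/api/v1/clusters/c1/services/HBASE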
09-06-2016
05:17 PM
Also make sure that curl is installed on the target machine, and that the target machine can see the Oozie server and is able to run the curl command against it. In my case I got the timeout issue; installing curl did not help on its own, so I added the FQDN to /etc/hosts and then it worked perfectly well.
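A quick way to test this from the target machine is to call the Oozie admin status endpoint directly (a sketch; the host name, port 11000 and the /etc/hosts entry are examples, substitute your own values):
# verify the Oozie server is reachable from this machine
curl -s http://oozieserver.example.com:11000/oozie/v1/admin/status
# expected output is something like {"systemMode":"NORMAL"}
# if name resolution fails, add the Oozie server's FQDN to /etc/hosts, for example:
# 10.0.0.15   oozieserver.example.com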