Member since: 04-20-2016
Posts: 86
Kudos Received: 27
Solutions: 7
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 2903 | 03-13-2017 04:06 AM |
| | 4076 | 03-09-2017 01:55 PM |
| | 1645 | 01-05-2017 02:13 PM |
| | 5946 | 12-29-2016 05:43 PM |
| | 4987 | 12-28-2016 11:03 PM |
01-01-2017
07:37 PM
SYMPTOM
A NameNode crash may be observed when JNI-based Unix group mapping is enabled. The crash usually generates an "hs_err" log file containing a stack like the one below:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fbc814dd2a0, pid=380582, tid=140448021370624
#
# JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libnss_uxauth.so.2+0x4e2a0] sqlite3ExprCodeTarget+0xcc3
#
......
Stack: [0x00007fbc9a5c5000,0x00007fbc9a6c6000], sp=0x00007fbc9a6c2860, free space=1014k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libnss_uxauth.so.2+0x4e2a0] sqlite3ExprCodeTarget+0xcc3
C [libnss_uxauth.so.2+0x4e8db] evalConstExpr+0xf7
C [libnss_uxauth.so.2+0x47ae2] sqlite3WalkExpr+0x34
C [libnss_uxauth.so.2+0x47bdd] sqlite3WalkExprList+0x42
C [libnss_uxauth.so.2+0x47b80] sqlite3WalkExpr+0xd2
C [libnss_uxauth.so.2+0x47b15] sqlite3WalkExpr+0x67
C [libnss_uxauth.so.2+0x4e980] sqlite3ExprCodeConstants+0x5a
C [libnss_uxauth.so.2+0x7cac1] sqlite3WhereBegin+0x1c5
C [libnss_uxauth.so.2+0x6ecc6] sqlite3Select+0x858
C [libnss_uxauth.so.2+0x7ea58] yy_reduce+0x86f
C [libnss_uxauth.so.2+0x80f7c] sqlite3Parser+0xc8
C [libnss_uxauth.so.2+0x81d0d] sqlite3RunParser+0x28b
C [libnss_uxauth.so.2+0x677d2] sqlite3Prepare+0x206
C [libnss_uxauth.so.2+0x67ab1] sqlite3LockAndPrepare+0x84
C [libnss_uxauth.so.2+0x67c53] sqlite3_prepare_v2+0x4d
C [libnss_uxauth.so.2+0xad31] init_usergroups+0x182
C [libnss_uxauth.so.2+0x914c] uxauth_initgroups+0x69
C [libnss_uxauth.so.2+0xc9a0] _nss_uxauth_initgroups_dyn+0x88
C [libc.so.6+0xa979f] __tls_get_addr@@GLIBC_2.3+0xa979f
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j org.apache.hadoop.security.JniBasedUnixGroupsMapping.getGroupsForUser(Ljava/lang/String;)[Ljava/lang/String;+0
j org.apache.hadoop.security.JniBasedUnixGroupsMapping.getGroups(Ljava/lang/String;)Ljava/util/List;+6
j org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(Ljava/lang/String;)Ljava/util/List;+5
j org.apache.hadoop.security.Groups$GroupCacheLoader.fetchGroupList(Ljava/lang/String;)Ljava/util/List;+19
j org.apache.hadoop.security.Groups$GroupCacheLoader.load(Ljava/lang/String;)Ljava/util/List;+2
j org.apache.hadoop.security.Groups$GroupCacheLoader.load(Ljava/lang/Object;)Ljava/lang/Object;+5
j com.google.common.cache.CacheLoader.reload(Ljava/lang/Object;Ljava/lang/Object;)Lcom/google/common/util/concurrent/ListenableFuture;+2
ROOT CAUSE: A couple of Apache JIRAs track this issue:
https://issues.apache.org/jira/browse/HADOOP-10442
https://issues.apache.org/jira/browse/HADOOP-10527
WORKAROUND: Switch from the JNI-based mapping to the shell-based mapping by changing the hadoop.security.group.mapping property, either through Ambari under "Advanced core-site" or directly in core-site.xml on the NameNode host. An HDFS restart is required for the change to take effect.
<property>
<name>hadoop.security.group.mapping</name>
<value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</property>
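For a quick sanity check after switching, the sketch below (a minimal illustration, not the Hadoop implementation) resolves a user's groups the way a shell-based lookup effectively does, by asking the OS directly instead of going through the JNI/NSS path that crashed. Run it on the NameNode host; the user name "hdfs" is only a placeholder.
# check_groups.py - hypothetical helper, mirrors a shell-based group lookup
import subprocess
import sys

def shell_groups(user):
    # 'id -Gn <user>' prints the user's group names, space separated
    p = subprocess.Popen(['id', '-Gn', user],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    if p.returncode != 0:
        raise RuntimeError('group lookup failed for %s: %s' % (user, err.strip()))
    return out.strip().split()

if __name__ == '__main__':
    user = sys.argv[1] if len(sys.argv) > 1 else 'hdfs'
    print 'Groups for %s: %s' % (user, ' '.join(shell_groups(user)))
Once HDFS has been restarted, the mapping can also be verified from Hadoop's side with "hdfs groups <username>".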
01-01-2017
02:12 PM
SYMPTOM: Adding a new host via Ambari fails with an exception like the following: #####
11 Apr 2016 10:52:24,629 ERROR [qtp-client-81] AbstractResourceProvider:279 - Caught AmbariException when creating a resource
org.apache.ambari.server.HostNotFoundException: Host not found, hostname=hostname_123.abc.xyz.com
at org.apache.ambari.server.state.cluster.ClustersImpl.getHost(ClustersImpl.java:343)
at org.apache.ambari.server.state.ConfigHelper.getEffectiveDesiredTags(ConfigHelper.java:108)
at org.apache.ambari.server.controller.AmbariManagementControllerImpl.findConfigurationTagsWithOverrides(AmbariManagementControllerImpl.java:1820)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:37)
at com.sun.proxy.$Proxy82.findConfigurationTagsWithOverrides(Unknown Source)
at org.apache.ambari.server.controller.AmbariActionExecutionHelper.addExecutionCommandsToStage(AmbariActionExecutionHelper.java:372)
at org.apache.ambari.server.controller.AmbariManagementControllerImpl.createAction(AmbariManagementControllerImpl.java:3366)
at org.apache.ambari.server.controller.internal.RequestResourceProvider$1.invoke(RequestResourceProvider.java:165)
at org.apache.ambari.server.controller.internal.RequestResourceProvider$1.invoke(RequestResourceProvider.java:162)
at org.apache.ambari.server.controller.internal.AbstractResourceProvider.createResources(AbstractResourceProvider.java:272)
at org.apache.ambari.server.controller.internal.RequestResourceProvider.createResources(RequestResourceProvider.java:162)
at org.apache.ambari.server.controller.internal.ClusterControllerImpl.createResources(ClusterControllerImpl.java:289)
at org.apache.ambari.server.api.services.persistence.PersistenceManagerImpl.create(PersistenceManagerImpl.java:76)
at org.apache.ambari.server.api.handlers.CreateHandler.persist(CreateHandler.java:36)
at org.apache.ambari.server.api.handlers.BaseManagementHandler.handleRequest(BaseManagementHandler.java:72)
at org.apache.ambari.server.api.services.BaseRequest.process(BaseRequest.java:135)
at org.apache.ambari.server.api.services.BaseService.handleRequest(BaseService.java:105)
at org.apache.ambari.server.api.services.BaseService.handleRequest(BaseService.java:74)
at org.apache.ambari.server.api.services.RequestService.createRequests(RequestService.java:145)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
###########
ROOT CAUSE: The issue usually happens when there is a conflicting entry in /etc/hosts for the node being added, i.e. the hostname entry resolves to an incorrect IP address or vice versa. The Ambari agent uses the script "/usr/lib/python2.6/site-packages/ambari_agent/hostname.py" to push hostname updates to the Ambari server/DB. The script determines the hostname with the code below: #######
  try:
    scriptname = config.get('agent', 'hostname_script')
    try:
      osStat = subprocess.Popen([scriptname], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
      out, err = osStat.communicate()
      if (0 == osStat.returncode and 0 != len(out.strip())):
        cached_hostname = out.strip()
      else:
        cached_hostname = socket.getfqdn()
    except:
      cached_hostname = socket.getfqdn()
  except:
    cached_hostname = socket.getfqdn()
  cached_hostname = cached_hostname.lower()
  return cached_hostname
#####
Here "socket.getfqdn()" will always look up for the /etc/hosts and update the "cached_hostname" which is then pushed out to the Ambari DB here.
As a simple check, run the following on the host that is failing to be added, to determine whether "socket.getfqdn()" returns the correct hostname: ####
[root@sandbox ~]# python
Python 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> print socket.getfqdn()
sandbox.hortonworks.com
>>>
##### The output printed by "print socket.getfqdn()" should match the entry captured in the /etc/hosts file; a scripted version of this check is sketched below.
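The following is a small illustrative sketch (not an Ambari utility) that extends the check above: it compares what socket.getfqdn() returns with the names listed in /etc/hosts and warns if the FQDN is missing.
# check_fqdn.py - hypothetical helper for the /etc/hosts sanity check
import socket

def etc_hosts_entries(path='/etc/hosts'):
    # returns a list of (ip, [names]) tuples, ignoring comments and blank lines
    entries = []
    for line in open(path):
        line = line.split('#')[0].strip()
        if not line:
            continue
        parts = line.split()
        entries.append((parts[0], parts[1:]))
    return entries

if __name__ == '__main__':
    fqdn = socket.getfqdn().lower()
    print 'socket.getfqdn() ->', fqdn
    matches = [(ip, names) for ip, names in etc_hosts_entries()
               if fqdn in [n.lower() for n in names]]
    if matches:
        for ip, names in matches:
            print 'found in /etc/hosts:', ip, ' '.join(names)
    else:
        print 'WARNING: %s is not listed in /etc/hosts' % fqdn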
SOLUTION:
Update the incorrect entry in /etc/hosts, then add the host back through Ambari. This reinstalls the Ambari agent and records the correct hostname in the Ambari DB.
01-01-2017
01:27 AM
SYMPTOMS:
Cannot access the Tez View. Accessing it returns the error below:
error code: 500, message: Internal Server Error{"message":"Failed to fetch results by the proxy from url: http://<hostname>:8188/ws/v1/timeline/TEZ_DAG_ID?limit=11&_=1470838899327&...Error.","status":500,"trace":"Connection refused"}
ROOT CAUSE:
The issue is caused by "YARN ResourceManager URL" pointing to the standby RM node when RM HA is configured. This usually happens when, under "Cluster Configuration" while setting up the Tez View, the user chooses the "Custom" option and specifies only a single ResourceManager address in the "YARN ResourceManager URL" field.
SOLUTION:
With RM HA configured, "YARN ResourceManager URL" should list both ResourceManager addresses in semicolon-separated format (e.g. sumeshnode1.openstacklocal:8088;sumeshnode3.openstacklocal:8088).
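To confirm which ResourceManager is currently active, a hedged sketch like the one below can query each RM's REST API; the hostnames are the placeholders from the example above, and the /ws/v1/cluster/info endpoint with its haState field is assumed to be exposed by these RM versions.
# check_rm_ha.py - hypothetical helper to show each RM's HA state
import json
import urllib2

RM_ADDRESSES = ['sumeshnode1.openstacklocal:8088', 'sumeshnode3.openstacklocal:8088']

for rm in RM_ADDRESSES:
    try:
        resp = urllib2.urlopen('http://%s/ws/v1/cluster/info' % rm, timeout=5)
        info = json.load(resp)['clusterInfo']
        print '%s -> haState=%s' % (rm, info.get('haState', 'N/A'))
    except Exception, e:
        print '%s -> unreachable (%s)' % (rm, e)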
01-01-2017
01:25 AM
SYMPTOM:
On submitting a MapReduce job, or on launching a Hive shell (with the Tez execution engine, which is the default, this spins up a container on the configured queue), we get the error below:
[hive@sumeshhdpn2 root]$ hadoop jar /usr/hdp/2.2.4.2-2/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 1 10
Number of Maps = 1
Samples per Map = 10
Wrote input for Map #0
Starting Job
16/08/15 04:12:47 INFO impl.TimelineClientImpl: Timeline service address: http://sumeshhdpn2:8188/ws/v1/timeline/
16/08/15 04:12:47 INFO client.RMProxy: Connecting to ResourceManager at sumeshhdpn2/172.25.16.48:8050
16/08/15 04:12:48 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 129 for hive on ha-hdfs:sumeshhdp
16/08/15 04:12:48 INFO security.TokenCache: Got dt for hdfs://sumeshhdp; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:sumeshhdp, Ident: (HDFS_DELEGATION_TOKEN token 129 for hive)
16/08/15 04:12:53 INFO input.FileInputFormat: Total input paths to process : 1
16/08/15 04:12:56 INFO mapreduce.JobSubmitter: number of splits:1
16/08/15 04:12:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470233265007_0006
16/08/15 04:12:58 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:sumeshhdp, Ident: (HDFS_DELEGATION_TOKEN token 129 for hive)
16/08/15 04:13:03 INFO impl.YarnClientImpl: Submitted application application_1470233265007_0006
16/08/15 04:13:04 INFO mapreduce.Job: The url to track the job: http://sumeshhdpn2:8088/proxy/application_1470233265007_0006/
16/08/15 04:13:04 INFO mapreduce.Job: Running job: job_1470233265007_0006
16/08/15 04:13:04 INFO mapreduce.Job: Job job_1470233265007_0006 running in uber mode : false
16/08/15 04:13:04 INFO mapreduce.Job: map 0% reduce 0%
16/08/15 04:13:04 INFO mapreduce.Job: Job job_1470233265007_0006 failed with state FAILED due to: Application application_1470233265007_0006 submitted by user hive to non-leaf queue: default
16/08/15 04:13:04 INFO mapreduce.Job: Counters: 0
Job Finished in 17.775 seconds
[hive@sumeshhdpn3 root]$ hive
Logging initialized using configuration in file:/etc/hive/conf/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.4.2-2/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.4.2-2/hive/lib/hive-jdbc-0.14.0.2.2.4.2-2-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1470233265007_0007 submitted by user hive to non-leaf queue: default
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:457)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:672)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1470233265007_0007 submitted by user hive to non-leaf queue: default
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:612)
at org.apache.tez.client.TezClient.preWarm(TezClient.java:585)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:200)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:122)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:454)
... 8 more
ROOT CAUSE:
This is triggered when a child queue is created under the "default" queue; "default" must not have any child queues. For example, if a queue "default.test" exists under "default", submitting any MapReduce job or starting a Hive shell fails with the exceptions shown above.
RESOLUTION / WORKAROUND:
To address this, remove the child queue under the default queue. If a new queue is needed, create it under "root" instead and assign it the required resources.
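As a quick way to spot the offending layout, the sketch below (illustrative only; the RM hostname is a placeholder from the logs above) walks the queue hierarchy exposed by the ResourceManager's /ws/v1/cluster/scheduler REST endpoint and warns if "default" has children.
# check_queues.py - hypothetical helper to flag child queues under "default"
import json
import urllib2

RM = 'http://sumeshhdpn2:8088'

def walk(queue, parent=''):
    # recursively visit each queue in the capacity-scheduler hierarchy
    name = queue.get('queueName', '')
    path = parent + '.' + name if parent else name
    children = []
    if queue.get('queues'):
        children = queue['queues'].get('queue', [])
    if name == 'default' and children:
        names = [c.get('queueName') for c in children]
        print 'WARNING: "default" has child queues:', names
    for child in children:
        walk(child, path)

if __name__ == '__main__':
    data = json.load(urllib2.urlopen(RM + '/ws/v1/cluster/scheduler', timeout=10))
    walk(data['scheduler']['schedulerInfo'])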
12-30-2016
03:41 PM
Can you update the Phoenix client JAR in SQuirreL and then try again?
12-30-2016
07:46 AM
@Varun R I would suggest vetting your settings/configs against what is documented here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_security/content/ch_wire-webhdfs-mr-yarn.html I have not followed the link you mentioned, but on a cursory look it seems fine to me. Also, please accept the answer if this helped resolve the issue.
12-30-2016
04:33 AM
Since you have "phoenix.schema.mapSystemTablesToNamespace" set on the server side, your client (i.e. the SQuirreL client) needs the same config. To do that, add the HBase conf dir to the 'Extra Class Path' tab in SQuirreL, and ensure the hbase-site.xml located in that conf dir has the property updated as below:
<property>
<name>phoenix.schema.isNamespaceMappingEnabled</name>
<value>true</value>
</property>
12-29-2016
06:21 PM
The query fails here. Below is the code path where the error is generated:
public static RecordUpdater getAcidRecordUpdater(JobConf jc, TableDesc tableInfo, int bucket,
                                                 FileSinkDesc conf, Path outPath,
                                                 ObjectInspector inspector,
                                                 Reporter reporter, int rowIdColNum)
    throws HiveException, IOException {
  HiveOutputFormat<?, ?> hiveOutputFormat = getHiveOutputFormat(jc, tableInfo);
  AcidOutputFormat<?, ?> acidOutputFormat = null;
  if (hiveOutputFormat instanceof AcidOutputFormat) {
    acidOutputFormat = (AcidOutputFormat)hiveOutputFormat;
  } else {
    throw new HiveException("Unable to create RecordUpdater for HiveOutputFormat that does not " +
        "implement AcidOutputFormat");
Do these tables have ACID support enabled?
12-29-2016
05:43 PM
@Varun R Looks like the NameNode is still in safe mode: the command below returns 1, i.e. 'Safe mode is OFF' was not matched.
/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://sandbox.hortonworks.com:8020 -safemode get | grep 'Safe mode is OFF'
Please check the NameNode logs to see what the issue could be. You can also open the NameNode UI and check its status:
http://sandbox.hortonworks.com:50070/dfshealth.html#tab-overview
If you think the cluster looks OK, you can manually bring the NameNode out of safe mode and then try again:
/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs dfsadmin -fs hdfs://sandbox.hortonworks.com:8020 -safemode leave
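If the NameNode is expected to leave safe mode on its own, a small sketch like the one below (reusing the placeholder path and host from this thread) can poll the status until it reports OFF.
# wait_safemode.py - hypothetical helper that polls safemode status
import subprocess
import time

CMD = ['/usr/hdp/current/hadoop-hdfs-namenode/bin/hdfs', 'dfsadmin',
       '-fs', 'hdfs://sandbox.hortonworks.com:8020', '-safemode', 'get']

for attempt in range(30):
    p = subprocess.Popen(CMD, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    print out.strip()
    if 'Safe mode is OFF' in out:
        break
    time.sleep(10)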
12-28-2016
11:03 PM
1 Kudo
It's kind of a corner-case issue that large HBase clusters do occasionally encounter. This was also raised with the Ambari team to get these values corrected and overridden to fairly moderate defaults; please refer to the link below, where Ted Yu mentions the values to update:
https://issues.apache.org/jira/browse/AMBARI-16278
Along with what @gsharma suggested, please also update hbase.regionserver.executor.openregion.threads to "20" and test again. This increases the number of concurrent threads that process region opens, which helps the regions initialize faster. Again, the values set by Ambari are reasonable thresholds and work fine in most cases; only in corner cases do we run into this "namespace" initialization issue. If it still persists, look at the logs of the specific RegionServer where the "namespace" region is being assigned to see what errors appear.