Member since 04-20-2016: 86 Posts, 27 Kudos Received, 7 Solutions
10-17-2018
12:11 PM
1 Kudo
Great article!
01-01-2017
07:37 PM
SYMPTOM
A NameNode crash may be observed when JNI-based Unix group mapping is enabled. The crash usually generates an "hs_err" log file containing a stack like the one below:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fbc814dd2a0, pid=380582, tid=140448021370624
#
# JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libnss_uxauth.so.2+0x4e2a0] sqlite3ExprCodeTarget+0xcc3
#
......
Stack: [0x00007fbc9a5c5000,0x00007fbc9a6c6000], sp=0x00007fbc9a6c2860, free space=1014k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libnss_uxauth.so.2+0x4e2a0] sqlite3ExprCodeTarget+0xcc3
C [libnss_uxauth.so.2+0x4e8db] evalConstExpr+0xf7
C [libnss_uxauth.so.2+0x47ae2] sqlite3WalkExpr+0x34
C [libnss_uxauth.so.2+0x47bdd] sqlite3WalkExprList+0x42
C [libnss_uxauth.so.2+0x47b80] sqlite3WalkExpr+0xd2
C [libnss_uxauth.so.2+0x47b15] sqlite3WalkExpr+0x67
C [libnss_uxauth.so.2+0x4e980] sqlite3ExprCodeConstants+0x5a
C [libnss_uxauth.so.2+0x7cac1] sqlite3WhereBegin+0x1c5
C [libnss_uxauth.so.2+0x6ecc6] sqlite3Select+0x858
C [libnss_uxauth.so.2+0x7ea58] yy_reduce+0x86f
C [libnss_uxauth.so.2+0x80f7c] sqlite3Parser+0xc8
C [libnss_uxauth.so.2+0x81d0d] sqlite3RunParser+0x28b
C [libnss_uxauth.so.2+0x677d2] sqlite3Prepare+0x206
C [libnss_uxauth.so.2+0x67ab1] sqlite3LockAndPrepare+0x84
C [libnss_uxauth.so.2+0x67c53] sqlite3_prepare_v2+0x4d
C [libnss_uxauth.so.2+0xad31] init_usergroups+0x182
C [libnss_uxauth.so.2+0x914c] uxauth_initgroups+0x69
C [libnss_uxauth.so.2+0xc9a0] _nss_uxauth_initgroups_dyn+0x88
C [libc.so.6+0xa979f] __tls_get_addr@@GLIBC_2.3+0xa979f
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j org.apache.hadoop.security.JniBasedUnixGroupsMapping.getGroupsForUser(Ljava/lang/String;)[Ljava/lang/String;+0
j org.apache.hadoop.security.JniBasedUnixGroupsMapping.getGroups(Ljava/lang/String;)Ljava/util/List;+6
j org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(Ljava/lang/String;)Ljava/util/List;+5
j org.apache.hadoop.security.Groups$GroupCacheLoader.fetchGroupList(Ljava/lang/String;)Ljava/util/List;+19
j org.apache.hadoop.security.Groups$GroupCacheLoader.load(Ljava/lang/String;)Ljava/util/List;+2
j org.apache.hadoop.security.Groups$GroupCacheLoader.load(Ljava/lang/Object;)Ljava/lang/Object;+5
j com.google.common.cache.CacheLoader.reload(Ljava/lang/Object;Ljava/lang/Object;)Lcom/google/common/util/concurrent/ListenableFuture;+2
ROOT CAUSE: This issue is tracked by the following Apache JIRAs:
https://issues.apache.org/jira/browse/HADOOP-10442
https://issues.apache.org/jira/browse/HADOOP-10527
WORKAROUND: Switch from JNI-based mapping to shell-based mapping by changing the hadoop.security.group.mapping property, either through Ambari under "Advanced core-site" or directly in core-site.xml on the NameNode host. An HDFS restart is required for the change to take effect.
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</property>
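As a quick sanity check after the switch: the shell-based mapping essentially runs `id -Gn <user>` and splits the output. A minimal Python sketch of that lookup (the function names here are illustrative, not part of Hadoop):

```python
import subprocess

def parse_groups(output):
    # `id -Gn` prints the group names space-separated on a single line
    return output.strip().split()

def shell_groups(user):
    # Roughly what ShellBasedUnixGroupsMapping does: shell out to `id -Gn`
    out = subprocess.check_output(["id", "-Gn", user])
    return parse_groups(out.decode())
```

After the restart, `hdfs groups <user>` can be used to confirm the NameNode resolves the same groups without touching the JNI path.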
01-01-2017
02:12 PM
SYMPTOM: Adding a new host via Ambari fails with an exception like the one below:
11 Apr 2016 10:52:24,629 ERROR [qtp-client-81] AbstractResourceProvider:279 - Caught AmbariException when creating a resource
org.apache.ambari.server.HostNotFoundException: Host not found, hostname=hostname_123.abc.xyz.com
at org.apache.ambari.server.state.cluster.ClustersImpl.getHost(ClustersImpl.java:343)
at org.apache.ambari.server.state.ConfigHelper.getEffectiveDesiredTags(ConfigHelper.java:108)
at org.apache.ambari.server.controller.AmbariManagementControllerImpl.findConfigurationTagsWithOverrides(AmbariManagementControllerImpl.java:1820)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.google.inject.internal.DelegatingInvocationHandler.invoke(DelegatingInvocationHandler.java:37)
at com.sun.proxy.$Proxy82.findConfigurationTagsWithOverrides(Unknown Source)
at org.apache.ambari.server.controller.AmbariActionExecutionHelper.addExecutionCommandsToStage(AmbariActionExecutionHelper.java:372)
at org.apache.ambari.server.controller.AmbariManagementControllerImpl.createAction(AmbariManagementControllerImpl.java:3366)
at org.apache.ambari.server.controller.internal.RequestResourceProvider$1.invoke(RequestResourceProvider.java:165)
at org.apache.ambari.server.controller.internal.RequestResourceProvider$1.invoke(RequestResourceProvider.java:162)
at org.apache.ambari.server.controller.internal.AbstractResourceProvider.createResources(AbstractResourceProvider.java:272)
at org.apache.ambari.server.controller.internal.RequestResourceProvider.createResources(RequestResourceProvider.java:162)
at org.apache.ambari.server.controller.internal.ClusterControllerImpl.createResources(ClusterControllerImpl.java:289)
at org.apache.ambari.server.api.services.persistence.PersistenceManagerImpl.create(PersistenceManagerImpl.java:76)
at org.apache.ambari.server.api.handlers.CreateHandler.persist(CreateHandler.java:36)
at org.apache.ambari.server.api.handlers.BaseManagementHandler.handleRequest(BaseManagementHandler.java:72)
at org.apache.ambari.server.api.services.BaseRequest.process(BaseRequest.java:135)
at org.apache.ambari.server.api.services.BaseService.handleRequest(BaseService.java:105)
at org.apache.ambari.server.api.services.BaseService.handleRequest(BaseService.java:74)
at org.apache.ambari.server.api.services.RequestService.createRequests(RequestService.java:145)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
ROOT CAUSE: The issue usually occurs when there is a conflicting entry in /etc/hosts for the node being added: the hostname entry may resolve to an incorrect IP address, or vice versa. The Ambari agent uses the script /usr/lib/python2.6/site-packages/ambari_agent/hostname.py to push hostname updates to the Ambari server and database. The script determines the hostname with the code below:
try:
    scriptname = config.get('agent', 'hostname_script')
    try:
        osStat = subprocess.Popen([scriptname], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = osStat.communicate()
        if (0 == osStat.returncode and 0 != len(out.strip())):
            cached_hostname = out.strip()
        else:
            cached_hostname = socket.getfqdn()
    except:
        cached_hostname = socket.getfqdn()
except:
    cached_hostname = socket.getfqdn()
cached_hostname = cached_hostname.lower()
return cached_hostname
Here "socket.getfqdn()" always consults /etc/hosts and sets "cached_hostname", which is then pushed to the Ambari DB.
As a simple check, run the following on the host that is failing to be added, to determine whether "socket.getfqdn()" returns the right hostname:
[root@sandbox ~]# python
Python 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> print socket.getfqdn()
sandbox.hortonworks.com
>>>
The output printed by "print socket.getfqdn()" should match the entry captured in the /etc/hosts file.
SOLUTION:
Update the /etc/hosts entry to fix the incorrect mapping, then add the host back through Ambari. This reinstalls the Ambari agent and records the correct entry in the Ambari DB.
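The manual check above can also be scripted: parse /etc/hosts and confirm the FQDN Python resolves actually appears there. A hedged sketch (the parsing helper is illustrative, not part of the Ambari agent):

```python
import socket

def hosts_entries(path="/etc/hosts"):
    # Map every hostname/alias in the hosts file to its IP address
    entries = {}
    with open(path) as f:
        for line in f:
            line = line.split("#")[0].strip()  # drop comments and blanks
            if not line:
                continue
            parts = line.split()
            for name in parts[1:]:
                entries[name] = parts[0]
    return entries

def fqdn_matches_hosts(path="/etc/hosts"):
    # True if the FQDN that socket.getfqdn() resolves appears in the hosts file
    return socket.getfqdn() in hosts_entries(path)
```

If `fqdn_matches_hosts()` returns False on the problem host, the /etc/hosts entry is the place to fix before re-adding the host.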
01-01-2017
01:27 AM
SYMPTOMS:
Cannot access Tez View. On accessing Tez View, we get the below error:
error code: 500, message: Internal Server Error{"message":"Failed to fetch results by the proxy from url: http://<hostname>:8188/ws/v1/timeline/TEZ_DAG_ID?limit=11&_=1470838899327&...Error.","status":500,"trace":"Connection refused"}
ROOT CAUSE:
The issue is caused by the "YARN ResourceManager URL" pointing to the standby RM node when RM HA is configured. This usually happens when, while setting up the Tez View, the user chooses the "Custom" option under "Cluster Configuration" and enters only a single ResourceManager address in the "YARN ResourceManager URL" field.
SOLUTION:
With RM HA configured, the "YARN ResourceManager URL" should list both ResourceManager addresses, semicolon-separated (e.g. sumeshnode1.openstacklocal:8088;sumeshnode3.openstacklocal:8088).
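To check which ResourceManager is currently active, each RM reports its HA state at /ws/v1/cluster/info. A minimal Python sketch that probes the addresses from the example above (error handling is deliberately thin):

```python
import json
from urllib.request import urlopen

def ha_state(body):
    # The RM cluster-info payload nests the state under clusterInfo.haState
    return json.loads(body)["clusterInfo"]["haState"]

def find_active_rm(hosts, timeout=5):
    # Probe each RM address and return the first one that reports ACTIVE
    for host in hosts:
        try:
            body = urlopen("http://%s/ws/v1/cluster/info" % host, timeout=timeout).read()
            if ha_state(body) == "ACTIVE":
                return host
        except OSError:
            continue
    return None

# e.g. find_active_rm(["sumeshnode1.openstacklocal:8088", "sumeshnode3.openstacklocal:8088"])
```

If the Tez View is pointed only at the host that this probe reports as STANDBY, the "Connection refused" proxy error above is the expected result.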
01-01-2017
01:25 AM
SYMPTOM:
On invoking a MapReduce job, or launching a Hive shell (which, with the default Tez execution engine, spins up a container on the queue), we get the error below:
[hive@sumeshhdpn2 root]$ hadoop jar /usr/hdp/2.2.4.2-2/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 1 10
Number of Maps = 1
Samples per Map = 10
Wrote input for Map #0
Starting Job
16/08/15 04:12:47 INFO impl.TimelineClientImpl: Timeline service address: http://sumeshhdpn2:8188/ws/v1/timeline/
16/08/15 04:12:47 INFO client.RMProxy: Connecting to ResourceManager at sumeshhdpn2/172.25.16.48:8050
16/08/15 04:12:48 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 129 for hive on ha-hdfs:sumeshhdp
16/08/15 04:12:48 INFO security.TokenCache: Got dt for hdfs://sumeshhdp; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:sumeshhdp, Ident: (HDFS_DELEGATION_TOKEN token 129 for hive)
16/08/15 04:12:53 INFO input.FileInputFormat: Total input paths to process : 1
16/08/15 04:12:56 INFO mapreduce.JobSubmitter: number of splits:1
16/08/15 04:12:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470233265007_0006
16/08/15 04:12:58 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:sumeshhdp, Ident: (HDFS_DELEGATION_TOKEN token 129 for hive)
16/08/15 04:13:03 INFO impl.YarnClientImpl: Submitted application application_1470233265007_0006
16/08/15 04:13:04 INFO mapreduce.Job: The url to track the job: http://sumeshhdpn2:8088/proxy/application_1470233265007_0006/
16/08/15 04:13:04 INFO mapreduce.Job: Running job: job_1470233265007_0006
16/08/15 04:13:04 INFO mapreduce.Job: Job job_1470233265007_0006 running in uber mode : false
16/08/15 04:13:04 INFO mapreduce.Job: map 0% reduce 0%
16/08/15 04:13:04 INFO mapreduce.Job: Job job_1470233265007_0006 failed with state FAILED due to: Application application_1470233265007_0006 submitted by user hive to non-leaf queue: default
16/08/15 04:13:04 INFO mapreduce.Job: Counters: 0
Job Finished in 17.775 seconds
[hive@sumeshhdpn3 root]$ hive
Logging initialized using configuration in file:/etc/hive/conf/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.4.2-2/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.4.2-2/hive/lib/hive-jdbc-0.14.0.2.2.4.2-2-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1470233265007_0007 submitted by user hive to non-leaf queue: default
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:457)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:672)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1470233265007_0007 submitted by user hive to non-leaf queue: default
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:612)
at org.apache.tez.client.TezClient.preWarm(TezClient.java:585)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:200)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:122)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:454)
... 8 more
ROOT CAUSE:
This is triggered when a user creates a child queue under the "default" queue; "default" must not have any child queues. For example, with a queue "default.test" created under "default", submitting any MapReduce job or launching a Hive shell produces the exceptions shown above.
RESOLUTION / WORKAROUND:
To address the issue, remove the child queue under "default". If a new queue is needed, create it under "root" and assign it the required resources.
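The misconfiguration can also be spotted programmatically: the RM's /ws/v1/cluster/scheduler endpoint returns the capacity-scheduler queue tree, with child queues nested under a "queues"/"queue" key. A sketch assuming that payload shape (helper names are illustrative):

```python
def child_queue_names(queue):
    # Child queues, if any, live under queues.queue in the scheduler payload
    return [q["queueName"] for q in queue.get("queues", {}).get("queue", [])]

def find_queue(queue, name):
    # Depth-first search of the scheduler tree for a queue by name
    if queue.get("queueName") == name:
        return queue
    for child in queue.get("queues", {}).get("queue", []):
        found = find_queue(child, name)
        if found:
            return found
    return None
```

If `child_queue_names(find_queue(root, "default"))` is non-empty, "default" is a non-leaf queue and submissions to it will fail as shown above.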
12-28-2016
04:19 PM
2 Kudos
ISSUE: While trying to connect to the HBase cluster from an edge node or via the HBase client API, we get the exception "Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations".
SYMPTOM
The exact stack that we encounter is as below:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:312)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:151)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:821)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:193)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.isTableAvailable(ConnectionManager.java:991)
at org.apache.hadoop.hbase.client.HBaseAdmin.isTableAvailable(HBaseAdmin.java:1400)
at org.apache.hadoop.hbase.client.HBaseAdmin.isTableAvailable(HBaseAdmin.java:1408)
at Table.main(Table.java:15)
ROOT CAUSE: This happens when "zookeeper.znode.parent" is defined incorrectly in the hbase-site.xml sourced on the client side, or, in a custom API, points to the wrong location. For example, if the cluster uses the default "/hbase-unsecure" but the client incorrectly specifies "/hbase", this exception is thrown on connecting to the HBase cluster.
RESOLUTION: Copy the hbase-site.xml from the cluster to the client (or update the HBase API configuration) so that "zookeeper.znode.parent" matches the value set in the HBase cluster.
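To verify what the client is actually picking up, the value can be read straight out of the hbase-site.xml being sourced. A small sketch that parses the standard Hadoop-style configuration XML layout:

```python
import xml.etree.ElementTree as ET

def get_property(xml_text, name):
    # Hadoop-style config files are a <configuration> of <property> elements,
    # each holding a <name> and a <value>
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None
```

Comparing `get_property(open("/etc/hbase/conf/hbase-site.xml").read(), "zookeeper.znode.parent")` on the client against the cluster's value quickly shows whether the znode paths disagree.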
12-28-2016
04:12 PM
1 Kudo
PROBLEM:
1. Create a source external hive table as below:
CREATE EXTERNAL TABLE `casesclosed`(
`number` int,
`manager` string,
`owner` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://sumeshhdp/tmp/casesclosed'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='1',
'totalSize'='3693',
'transient_lastDdlTime'='1478557456')
2. Create an ORC table with CTAS from the source table as below:
CREATE TABLE casesclosed_mod
STORED AS ORC tblproperties("orc.compress"="ZLIB", "orc.compress.size"="8192")
AS
SELECT
cast(number as int) as number,
cast(manager as varchar(40)) as manager,
cast(owner as varchar(40)) as owner
FROM casesclosed;
3. On creating a Spark DataFrame against each table, the column names are listed for the source (non-ORC) table but not for the ORC table:
scala> val df = sqlContext.table("default.casesclosed")
df: org.apache.spark.sql.DataFrame = number: int, manager: string, owner: string
scala> val df = sqlContext.table("default.casesclosed_mod")
16/11/07 22:41:48 INFO OrcRelation: Listing hdfs://sumeshhdp/apps/hive/warehouse/casesclosed_mod on driver
df: org.apache.spark.sql.DataFrame = _col0: int, _col1: string, _col2: string
WORKAROUNDS:
- Use Spark to create the tables instead of Hive.
- Set: sqlContext.setConf("spark.sql.hive.convertMetastoreOrc", "false")
ROOT CAUSE:
The table "casesclosed_mod" is created with STORED AS ORC tblproperties("orc.compress"="ZLIB", "orc.compress.size"="8192"). Spark supports the ORC data source natively and has its own logic for reading ORC, which differs from Hive's; because of this bug, Spark cannot resolve the column names in the ORC files created by Hive. If the table is created in Hive without the STORED AS ORC clause, everything works fine. In Hive:
hive> CREATE TABLE casesclosed_mod0007
> AS
> SELECT
> cast(number as int) as number,
> cast(manager as varchar(40)) as manager,
> cast(owner as varchar(40)) as owner
> FROM casesclosed007;
In Spark-shell:
scala> val df = sqlContext.table("casesclosed_mod0007") ;
df: org.apache.spark.sql.DataFrame = [number: int, manager: string, owner: string]
This is a known bug, tracked in Apache JIRA:
https://issues.apache.org/jira/browse/SPARK-16628
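Besides the two workarounds above, the placeholder names can also be mapped back to the Hive schema after loading. A hypothetical helper (not part of Spark) illustrating the renaming:

```python
def restore_column_names(df_columns, hive_schema):
    # Replace _col0, _col1, ... placeholders with the Hive schema names, in order;
    # columns that already have real names are left untouched
    restored = []
    for i, col in enumerate(df_columns):
        if col == "_col%d" % i and i < len(hive_schema):
            restored.append(hive_schema[i])
        else:
            restored.append(col)
    return restored
```

In PySpark this list could then be applied with something like `df.toDF(*restored)`, though with convertMetastoreOrc left enabled the fix in SPARK-16628 is the real solution.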