Member since: 03-11-2016
Posts: 36
Kudos Received: 1
Solutions: 1

My Accepted Solutions

Title | Views | Posted
---|---|---
 | 1501 | 10-06-2016 03:26 PM
09-16-2020 03:27 PM
I believe this will fail if you stop the job today and rerun it tomorrow: `now` will resolve to a different day, and you will miss that data.
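To make the point concrete, a minimal sketch (the partition name and the override mechanism are hypothetical, not from the original job): capture the run date once at the start, instead of re-evaluating `now` at each step, so a rerun can be pointed at the intended day.

```shell
# Capture the run date once; a job stopped today and rerun tomorrow can be
# passed yesterday's date explicitly so no day of data is skipped.
RUN_DATE="${1:-$(date +%F)}"   # e.g. 2016-10-06; override via first argument
echo "processing data for dt=${RUN_DATE}"
```

Rerunning with `./job.sh 2016-10-05` then processes the missed day instead of silently shifting to the current one.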
04-09-2018 08:57 AM
@Geoffrey Shelton Okot Thanks for the update, it worked. I also want to add that one of my NameNode ports was occupied by a previously running instance (`java.net.BindException: Port in use: 0.0.0.0:50070`), and Ambari showed no message for it, so I checked the NameNode logs on the server itself. Killing the old PID and restarting did the trick.
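For anyone hitting the same `BindException`, a quick sketch of how to find the stale process holding the port (assumes `ss` from iproute2, or `lsof`, is available on the host; the port value is the one from the exception):

```shell
# The port reported in the BindException; adjust as needed.
PORT=50070
# Show any PID still listening on the port (may need root to see the owner).
ss -ltnp "( sport = :${PORT} )" 2>/dev/null || lsof -iTCP:"${PORT}" -sTCP:LISTEN
# If a stale NameNode PID shows up: kill it, then restart the service from Ambari.
```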
04-07-2018 08:40 PM
I have installed MIT Kerberos on one Linux server, and we tried to Kerberize our dev cluster through Ambari's automated wizard.
Ambari created all the principals for each node (3 DataNodes, 2 NameNodes, and one edge node), and I can see them in the KDC.
On the last step, while starting all services, it failed: the NameNode services are not coming up.
Before doing this on our dev cluster I ran the same steps on the Sandbox and it worked. But the cluster has one difference: it is an HA cluster, and each node has two IPs, an external one used for SSH login and an internal one used for node-to-node communication over InfiniBand.

NameNode error message:

2018-04-01 16:19:26,580 - call['hdfs haadmin -ns ABCHADOOP01 -getServiceState nn2'] {'logoutput': True, 'user': 'hdfs'}
18/04/01 16:19:28 INFO ipc.Client: Retrying connect to server: c1master02-nn.abc.corp/29.6.6.17:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From c1master01-nn.abc.corp/29.6.6.16 to c1master02-nn.abc.corp:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2018-04-01 16:19:28,783 - call returned (255, '18/04/01 16:19:28 INFO ipc.Client: Retrying connect to server: c1master02-nn.abc.corp/29.6.6.16:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)\nOperation failed: Call From c1master01-nn.abc.corp/29.6.6.16 to c1master02-nn.abc.corp:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused')
2018-04-01 16:19:28,783 - NameNode HA states: active_namenodes = [], standby_namenodes = [], unknown_namenodes = [('nn1', 'c1master01-nn.abc.corp:50070'), ('nn2', 'c1master02-nn.abc.corp:50070')]
2018-04-01 16:19:28,783 - Will retry 2 time(s), caught exception: No active NameNode was found.. Sleeping for 5 sec(s)
2018-04-01 16:19:33,787 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl --negotiate -u : -s '"'"'http://c1master01-nn.abc.corp:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem'"'"' 1>/tmp/tmpKVcTXy 2>/tmp/tmpy6hgoj''] {'quiet': False}
2018-04-01 16:19:33,837 - call returned (7, '')
2018-04-01 16:19:33,837 - Getting jmx metrics from NN failed. URL: http://c1master01-nn.abc.corp:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 38, in get_value_from_jmx
_, data, _ = get_user_call_output(cmd, user=run_user, quiet=False)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output
raise ExecutionFailed(err_msg, code, files_output[0], files_output[1])
ExecutionFailed: Execution of 'curl --negotiate -u : -s 'http://c1master01-nn.abc.corp:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' 1>/tmp/tmpKVcTXy 2>/tmp/tmpy6hgoj' returned 7.
2018-04-01 16:19:33,837 - call['hdfs haadmin -ns ABCHADOOP01 -getServiceState nn1'] {'logoutput': True, 'user': 'hdfs'}
Command failed after 1 tries
- From each node I am able to run kadmin and add/list principals.
- I did SSH to the NameNode and tried to obtain a ticket; that also worked.
abc># kinit -kt /etc/security/keytabs/nn.service.keytab nn/c1master01-nn.abc.corp@ABCHDP.COM
abc># klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: nn/c1master01-nn.abc.corp@ABCHDP.COM
Valid starting Expires Service principal
04/01/18 16:03:42 04/02/18 16:03:42 krbtgt/ABCHDP.COM@ABCHDP.COM
renew until 04/01/18 16:03:42

Since the cluster is empty, I also tried hadoop namenode -format, but got the issue below:

java.io.IOException: Login failure for nn/c1master01-nn.abc.corp@ABCHDP.COM from keytab /etc/security/keytabs/nn.service.keytab: javax.security.auth.login.LoginException: Receive timed out
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1098)
at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:307)
at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1160)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1631)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1769)
Caused by: javax.security.auth.login.LoginException: Receive timed out
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:808)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1089)
... 4 more
Caused by: java.net.SocketTimeoutException: Receive timed out
at java.net.PlainDatagramSocketImpl.receive0(Native Method)
at java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:143)
at java.net.DatagramSocket.receive(DatagramSocket.java:812)
at sun.security.krb5.internal.UDPClient.receive(NetClient.java:206)
at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:411)
at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:364)
at java.security.AccessController.doPrivileged(Native Method)
at sun.security.krb5.KdcComm.send(KdcComm.java:348)
at sun.security.krb5.KdcComm.sendIfPossible(KdcComm.java:253)
at sun.security.krb5.KdcComm.send(KdcComm.java:229)
at sun.security.krb5.KdcComm.send(KdcComm.java:200)
at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:316)
at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:361)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:776)
... 17 more
18/04/01 15:45:03 INFO util.ExitUtil: Exiting with status 1
18/04/01 15:45:03 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at c1master01-nn.abc.corp/29.6.6.17

Note that 29.6.6.17 is the internal IP.
Can anybody tell me what the issue is? Do I need to manually add entries for the internal IPs in the KDC? If that is required, why hasn't Ambari added them to the KDC the way it did for the external IPs? And since every machine has only one hostname, why would we need two entries at all?
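Separate from the dual-IP question, note that the root `SocketTimeoutException` above comes from a UDP exchange with the KDC (`PlainDatagramSocketImpl.receive0`). A common workaround for "Receive timed out" during keytab login is to force Kerberos to use TCP via `udp_preference_limit` in `/etc/krb5.conf` on the affected hosts; a sketch, to be verified against your krb5.conf layout:

```ini
[libdefaults]
  # Requests larger than this limit go over TCP; 1 effectively forces TCP
  # to the KDC, avoiding the UDP timeout seen in the stack trace above.
  udp_preference_limit = 1
```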
Labels:
- Apache Ambari
03-05-2018 07:53 PM
@Geoffrey Shelton Okot Can I use OpenLDAP instead of AD, i.e. create users and groups in OpenLDAP and use it as the backend for Kerberos? Is that good practice?
09-20-2017 06:58 AM
@Rajesh... Thanks, it is working for beeline. Since it is a bug in Knox, can we upgrade from Knox 0.9 to Knox 0.12 on HDP 2.5.3? Is there any document for that? I was not able to find any doc on upgrading Knox.
09-19-2017 08:37 PM
We are querying HS2 via Knox from beeline (and another JDBC tool) and get frequent disconnections.
Below is the URL used for the beeline connection:

jdbc:hive2://c3master03-nn.abc.org:8445/;ssl=true?hive.server2.transport.mode=http;hive.server2.thrift.http.path=gateway/default/hive

After connecting, if I do not run a query for about a minute, I get the error below (the same happens with the SQuirreL JDBC client):

Getting log thread is interrupted, since query is done!
Error: org.apache.thrift.transport.TTransportException: org.apache.http.NoHttpResponseException: c3master03-nn.abc.org:8445 failed to respond (state=08S01,code=0)
java.sql.SQLException: org.apache.thrift.transport.TTransportException: org.apache.http.NoHttpResponseException: c3master03-nn.abc.org:8445 failed to respond
at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:305)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:238)
at org.apache.hive.beeline.Commands.execute(Commands.java:863)
at org.apache.hive.beeline.Commands.sql(Commands.java:728)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:993)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:833)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:791)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:491)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:474)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: org.apache.thrift.transport.TTransportException: org.apache.http.NoHttpResponseException: c3master03-nn.abc.org:8445 failed to respond
at org.apache.thrift.transport.THttpClient.flushUsingHttpClient(THttpClient.java:297)
at org.apache.thrift.transport.THttpClient.flush(THttpClient.java:313)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:73)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62)
at org.apache.hive.service.cli.thrift.TCLIService$Client.send_ExecuteStatement(TCLIService.java:223)
at org.apache.hive.service.cli.thrift.TCLIService$Client.ExecuteStatement(TCLIService.java:215)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1363)
at com.sun.proxy.$Proxy0.ExecuteStatement(Unknown Source)
at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:296)
... 14 more
Caused by: org.apache.http.NoHttpResponseException: c3master03-nn.abc.org:8445 failed to respond
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:84)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:117)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at org.apache.thrift.transport.THttpClient.flushUsingHttpClient(THttpClient.java:251)
	... 26 more

Even after this exception, if I rerun the query in the same beeline window it executes and shows the result. And after another one-minute wait, running the same (or any other) query throws the same exception once, then succeeds on rerun. What is this weird behavior? Even the properties below have sufficient values:

hive.server2.session.check.interval
hive.server2.idle.operation.timeout
hive.server2.idle.session.timeout

Can someone help with what the issue is, or what configuration changes are required?
Labels:
- Apache Hive
- Apache Knox
02-22-2017 07:49 PM
@spolavarapu I did a little bit of googling and fixed it. But in Ranger, while creating a policy, I selected one LDAP group, so ideally only the users of that group should appear in the 'Select User' tab; yet I can see all users there.
02-22-2017 02:12 AM
@Vipin Rathor The above search was fine. I believe in HDP 2.3.2 the group sync filters were false by default, which was the issue.
02-22-2017 02:10 AM
@spolavarapu Thanks. When I changed the filters from false to true, it picked up the groups, but all of them show as internal. I also downloaded the 2.5 sandbox; there I was able to get the groups because those filters were already enabled, but in HDP 2.5 I am not able to log in using the passwords. It says invalid username/password. Can you give quick pointers on what to check?
02-20-2017 02:39 AM
I have HDP 2.3.2, Ranger 0.5, and OpenLDAP, and I am integrating LDAP with Ranger.
I have configured Ranger and can see the users in the Users tab, but my groups are not visible in the Ranger UI. Below is the LDIF from OpenLDAP.

Sample LDIF:
# bigdatdomain.com
dn: dc=bigdatdomain,dc=com
objectClass: organization
objectClass: dcObject
o: Hadoop
dc: bigdatdomain
# users, bigdatdomain.com
dn: ou=users,dc=bigdatdomain,dc=com
objectClass: organizationalUnit
ou: users
# student1, users, bigdatdomain.com
dn: uid=student1,ou=users,dc=bigdatdomain,dc=com
uid: student1
cn: student1
sn: 1
objectClass: top
objectClass: posixAccount
objectClass: inetOrgPerson
loginShell: /bin/bash
homeDirectory: /home/student1
uidNumber: 15000
gidNumber: 10000
userPassword:: e1NTSEF9Q1FHNUtIYzZiMWlpK3FvcGFWQ3NOYTE0djkrcjE0cjU=
mail: student1@bigdatdomain.com
gecos: Student1 User
# student2, users, bigdatdomain.com
dn: uid=student2,ou=users,dc=bigdatdomain,dc=com
uid: student2
cn: student2
sn: 2
objectClass: top
objectClass: posixAccount
objectClass: inetOrgPerson
loginShell: /bin/bash
homeDirectory: /home/student2
uidNumber: 15001
gidNumber: 10000
userPassword:: e1NTSEF9Q1FHNUtIYzZiMWlpK3FvcGFWQ3NOYTE0djkrcjE0cjU=
mail: student2@bigdatdomain.com
gecos: Student2 User
# groups, bigdatdomain.com
dn: ou=groups,dc=bigdatdomain,dc=com
objectClass: top
objectClass: organizationalUnit
ou: groups
description: stc groups
# itpeople, groups, bigdatdomain.com
dn: cn=itpeople,ou=groups,dc=bigdatdomain,dc=com
objectClass: groupOfNames
member: uid=student2,ou=users,dc=bigdatdomain,dc=com
member: uid=student1,ou=users,dc=bigdatdomain,dc=com
cn: itpeople
description: IT security group

Usersync log:

20 Feb 2017 00:00:55 INFO LdapUserGroupBuilder [UnixUserSyncThread] - LdapUserGroupBuilder initialization completed with -- ldapUrl: ldap://xyz:389, ldapBindDn: cn=Manager,dc=bigdatdomain,dc=com, ldapBindPassword: ***** , ldapAuthenticationMechanism: simple, searchBase: dc=bigdatdomain,dc=com, userSearchBase: ou=users,dc=bigdatdomain,dc=com, userSearchScope: 2, userObjectClass: person, userSearchFilter: uid=*, extendedUserSearchFilter: (&(objectclass=person)(uid=*)), userNameAttribute: uid, userSearchAttributes: [uid, ismemberof, memberof], userGroupNameAttributeSet: [ismemberof, memberof], pagedResultsEnabled: true, pagedResultsSize: 500, groupSearchEnabled: false, groupSearchBase: dc=bigdatdomain,dc=com, groupSearchScope: 2, groupObjectClass: groupofnames, groupSearchFilter: , extendedGroupSearchFilter: (&(objectclass=groupofnames)(member={0})), extendedAllGroupsSearchFilter: (&(objectclass=groupofnames)), groupMemberAttributeName: member, groupNameAttribute: cn, groupUserMapSyncEnabled: false, ldapReferral: ignore

Can someone point out whether there is an error in my Ranger configuration?
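Two things are worth checking against the log above: it reports `groupSearchEnabled: false` (with that setting usersync will not pull groups at all), and the group filter itself can be verified by hand with `ldapsearch` against the same base and object class the log reports (`-W` prompts for the bind password; host `xyz` is as logged):

```shell
# Reproduce Ranger usersync's group query against OpenLDAP manually.
ldapsearch -x -H ldap://xyz:389 \
  -D 'cn=Manager,dc=bigdatdomain,dc=com' -W \
  -b 'ou=groups,dc=bigdatdomain,dc=com' \
  '(objectclass=groupofnames)' cn member
# If the filter is right, this should return cn=itpeople with two member entries.
```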
Labels:
- Apache Ranger