Performance issue: Hive + Kerberos

Contributor

Hi,

We're running HiveServer2 in a Kerberized cluster. The HiveServer2 processes run on edge nodes. The HDP version is 2.3.0.0-2557 and the Hive version is 1.2.1.2.3.

We have an automated Oozie job (written in Java) that executes a Hive query. The job's performance has become much worse since Kerberos was set up in the cluster (at least our developers say so).

The job uses the Hive JDBC driver to connect to HiveServer2 in HTTP transport mode, and we also use ZooKeeper service discovery.
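For readers who want to reproduce this setup, here is a minimal sketch of what such a JDBC URL typically looks like. The ZooKeeper hosts, the namespace `hiveserver2`, the `cliservice` HTTP path and the principal are placeholder assumptions, not values taken from this cluster; `cookieAuth` is the client-side switch mentioned later in this post.

```java
// Sketch only: builds a Hive JDBC URL for HTTP transport mode with
// ZooKeeper service discovery. All concrete values are hypothetical.
final class HiveJdbcUrlSketch {

    static String httpModeUrl(String zkQuorum, String principal, boolean cookieAuth) {
        return "jdbc:hive2://" + zkQuorum + "/"
                + ";serviceDiscoveryMode=zooKeeper"   // resolve HiveServer2 via ZooKeeper
                + ";zooKeeperNamespace=hiveserver2"   // assumed namespace
                + ";transportMode=http"               // HTTP instead of binary transport
                + ";httpPath=cliservice"              // assumed HTTP path
                + ";principal=" + principal           // HiveServer2 Kerberos principal
                + ";cookieAuth=" + cookieAuth;        // client-side cookie replay on/off
    }

    public static void main(String[] args) {
        System.out.println(httpModeUrl(
                "zk1.example.com:2181,zk2.example.com:2181",
                "hive/_HOST@EXAMPLE.COM",
                true));
    }
}
```

The resulting string can be passed to `DriverManager.getConnection` or to beeline with `-u`.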

I am new to this specific HDP cluster and I am trying to understand what is going on.

When the job is running, I see lots of messages like the ones below in hiveserver2.log:

2016-06-06 08:46:07,011 INFO  [HiveServer2-HttpHandler-Pool: Thread-440]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doPost(169)) - Cookie added for clientUserName oozie
2016-06-06 08:46:07,038 INFO  [HiveServer2-HttpHandler-Pool: Thread-440]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doPost(127)) - Could not validate cookie sent, will try to generate a new cookie
2016-06-06 08:46:07,039 INFO  [HiveServer2-HttpHandler-Pool: Thread-440]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doKerberosAuth(352)) - Failed to authenticate with http/_HOST kerberos principal, trying with hive/_HOST kerberos principal

For each job, the number of these entries roughly matches the number of fetches (row_count/50). As I understand it, cookie authentication isn't working properly for some reason, and something is also wrong with Kerberos authentication.

On the KDC server, I see hundreds of thousands of entries like the one below in krb5kdc.log:

Jun 07 13:37:06 kdcsrv01 krb5kdc[5469](info): TGS_REQ (4 etypes {18 17 16 23}) 10.141.5.25: ISSUE: authtime 1465306605, etypes {rep=18 tkt=18 ses=18}, oozie@BDATA.COM for hive/edge01@BDATA.COM
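To put a number on these entries, something like the following could be run over a copy of krb5kdc.log. It is only a sketch: the class name and the regular expression are my own, and the pattern assumes every issued-ticket line ends with `<client> for <service>` as in the sample above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: counts issued TGS_REQ entries per (client, service) pair.
final class TgsReqCounter {

    // Assumes the line format shown in the sample entry above.
    private static final Pattern TGS_ISSUE =
            Pattern.compile("TGS_REQ.*ISSUE:.*\\s(\\S+) for (\\S+)\\s*$");

    static Map<String, Integer> countByPair(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = TGS_ISSUE.matcher(line);
            if (m.find()) {
                // Key looks like "client -> service".
                counts.merge(m.group(1) + " -> " + m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Feeding it the lines from `java.nio.file.Files.readAllLines(...)` and comparing the count for the client/service pair against row_count/50 would show whether each fetch really triggers its own service ticket request.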

I re-executed a query (one normally run by the job) in beeline. It triggers exactly the same messages in hiveserver2.log as described above. I tried both cookieAuth=true and cookieAuth=false. I have limited access to the KDC machine, so I can't confirm right now whether the same entries show up in krb5kdc.log in this case.

Any ideas on how to proceed with the investigation would be appreciated.

Regards,

Pit

1 ACCEPTED SOLUTION

Contributor

We've managed to solve the problem. A closer look at the JDBC communication between the client and HiveServer2 shows that the cookie authentication mechanism, which should prevent repeated authentication calls within a single session, requires the HTTP server to run with SSL. Without SSL the cookie is never accepted, so every request falls back to a full Kerberos authentication.

Solution:

Either of the following resolves the issue:

  • Enable SSL for HiveServer2 in HTTP transport mode. This works with the service's default configuration.
  • If you don't need SSL, disable the requirement for secure cookies by setting hive.server2.thrift.http.cookie.is.secure=false in hiveserver2-site.xml.
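
For the second option, the entry in hiveserver2-site.xml would look roughly like this. This is a sketch of the standard Hadoop property layout; only the property name and value come from the solution above, and HiveServer2 will most likely need a restart to pick it up.

```xml
<!-- Sketch: allow the auth cookie over plain HTTP (no SSL). -->
<property>
  <name>hive.server2.thrift.http.cookie.is.secure</name>
  <value>false</value>
</property>
```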

Note: the HiveServer2 documentation lacks detailed information about the cookie authentication mechanism. Only reading the code and debugging/tracing the components shed some light on it.

5 REPLIES 5

Master Guru

@Pit Err consider the following:

  • You may be pounding the Authentication Service. Check the volumes on the AD domain controllers. You may need to have dedicated controllers.
  • Network distance. Is the KDC within the same data center? Are you experiencing network lag?
  • Try with a local KDC. If the local KDC performs faster, then you know the two issues above may well be in play.

Expert Contributor

@Pit Err Try having your developers call setFetchSize on the ResultSet. By default, 50 rows are fetched at a time. A higher value reduces the number of fetches from HiveServer2, and with it the number of authentication requests sent to the KDC each time the next batch of rows is retrieved.
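
As a rough illustration of the effect, here is a sketch using only the standard `java.sql` API. The class and method names are made up for this example, and the row counts in the comments are arbitrary.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch only: shows how the fetch size drives the number of round trips.
final class FetchSizeSketch {

    /** Number of fetch calls needed to read rowCount rows, fetchSize rows at a time. */
    static long fetchRoundTrips(long rowCount, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        return (rowCount + fetchSize - 1) / fetchSize; // ceiling division
    }

    /** Standard JDBC hint; the driver decides how to honour it. */
    static void applyFetchSize(ResultSet rs, int fetchSize) throws SQLException {
        rs.setFetchSize(fetchSize);
    }

    public static void main(String[] args) {
        // 100,000 rows: 2000 fetches at the default of 50, 100 fetches at 1000.
        System.out.println(fetchRoundTrips(100_000, 50));
        System.out.println(fetchRoundTrips(100_000, 1000));
    }
}
```

If every fetch is re-authenticated, each of those round trips costs one request to the KDC, so a larger fetch size hides the symptom without removing the cause.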

Contributor

Thanks for the answers.

@Sunile Manjee - we have a dedicated KDC which responds quite quickly, no network issues so far.

@Terry Stebbens - we've already bumped the fetch size and that helped a lot.

I am wondering whether it's OK that each fetch is authenticated.

Contributor

I managed to isolate the issue a bit. It looks like it isn't related to Oozie jobs, but can be observed in Java code connecting to HiveServer2 over JDBC.

I wrapped a query in some Java code and ran it in my test environment. The code makes a JDBC connection to HiveServer2 and authenticates with Kerberos before the connection is established. I wrote two separate apps: one uses the Java GSS API, the other authenticates using UserGroupInformation.loginUserFromKeytab.

When the hiveserver2 is in the binary transport mode, both apps perform well.

When HiveServer2 is in HTTP transport mode, the app that uses the Java GSS API calls my KDC before each fetch operation. I ran the app with JGSS debugging enabled. Before each fetch, the following log is printed:

Search Subject for Kerberos V5 INIT cred (<<DEF>>, sun.security.jgss.krb5.Krb5InitCredential)
Found ticket for piterr@MYIPADOMAIN.BEETLE.INT to go to krbtgt/MYIPADOMAIN.BEETLE.INT@MYIPADOMAIN.BEETLE.INT expiring on Fri Jun 17 09:02:29 CEST 2016
Entered Krb5Context.initSecContext with state=STATE_NEW
Service ticket not found in the subject
>>> Credentials acquireServiceCreds: same realm
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 18 17 16 23.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
getKDCFromDNS using UDP
>>> KrbKdcReq send: kdc=fipasrv.beetle.int. UDP:88, timeout=30000, number of retries =3, #bytes=726
>>> KDCCommunication: kdc=fipasrv.beetle.int. UDP:88, timeout=30000,Attempt =1, #bytes=726
>>> KrbKdcReq send: #bytes read=706
>>> KdcAccessibility: remove fipasrv.beetle.int.:88
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbApReq: APOptions are 00000000 00000000 00000000 00000000
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
Krb5Context setting mySeqNumber to: 879752450
Krb5Context setting peerSeqNumber to: 0
Created InitSecContextToken:
0000: 01 00 6E 82 02 69 30 82   02 65 A0 03 02 01 05 A1  ..n..i0..e......
0010: 03 02 01 0E A2 07 03 05   00 00 00 00 00 A3 82 01  ................
0020: 72 61 82 01 6E 30 82 01   6A A0 03 02 01 05 A1 18  ra..n0..j.......
0030: 1B 16 4D 59 49 50 41 44   4F 4D 41 49 4E 2E 42 45  ..MYIPADOMAIN.BE
0040: 45 54 4C 45 2E 49 4E 54   A2 26 30 24 A0 03 02 01  ETLE.INT.&0$....
....

When I switch back to the binary transport mode, everything works smoothly.
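
For anyone who wants to capture the same trace, the debug output above can be switched on with two JDK system properties. A minimal sketch (the class name is made up; the properties must be set before any Kerberos/GSS classes are loaded, or passed as -D options on the command line instead):

```java
// Sketch only: enables the JDK's Kerberos and JGSS debug output.
final class KerberosDebugSketch {

    static void enableKerberosDebug() {
        // Prints the ">>> KrbKdcReq send ..." style lines.
        System.setProperty("sun.security.krb5.debug", "true");
        // Prints the "Search Subject for Kerberos V5 INIT cred ..." style lines.
        System.setProperty("sun.security.jgss.debug", "true");
    }

    public static void main(String[] args) {
        enableKerberosDebug();
        // ... open the JDBC connection and run the query here ...
    }
}
```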
