Created 03-23-2017 08:34 PM
We are getting a read timeout on the Tez view.
This started only after enabling Kerberos on the cluster. The exact same setup on a different cluster works.
1. All proxy settings are in place in core-site.xml.
2. We changed the timeouts:
views.ambari.request.read.timeout.millis=10000
to
views.ambari.request.read.timeout.millis=60000
3. The timeline server DB is only a few MB in size, which is not large enough to explain a read timeout.
4. The view configuration points to the local cluster.
5. The ATS version is 1.0, although the other cluster has the same version and the view loads perfectly fine there.
6. Timeline server logs just show:
10:42:15,127 WARN server.AuthenticationFilter (AuthenticationFilter.java:doFilter(528)) - AuthenticationToken ignored: AuthenticationToken expired
10:42:15,130 WARN server.AuthenticationFilter (AuthenticationFilter.java:doFilter(528)) - AuthenticationToken ignored: AuthenticationToken expired
10:42:15,130 WARN server.AuthenticationFilter (AuthenticationFilter.java:doFilter(528)) - AuthenticationToken ignored: AuthenticationToken expired
10:42:15,131 WARN server.AuthenticationFilter (AuthenticationFilter.java:doFilter(528)) - AuthenticationToken ignored: AuthenticationToken expired
10:42:15,133 WARN server.AuthenticationFilter (AuthenticationFilter.java:doFilter(528)) - AuthenticationToken ignored: AuthenticationToken expired
10:42:15,133 WARN server.AuthenticationFilter (AuthenticationFilter.java:doFilter(528)) - AuthenticationToken ignored: AuthenticationToken expired
10:42:15,135 WARN server.AuthenticationFilter (AuthenticationFilter.java:doFilter(528)) - AuthenticationToken ignored: AuthenticationToken expired
10:42:15,136 WARN server.AuthenticationFilter (AuthenticationFilter.java:doFilter(528)) - AuthenticationToken ignored: AuthenticationToken expired
10:42:15,136 WARN server.AuthenticationFilter (AuthenticationFilter.java:doFilter(528)) - AuthenticationToken ignored: AuthenticationToken expired
7. The curl command authenticates, but there is no response from the ATS:
curl -i --negotiate -u : "http://<TIMELINE-SERVER-HOST>:8188/ws/v1/timeline/TEZ_DAG_ID?limit=11&_=1489546472753&"
ProxyHelper:71 - Cannot access the url: http://<TIMELINE-SERVER-HOST>:8188/ws/v1/timeline/TEZ_DAG_ID?limit=11&_=1489546472753&
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1535)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
    at org.apache.ambari.view.tez.utils.ProxyHelper.getResponse(ProxyHelper.java:57)
    at org.apache.ambari.view.tez.rest.BaseProxyResource.getData(BaseProxyResource.java:55)
    at sun.reflect.GeneratedMethodAccessor573.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
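To see how often the timeline server is rejecting tokens, the warnings from item 6 can be counted per log file. This is only a rough sketch: the helper name count_expired_tokens is made up for illustration, and the log path has to be supplied by the caller, since the thread does not say where the timeline server log lives.

```shell
# Hypothetical helper: count the "AuthenticationToken expired" warnings
# in the timeline server log file given as the first argument.
count_expired_tokens() {
  # grep -c prints the number of matching lines (0 if there are none).
  grep -c 'AuthenticationToken ignored: AuthenticationToken expired' "$1"
}

# Example call (the path is a placeholder, not taken from the thread):
# count_expired_tokens /path/to/timeline-server.log
```

A count that keeps growing while the Tez view is loading would suggest the warnings are tied to the view's requests rather than to some other client.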
Created 03-24-2017 12:20 AM
Do you have the following settings in YARN?
yarn.admin.acl = yarn,dr.who,admin
yarn.acl.enable = false
yarn.timeline-service.http-authentication.proxyuser.<AMBARI_PRINCIPAL>.hosts = *
yarn.timeline-service.http-authentication.proxyuser.<AMBARI_PRINCIPAL>.users = *
or
yarn.timeline-service.http-authentication.proxyusers.*.hosts=*
yarn.timeline-service.http-authentication.proxyusers.*.users=*
yarn.timeline-service.http-authentication.proxyusers.*.groups=*
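One quick way to confirm which of these proxy-user properties are actually present is to search the YARN site file on the timeline server host. The sketch below assumes the settings live in an XML file such as yarn-site.xml; the helper name list_ats_proxy_props is hypothetical and the file path must be passed in, as the thread does not give one.

```shell
# Hypothetical helper: list the timeline-service proxy-user property names
# found in the configuration file given as the first argument.
list_ats_proxy_props() {
  # -E enables extended regex, -o prints only the matching property names.
  # "proxyusers?" matches both spellings quoted in this thread.
  grep -E -o 'yarn\.timeline-service\.http-authentication\.proxyusers?\.[^<[:space:]]+' "$1" | sort -u
}

# Example call (the path is a placeholder):
# list_ats_proxy_props /path/to/yarn-site.xml
```

This only shows that the names are present in the file; the values still need to be checked by hand or in Ambari.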
Also, can you please confirm whether YARN has HA enabled and whether its UIs are Kerberized?
Created 03-24-2017 02:55 AM
hey yep.. 😞
Created 03-24-2017 03:22 AM
The following is usually a GSS error indicating that an expired ticket was used, so the command died while creating a security context (the init_sec_context call).
10:42:15,127 WARN server.AuthenticationFilter (AuthenticationFilter.java:doFilter(528)) - AuthenticationToken ignored: AuthenticationToken expired
So can you manually get a fresh ticket and validate it:
# kdestroy
# kinit <username>
# curl -i --negotiate -u : "http://TIMELINE-SERVER-HOST:8188/ws/v1/timeline/TEZ_DAG_ID?limit=11&_=1489546472753&"
# curl -i --negotiate -u : "http://TIMELINE-SERVER-HOST:8188/ws/v1/timeline/TEZ_DAG_ID?limit=11&_=1489546472753&"
The output of "klist" will also show the ticket expiration information, so we can check whether the ticket is expiring too quickly.
# klist
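If the ticket check needs to be scripted, something like the following could work. It is a sketch that assumes MIT Kerberos, where "klist -s" prints nothing and only sets its exit status according to whether the credential cache holds a valid ticket; the helper name check_ticket is made up for this example.

```shell
# Hypothetical helper: report whether the current credential cache
# holds a usable ticket (assumes MIT Kerberos "klist -s" semantics).
check_ticket() {
  if klist -s; then
    echo "ticket valid"
  else
    echo "no valid ticket, run kinit"
  fi
}

# Example call:
# check_ticket
```

Running it right before the curl command makes it easier to tell an expired ticket apart from a slow response.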
Created 03-30-2017 10:28 PM
So here is what we did.
We observed that the curl command we ran from the command line took about 11 minutes to return a result from the ATS.
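For anyone who wants to measure this on their own cluster, curl can report the elapsed time itself. The helper below is a sketch; the name ats_response_seconds is hypothetical, and the URL is whatever timeline server endpoint is being tested.

```shell
# Hypothetical helper: print the total time in seconds that curl needs
# to fetch the given URL with SPNEGO (Kerberos) authentication.
ats_response_seconds() {
  # -s silences the progress meter, -o /dev/null discards the body,
  # -w '%{time_total}\n' prints the elapsed time once the transfer ends.
  curl -s -o /dev/null -w '%{time_total}\n' --negotiate -u : "$1"
}

# Example call (keep the URL quoted so the shell does not interpret "&"):
# ats_response_seconds "http://<TIMELINE-SERVER-HOST>:8188/ws/v1/timeline/TEZ_DAG_ID?limit=11"
```

The number it prints can then be compared against the read timeout configured for the view.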
First we tested it with the following in ambari.properties:
views.ambari.request.read.timeout.millis=900000
views.request.read.timeout.millis=900000
The view then loaded after 15 minutes. Switching to ATS v1.5 should fix this, as that version of ATS has better storage capabilities.
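The arithmetic behind the timeout values is easy to check. The sketch below uses two hypothetical helpers, minutes_to_millis and timeout_covers, to compare the timeouts tried in this thread against the roughly 11 minute response time seen with curl.

```shell
# Convert minutes to milliseconds, the unit used by the
# views.*.read.timeout.millis properties quoted in this thread.
minutes_to_millis() {
  echo $(( $1 * 60 * 1000 ))
}

# Succeeds (exit status 0) if a timeout in milliseconds ($1) is long
# enough for a response that takes the given number of minutes ($2).
timeout_covers() {
  [ "$1" -ge "$(minutes_to_millis "$2")" ]
}

# Example calls:
# minutes_to_millis 11        # prints 660000
# timeout_covers 60000 11     # fails: 60 seconds is far too short
# timeout_covers 900000 11    # succeeds: 900000 ms is 15 minutes
```

This is why raising the timeout from 10000 to 60000 made no visible difference, while 900000 let the request finish.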