Member since: 11-15-2018
Posts: 12
Kudos Received: 0
Solutions: 0
01-19-2021 06:28 AM
There are no preconfigured queues under the parent queue. That checkbox seems to be tied to placement rules. However, in CDP, the only placement rules that allow auto-creation of leaf queues are tied to "root.users". In CDH, you could have a leaf queue created anywhere as long as "create queue if non-existent" was checked. In CDP, that freedom seems limited to queues under root.users... Hopefully I'm missing something. The inability to create leaf queues under arbitrary parent queues would be a big step backwards in terms of queue flexibility.
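For anyone comparing notes later, the upstream CapacityScheduler knobs I was hoping the UI would drive look roughly like the following. The property names are taken from the Apache YARN docs for auto-created leaf queues; I have not verified what Queue Manager actually writes in CDP, and root.testpool is just an example parent:
# capacity-scheduler.xml equivalents (illustrative only)
yarn.scheduler.capacity.root.testpool.auto-create-child-queue.enabled = true
yarn.scheduler.capacity.root.testpool.leaf-queue-template.capacity = 10
Those same docs note that a parent with auto-creation enabled cannot also have statically configured children, which lines up with the empty parent queue mentioned above.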
01-11-2021 10:38 AM
Hello. In CDH, the process of creating a parent pool that allowed dynamic child pools to be created at runtime was trivial -- simply mark the pool as Parent. In CDP 7 I'm running into the following issue: for a leaf pool (let's say root.testpool), I edit the pool attributes in the YARN Queue Manager UI. Under the "Dynamic Auto-Creation of Queue" section I see "Is Auto Created Leaf Queue", which is unchecked and uneditable. That makes sense, since root.testpool was not auto-created. "Enable Dynamic Queue Creation" is also unchecked, but it too is uneditable. I feel this should be editable, in order to mark root.testpool as supporting dynamic queues below it. How do I enable dynamic queues for a given pool? Thanks, M
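For reference, the CDH behavior I'm trying to reproduce boils down to roughly this in fair-scheduler.xml terms (queue names are illustrative, not from our actual config):
<queue name="testpool" type="parent"/>
<queuePlacementPolicy>
  <rule name="specified" create="true"/>
  <rule name="default"/>
</queuePlacementPolicy>
With that in place, submitting to something like root.testpool.adhoc1 creates the leaf pool on the fly; I'm looking for the equivalent toggle in CDP's Queue Manager.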
06-02-2020 07:33 AM
Thanks Ferenc. As mentioned, we cannot use CDP for the foreseeable future, nor are we customizing the CDH release in any real way -- we're simply adding Livy on a new edge node, which should be relatively straightforward but in this case is not. Enterprise Support (which we pay for) is telling us to ask the Community for guidance, so here we are. If anyone has any guidance on actually getting past this local auth-mapping issue, we would greatly appreciate it. Best, Mike
05-28-2020 01:41 PM
Thanks Bender. I appreciate the response. Just want to clarify a few things:
* Livy itself is running as livyblauser@COMPANY.PRI
* mzeoli@COMPANY.PRI is the user hitting the Livy web UI.
* Both livyblauser and mzeoli are AD accounts and have rights on the edge node Livy is running on (nydc-pblalivy01, which is the same box the HTTP service principal is for).
* Both have permission to read krb5.conf (it's world-readable), though I'm not sure why or how something would be hitting krb5.conf as mzeoli, since mzeoli is just the web UI user and should not own any process. Or perhaps I misunderstood you.
Given that this works...
[livyblauser@nydc-pblalivy01 hadoop]$ hadoop org.apache.hadoop.security.HadoopKerberosName mzeoli@COMPANY.PRI
Name: mzeoli@COMPANY.PRI to mzeoli
...it really feels like Livy isn't finding the rules it would expect to find, though I see the correct rules in /etc/hadoop/conf/core-site.xml:
<name>hadoop.security.auth_to_local</name>
<value>
RULE:[1:$1@$0](.*@\QBD.COMPANY.PRI\E$)s/@\QBD.COMPANY.PRI\E$//
RULE:[2:$1@$0](.*@\QBD.COMPANY.PRI\E$)s/@\QBD.COMPANY.PRI\E$//
RULE:[1:$1@$0](.*@\QCOMPANY.PRI\E$)s/@\QCOMPANY.PRI\E$//
RULE:[2:$1@$0](.*@\QCOMPANY.PRI\E$)s/@\QCOMPANY.PRI\E$//
DEFAULT
</value>
Thanks, Mike
05-22-2020 10:54 AM
Hello.
We are currently trying to get a proof of concept off the ground using Livy as a spark-shell multiplexer in our kerberized CDH 6.1 environment (I realize Livy is formally supported in CDP 7, but our requirement at the moment is to get this working under 6, or to build our own multiplexer, to help reduce the wait time as Spark shells start up). We're running into problems getting Kerberos to properly map names via the auth_to_local rules.
We've completed the following:
Deployed Livy to a clean, kerberized edge node on the cluster
Created a service principal for Livy authentication (see config below)
Created a user principal for Livy server launch (see config below)
Generated the keytabs needed to authenticate both of those principals (see the kinit/klist sketch after this list)
Set the ENV vars needed for Livy to find the active hadoop config / libs (see below)
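In case it matters, those keytabs can be exercised independently of Livy with the usual commands (illustrative; keytab path and principals as configured further down):
kinit -kt /tmp/livyblauser.keytab livyblauser@COMPANY.PRI
klist -kt /tmp/livyblauser.keytab   # with our config the same file also needs an entry for the HTTP/ principal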
When we launch Livy, we get what seems to be a clean start, and it begins listening on TCP 8998 as expected. The first time someone hits it through a browser (using Chrome, and we know it's passing SPNEGO auth to Livy), we get the following:
20/05/21 13:03:12 WARN ServletHandler: /favicon.ico
org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to mzeoli@COMPANY.PRI
at org.apache.hadoop.security.authentication.util.KerberosName.getShortName(KerberosName.java:389)
at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:377)
at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:347)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:518)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:539)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:745)
The complaint about no rules being applied led us to review our auth_to_local rules in core-site.xml (confirmed present in /etc/hadoop/conf), where we see the following auth_to_local rules:
<name>hadoop.security.auth_to_local</name>
<value>
RULE:[1:$1@$0](.*@\QBD.COMPANY.PRI\E$)s/@\QBD.COMPANY.PRI\E$//
RULE:[2:$1@$0](.*@\QBD.COMPANY.PRI\E$)s/@\QBD.COMPANY.PRI\E$//
RULE:[1:$1@$0](.*@\QCOMPANY.PRI\E$)s/@\QCOMPANY.PRI\E$//
RULE:[2:$1@$0](.*@\QCOMPANY.PRI\E$)s/@\QCOMPANY.PRI\E$//
DEFAULT
</value>
If we check the rule mapping manually as the livyblauser user (the same user that runs Livy) from the same edge node, we see the following, which looks functional and correct:
[livyblauser@nydc-pblalivy01 hadoop]$ hadoop org.apache.hadoop.security.HadoopKerberosName mzeoli@COMPANY.PRI
Name: mzeoli@COMPANY.PRI to mzeoli
We're trying to understand where exactly the failure is when the same mapping is attempted inside Livy. We're unsure whether the rules aren't adequate for what we're trying to do, or whether Livy simply isn't picking up the rules/configs for some reason. DEBUG logs haven't been particularly insightful, though we may have missed something in the haystack.
It's worth noting that we're in a multi-realm environment: the users in question are in COMPANY.PRI, while the service principal and the cluster itself are in BD.COMPANY.PRI. The rules seem to cover all of that (and the cluster has had no problem over the last two years evaluating users from COMPANY.PRI), so we're a bit confused and stuck. Any insight would be tremendously appreciated.
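One diagnostic we plan to try next, in case it prompts any ideas: confirming what Hadoop config the running Livy JVM actually sees. A rough sketch (the pgrep pattern is a guess at our launch command; adjust as needed):
# run as livyblauser on the edge node
LIVY_PID=$(pgrep -f LivyServer)
tr '\0' '\n' < /proc/$LIVY_PID/environ | grep -i hadoop_conf   # is HADOOP_CONF_DIR set for the process?
tr '\0' '\n' < /proc/$LIVY_PID/cmdline | grep -i hadoop        # eyeball the classpath for /etc/hadoop/conf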
Thanks, Mike
ENV exports for livyblauser
export SPARK_HOME=/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p3380.937520/lib/spark
export SPARK_CONF_DIR=$SPARK_HOME/conf
export JAVA_HOME=/usr/java/latest/
export HADOOP_HOME=/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p3380.937520/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
livy.conf (all other items are commented out / left at defaults)
livy.impersonation.enabled = false
# for the execution of Livy itself
livy.server.launch.kerberos.principal = livyblauser@COMPANY.PRI
livy.server.launch.kerberos.keytab = /tmp/livyblauser.keytab
# Livy has built-in SPNEGO authentication support for HTTP requests, configured below.
livy.server.auth.type = kerberos
livy.server.auth.kerberos.principal = HTTP/nydc-pblalivy01.bd.company.com@BD.COMPANY.PRI
livy.server.auth.kerberos.keytab = /tmp/livyblauser.keytab
# Use this keystore for the SSL certificate and key.
livy.keystore = /opt/cloudera/security/jks/server.jks
# Specify the keystore password.
livy.keystore.password = xxxx
# Specify the key password.
livy.key-password = xxxx
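For completeness, one thing still on our to-try list is handing the mapping rules to Livy's SPNEGO filter directly instead of relying on it finding them through core-site.xml. I believe some Livy builds expose a livy.server.auth.kerberos.name-rules key for this, but I have not verified it against our build, so treat this purely as an unverified sketch:
# unverified sketch -- check livy.conf.template for whether this key exists in your build
# (the BD.COMPANY.PRI rules would be appended the same way)
livy.server.auth.kerberos.name-rules = RULE:[1:$1@$0](.*@\QCOMPANY.PRI\E$)s/@\QCOMPANY.PRI\E$// RULE:[2:$1@$0](.*@\QCOMPANY.PRI\E$)s/@\QCOMPANY.PRI\E$// DEFAULT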
11-15-2018 10:49 AM
Thanks Tim! Leaving max memory set while I was testing this was my fatal flaw. I definitely agree that multiple concurrent rogue queries create problems; we're actually working on a bot to periodically poll for and purge queries that appear to be going off the rails. Looking forward to upgrading to 6.x for the additional bulwarks inside CM. Again, thanks!
11-15-2018 07:48 AM
Actually, in re-reading what I wrote, I'm further confused. The docs say: "When resource management is enabled, the mechanism for this option changes. If set, it overrides the automatic memory estimate from Impala. Impala requests this amount of memory from YARN." But we are not using Impala on YARN (i.e., Llama), and I don't think it's physically possible to use that feature anymore (we're running CDH 5.11). So how can setting the default Impala mem_limit cause a reservation of memory, rather than a simple over-the-limit check during execution? Thanks again, M
11-15-2018 07:29 AM
Hello. We currently have a number of classes of users leveraging Impala. We're running into issues where some users create queries that exhaust available Impala memory and then impact other queries, potentially taking them down as we hit OOM errors. Our goal is to set a default memory limit per query (potentially across pools) to prevent rogue queries from running unchecked. However, when we set this value to some fraction of total Impala memory per node (say, 128 GB), Impala attempts to RESERVE that amount of memory for each query, which is not the effect we're looking for.

Since we're using resource management (not Llama, but Admission Control), I believe the following from the docs is relevant: "When resource management is enabled, the mechanism for this option changes. If set, it overrides the automatic memory estimate from Impala. Impala requests this amount of memory from YARN on each node, and the query does not proceed until that much memory is available. The actual memory used by the query could be lower, since some queries use much less memory than others. With resource management, the MEM_LIMIT setting acts both as a hard limit on the amount of memory a query can use on any node (enforced by YARN) and a guarantee that that much memory will be available on each node while the query is being executed."

I guess my question is: can we use resource management for Impala AND have mem_limit actually be a simple limit per query, NOT a reservation per query? Do we absolutely need to turn off resource management for Impala if we want mem_limit to behave as a limit rather than a reservation? If so, what exactly do we need to turn off? Since Llama is no longer relevant in the Impala universe, the actual setting I'm supposed to toggle is a little obscure. Is it actually Admission Control we need to turn off, or something else? Thanks! Mike
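To make the ask concrete: at the session level, the knob we have been experimenting with looks roughly like this (value illustrative only). The open question is whether, with admission control on, it acts as a simple execution-time cap or gets counted against the pool up front:
-- in impala-shell, illustrative value only
set mem_limit=16gb;
-- subsequent queries in this session should fail if they exceed ~16 GB on any node, rather than reserving it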
Labels: Apache Impala