Member since: 06-21-2017
Posts: 44
Kudos Received: 1
Solutions: 0
01-06-2021
03:24 AM
I'm trying to monitor the status of a running Oozie job. However, when running the following from a general-purpose user ("hue", which also starts these jobs through runAs):

curl --negotiate -u : http://FQDN:8088/proxy/application_1609929757167_0004/ws/v1/mapreduce/jobs/job_1609929757167_0004/tasks

I get an error that doesn't really say anything useful: {"RemoteException":{"exception":"WebApplicationException","javaClassName":"javax.ws.rs.WebApplicationException"}} On the other hand, I get the correct response as the yarn user, which suggests an issue with the ACLs. My question: which ACLs should I grant to the general-purpose user, and where, so that it can see all the information for a running task? I've added the user to ``yarn.admin.acl``, as well as to mapreduce.job.acl-modify-job, mapreduce.job.acl-view-job and mapreduce.jobhistory.admin.acl, but none of these helps. What am I missing?
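For reference, a sketch of how I've been applying these ACLs (the property names are the stock Hadoop ones; "hue" is simply the user in question, and the snippets go into mapred-site.xml and yarn-site.xml respectively):

<property>
  <name>mapreduce.job.acl-view-job</name>
  <value>hue</value>
</property>
<property>
  <name>yarn.admin.acl</name>
  <value>yarn,hue</value>
</property>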
11-25-2020
05:52 AM
I'm running an HDP 3.1 cluster with a TEZ-UI instance configured separately on Apache Tomcat, set up following this guide. Everything seems to be working fine; however, when launching Hive LLAP queries I don't see the generated DAG IDs and, for example, can't see the error messages after a query crashes (example screenshot omitted). What additional configuration is necessary for this to work? Side note: I don't know whether this matters, but in the same manner the LLAP Master monitor (port :10502) doesn't register any queries, even though Hive is configured to use LLAP and the executors are always active and being called; when inspecting the nodes at port :15002, the LLAP workers are clearly in active use (screenshots omitted).
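In case it helps with comparing setups, these are the kinds of tez-site.xml properties that an external TEZ-UI deployment hinges on (the property names are standard Tez ones; the host, port and logging class below are placeholders to adapt, not my actual values):

<property>
  <name>tez.tez-ui.history-url.base</name>
  <value>http://tomcat-host:8080/tez-ui/</value>
</property>
<property>
  <name>tez.history.logging.service.class</name>
  <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>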
10-20-2020
01:06 AM
I'm trying to upgrade the Kerberos encryption types for an existing HDP 2.6 cluster. The problem is that I want to use the same KDC servers, with a single configuration, for multiple realms, one of which would be set up on CentOS 7.8 or CentOS 8, which no longer supports DES-type encryption. The HDP 2.6 cluster, however, doesn't seem to work with anything other than the DES-family encryptions. I'm trying the following krb5.conf:

default_tkt_enctypes = aes256-cts-hmac-sha1-96 des3-cbc-sha1 arcfour-hmac-md5 des-cbc-crc des-cbc-md5 des-cbc-md4
default_tgs_enctypes = aes256-cts-hmac-sha1-96 des3-cbc-sha1 arcfour-hmac-md5 des-cbc-crc des-cbc-md5 des-cbc-md4

In my understanding, these should cover both the old, weak des3-cbc-sha1 type and the newer aes256 types for the newer system. However, with this configuration set, and after doing the Keytab Regeneration through Ambari, the HDFS services don't start, due to what is probably a GSS issue (same errors as in https://community.cloudera.com/t5/Support-Questions/Cloudera-Kerberos-GSS-initiate-failed/m-p/78727). When inspecting the auto-generated keytab, only one entry is created, with the "des3-cbc-sha1" type. While this should work (and it does allow a kinit), something is not okay for the namenode, and it still results in the GSS errors while starting. What could be the issue here? What is the correct enctype setting that works with HDP?

--------

I can reframe the question as follows: why does the HDFS NameNode work (on HDP 2.6) only with the following krb5.conf entries?

default_tkt_enctypes = des3-cbc-sha1 des3-hmac-sha1 des3-cbc-sha1-kd
default_tgs_enctypes = des3-cbc-sha1 des3-hmac-sha1 des3-cbc-sha1-kd
permitted_enctypes = des3-cbc-sha1 des3-hmac-sha1 des3-cbc-sha1-kd

Nothing else works if I try moving away from the DES encryptions.
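For anyone debugging the same thing: the enctypes actually present in a keytab can be listed with standard MIT Kerberos tooling (the path below is the usual HDP NameNode keytab location; adjust to your layout):

klist -kte /etc/security/keytabs/nn.service.keytab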
07-15-2020
06:52 AM
Thank you! This is a very helpful response! Regarding the 3x replication for DFS: is this still relevant with the latest Hadoop releases? For example, Hadoop 3 introduced Erasure Coding to address exactly this, although I'm not yet sure whether it is used by default in HDP 3 and similar platforms. The other considerations you mentioned are very useful; I'll keep them in mind. It seems like a good idea to start with the minimum viable setup and scale up from there for HA. In my experience, migrating services between nodes is simple enough through Ambari, so that should have us covered.
07-14-2020
09:42 AM
Thanks for the response! Could you elaborate a little on choosing to split into 3 VMs over 2? Apart from allowing 3x replication, would this add further performance gains? While this will be a "dev" cluster, both data durability and performance are important. Depending on performance and possible bottlenecks, the goal is to expand to additional machines in the future; in the meantime, though, we want a clean and future-proof start with the existing machine and configuration.
07-14-2020
03:41 AM
Hi all, what is the current best practice for setting up HDP 3.1 on a single machine with 20 cores and 512 GB RAM? Do we still need to split the machine into, e.g., 2 VMs in order to ensure at least 2x DFS replication? Or would a single-node cluster be more efficient? If I understand correctly, even with a JBOD disk setup for HDFS, a single-node cluster allows effectively at most 1x replication via the ``dfs.replication`` setting? What are your recommendations at this point? @Shelton
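For concreteness, the setting in question (hdfs-site.xml; a value of 1 means a single copy of each block, i.e., no redundancy):

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>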
08-06-2019
02:46 PM
Important comment: this is 100% related to masking of some columns in the tables used. However, both the imposed masking AND the ability to create new tables from the initially masked one are important. Any known workaround?
08-06-2019
02:32 PM
Running on HDP 2.6. This query seemed to be working fine (a simple create-table operation; doing just the select works without problems), but now Hive LLAP returns the following error when trying to create a table: "Error running query: java.lang.AssertionError: Unexpected type UNEXPECTED". From the log:

2019-08-06T17:29:20,701 WARN [HiveServer2-Handler-Pool: Thread-1858]: thrift.ThriftCLIService (ThriftCLIService.java:ExecuteStatement(516)) - Error executing statement:
org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.AssertionError: Unexpected type UNEXPECTED
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:225) ~[hive-service-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:276) ~[hive-service-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:312) ~[hive-service-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:508) ~[hive-service-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:495) ~[hive-service-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:309) ~[hive-service-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:506) ~[hive-service-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437) ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422) ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:599) ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_112]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
Caused by: java.lang.AssertionError: Unexpected type UNEXPECTED
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.fixUpCtasAndInsertAfterCbo(CalcitePlanner.java:952) ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:378) ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11167) ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:290) ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:257) ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:455) ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336) ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1197) ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1184) ~[hive-exec-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:191) ~[hive-service-2.1.0.2.6.1.0-129.jar:2.1.0.2.6.1.0-129]
... 15 more

This is the first time I'm seeing such an error. What does it even point to? Any known solutions? EDIT: Important comment: this is 100% related to masking, through Apache Ranger, of some columns in the tables used. However, both the imposed masking AND the ability to create new tables from the initially masked one are important. Any known workaround?
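For clarity, a minimal sketch of the failing pattern (the table and column names are made up for illustration; the source table has a Ranger column-masking policy applied):

select * from masked_db.source limit 10;  -- works, values come back masked
create table masked_db.source_copy as select * from masked_db.source;  -- fails with the AssertionError above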
07-15-2019
12:19 PM
Thanks for clarifying. It was indeed recreated after a full restart. I'm noticing tons of errors, though; hopefully they're just related to the slow start of all the services.
07-15-2019
11:57 AM
On a perfectly running HDP 3.1 cluster, the ATSv2 Timeline Reader stopped responding at some point. Following a guide, I tried refreshing the service by cleaning the configs and restarting it via:

yarn app -destroy ats-hbase

After this, restarting the service doesn't help, and starting it with yarn app -start ats-hbase doesn't work either, since the config JSON is missing:

ERROR client.ApiServiceClient: File does not exist: <>/user/yarn-ats/.yarn/services/ats-hbase/ats-hbase.json

What is this .json, and how do I get the service running again? Are there example templates of a working .json anywhere that I could use? I remember this file existing somewhere on a fresh install, but sadly a reinstall of the cluster is not an option. Another thing: a fresh install on my laptop VM doesn't create such a JSON file either -- I guess the laptop is too small to trigger the hbase-backed setup? I didn't find anything in the official documentation, so I'd be thankful for any advice! @Geoffrey Shelton Okot
02-27-2019
01:41 PM
This works, great workaround.
02-27-2019
09:53 AM
Hi all, after setting up a fresh kerberized HDP 3.1 cluster with Hive LLAP, Spark2 and Livy, we're having trouble connecting to Hive's database through Livy. PySpark from the shell works without problems, but something breaks when going through Livy.

1. Livy settings are the Ambari defaults, with additionally specified jars and pyfiles for the HWC connector, plus spark.sql.hive.hiveserver2.jdbc.url and spark.security.credentials.hiveserver2.enabled set to true. These are enough for the pyspark shell to work without problems. 2. The connection is made through the latest HWC connector described here, since apparently this is the only one that works for Hive 3 and Spark2.

The problem: 1. When spark.master is set to yarn client mode (see, for example, the comment here), the connector appends the principal "hive/_HOST@DOMAIN" and the connection returns a GSS error, failing to find any Kerberos tgt (although the ticket is there, and Livy has access to HiveServer2). 2. When spark.master is set to yarn cluster mode, ";auth=delegationToken" is appended to the connection, and the error then says a "PLAIN" connection was made where a kerberized one was expected.

Notes: I've tried various settings -- zookeeper JDBC links vs. direct through port 10500, hive.doAs = true vs. false, various principals -- but nothing works. Note 2: everything works fine when connecting both through beeline (to Hive on port 10500) and through the pyspark shell. Note 3: HWC connection snippet (from the examples):

from pyspark_llap import HiveWarehouseSession
hive = HiveWarehouseSession.session(spark).build()
hive.showDatabases().show(100)

Any ideas? It feels like some setting on Livy is missing. That "failed to find any Kerberos tgt" is especially weird: where is it looking for the ticket, and why doesn't it see the one from kinit? @Geoffrey Shelton Okot @Hyukjin Kwon @Eric Wohlstadter
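For reference, a sketch of the Spark-side properties involved (the property names are the documented HWC ones; the URL and jar paths are placeholders with a made-up <version>, not our actual values):

spark.sql.hive.hiveserver2.jdbc.url jdbc:hive2://hs2-host:10500/
spark.security.credentials.hiveserver2.enabled true
spark.jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar
spark.submit.pyFiles /usr/hdp/current/hive_warehouse_connector/pyspark_hwc-<version>.zip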
01-24-2019
11:34 AM
Hello all, I have set up an HDP 3 cluster with Hive running on LLAP (Interactive). However, when connecting through the HiveServer2-Interactive port (10500 by default, or through ZooKeeper), every query that runs goes through plain YARN/Tez containers. That is, when I check the LLAP dashboard (http://datanode:10502), I see 0 active/passive sessions; in my experience with HDP 2.6, every query should be listed there. What can be the problem? LLAP is enabled with the default settings from Ambari 2.7, which are:

hive.execution.mode = llap       # in hive-interactive-site
hive.llap.execution.mode = all   # same behaviour with the default "only"
hive.execution.mode = container  # in hive-site

Is this the default behaviour? Currently I'm worried that "hive.execution.mode" is specified in two different places, so I'll try playing with those. But other than that, I'm not sure what further settings to check. @Geoffrey Shelton Okot
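A quick way to confirm which mode a given session actually resolves to is to ask from within beeline (standard Hive syntax for printing a setting's current value):

set hive.execution.mode;
set hive.llap.execution.mode;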
01-17-2019
09:20 AM
@Geoffrey Shelton Okot Important question (should I post it as a new question? It does follow up from your latest comment, so I'll post it here): how should "default_tkt_enctypes", "default_tgs_enctypes" and "permitted_enctypes" ideally look for a normal HDP cluster (not a test sandbox), so that they work 100% of the time and also provide a high level of security? 1. When I tried the default suggested settings of "des3-cbc-sha1 des3-hmac-sha1 des3-cbc-sha1-kd", I got errors that the security level was too low. I then further added "aes256-cts-hmac-sha1-96", but it seems more than one decent enctype is required for proper encryption? 2. The default Kerberos settings suggested by Ambari also list "des3-cbc-sha1 des3-hmac-sha1 des3-cbc-sha1-kd", but comment them out by default, so I guess it ends up using some library defaults, which doesn't seem stable (what if the defaults change over time or with a new Kerberos version?). 3. Now I've added all possible types, "aes256-cts-hmac-sha1-96 aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal", but when exporting with ``xst -k`` from the ``kadmin`` service, only around 2-3 entries with different encryptions end up in the keytab, not all 8+. This suggests that only some of the types actually matter.
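For context, this is the kind of export I'm running (made-up principal and path; the command form is standard kadmin, and the resulting entries can be counted with klist):

kadmin: xst -k /tmp/test.keytab HTTP/myhost.example.com@EXAMPLE.COM
klist -kte /tmp/test.keytab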
01-17-2019
08:58 AM
1 Kudo
@Geoffrey Shelton Okot Thanks, I think I solved it. You know what the problem was? Ambari wasn't creating/re-creating keytabs and principals for HTTP/_HOST@DOMAIN.COM; I had to do that by hand, with the correct encryption, too... Thank you for your help! I'm just curious: did you have to create the HTTP/_HOST principal yourself, or did Ambari create it automatically for you? If the latter, I wonder why it didn't on my machine. By the way, I'm using OpenLDAP as the LDAP/Kerberos database.
01-16-2019
03:40 PM
Hello all, after a fresh kerberization of an Ambari 2.7.3 / HDP 3 cluster, the HDFS NameNode isn't able to start because the hdfs user can't talk to WebHDFS. The following error is returned:

GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)

It's not only from Ambari: I can reproduce this error with a simple curl call as the hdfs user:

su - hdfs
curl --negotiate -u : http://datanode:50070/webhdfs/v1/tmp?op=GETFILESTATUS

which returns:

</head>
<body><h2>HTTP ERROR 403</h2>
<p>Problem accessing /webhdfs/v1/tmp. Reason:
<pre> GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)</pre></p>
</body>
</html>

Overall, permissions for this user should be intact, since I'm able to run hdfs operations from the shell and kinit without problems. What could be the problem? I've tried recreating keytabs several times and fiddling with ACL settings in the config, but nothing works. What principal is WebHDFS expecting? I get the same results when accessing it with the HTTP/host@EXAMPLE.COM principal. NB: there's nothing fancy in the HDFS settings, mainly the stock/default config. NB2: I've added all possible encryption types to krb5.conf that I could find, but none of these helped:

default_tkt_enctypes = aes256-cts-hmac-sha1-96 aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
default_tgs_enctypes = aes256-cts-hmac-sha1-96 aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
permitted_enctypes = aes256-cts-hmac-sha1-96 aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal

@Geoffrey Shelton Okot
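In case it helps anyone diagnosing the same "Checksum failed" class of error, the checks I know of use standard MIT Kerberos commands (the keytab path below is the stock HDP location for the SPNEGO/HTTP principal; hostname and realm are placeholders):

klist -kte /etc/security/keytabs/spnego.service.keytab
kvno HTTP/datanode@EXAMPLE.COM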
01-16-2019
12:09 PM
This is to the point! I got the same error when I tried changing the default Ambari-suggested enctypes to my own custom ones. The custom ones work fine directly with the MIT KDC, but apparently not with Ambari.
01-15-2019
03:54 PM
Thank you, I had this exact issue with the same errors, and nothing in the comment discussion helped. However, several ambari-server restarts and dumb retries of the "Kerberos wizard" with similar settings magically resolved this. I'm not at all sure what the problem was.
01-15-2019
02:02 PM
Thanks, I noticed that too after posting. While -S kadmin/admin worked, -S kadmin/FQDN didn't, so reconfiguring this part on the KDC solved the problem. It's just interesting that I didn't bump into this with the HDP 2.6 Ambari. About the future release of Ambari -- any ETA yet? 🙂
01-15-2019
12:21 PM
@Geoffrey Shelton Okot @huzaira bashir Did you manage to solve this yet? What was the problem?
01-15-2019
11:16 AM
Hello all, I'm trying to kerberize an Ambari 2.7.3 cluster. However, during the setup I get the following error:

Caused by: org.apache.ambari.server.serveraction.kerberos.KerberosOperationException: Unexpected error condition executing the kadmin command. STDERR: kadmin: Matching credential not found (filename: /tmp/ambari_krb_142308985016794830cc) while initializing kadmin interface
at org.apache.ambari.server.serveraction.kerberos.MITKerberosOperationHandler.invokeKAdmin(MITKerberosOperationHandler.java:323)
at org.apache.ambari.server.serveraction.kerberos.MITKerberosOperationHandler.principalExists(MITKerberosOperationHandler.java:123)
at org.apache.ambari.server.serveraction.kerberos.KerberosOperationHandler.testAdministratorCredentials(KerberosOperationHandler.java:314)
at org.apache.ambari.server.controller.KerberosHelperImpl.validateKDCCredentials(KerberosHelperImpl.java:2133)

All of the authentication settings are okay, because I am able to kinit and use the kadmin interface from the shell. The problem seems to be that Ambari tries to do the following:

kinit -p admin/admin@EXAMPLE.COM
kadmin -c /tmp/ambari_krb_...

while it should be doing:

kinit -S kadmin/admin@EXAMPLE.COM admin/admin@EXAMPLE.COM
kadmin -c /tmp/ambari_krb...

I've replicated the two sequences and confirmed my guess: the second one works from the shell. Furthermore, if I intercept the credentials temporarily generated by Ambari and replace them with my own, the flow works. How can I fix this behaviour? This looks like a bug in the Ambari code -- which part should I edit to fix it?
10-09-2018
08:07 AM
We will be setting up an HDP 3 cluster, and the idea is to split 1x RAID1 and 2x HDD across 2 virtual machines. The RAID1 volume would handle the OS and local user files, while each of the 2 HDDs would be used for an HDFS /grid/ mount. My question: are 2 HDDs enough to ensure file redundancy? I've read that Hadoop 3 does not require 3 replicas as Hadoop 2 did; however, in case of a drive failure, would the files on the remaining HDD still be enough for a full recovery? Or would you advise partitioning some of the RAID1 volume as a third /grid/ drive for HDFS? That seems to introduce a lot of overhead due to RAID. What is the minimum number of HDDs required for redundant HDFS operation?
10-01-2018
08:24 AM
Half of the YARN resources in our cluster are dedicated to Hive LLAP, while the other half is left free for various other tasks, such as Spark or MapReduce containers. However, there are times when no other jobs are running and LLAP is loaded to its fullest. Is there a setting that allows dynamic allocation of all resources? That is, in such cases, would it be possible to allocate all, or say 80%, of the unused capacity to LLAP, and return it to normal after the jobs finish? As far as I know, the only way to do that would require an LLAP or YARN restart, which is not at all preferred.
08-28-2018
08:13 AM
In other words, this works:
create table config.test (jobid string, param string, value string)
partitioned by (dummy string)
clustered by (param) into 3 buckets
stored as orcfile
tblproperties('transactional'='true');
insert into config.test partition(dummy) values ('1', '2', '3', '1');
insert into config.test partition(dummy) values ('2', '3', '4', '1');
insert into config.test partition(dummy) values ('4', '4', '4', '1');
update config.test set value = '99' where jobid = '4' and param = '4' ;
But it does require a "dummy" field which does not serve any purpose. Are there any cleaner workarounds?
08-28-2018
07:59 AM
Consider this example table:

create table config.test (param string, value string)
partitioned by (jobid string)
clustered by (param) into 3 buckets
stored as orcfile
tblproperties('transactional'='true');

I want this to be an ACID table; I don't really need the partitioning and clustering, but I'm creating it this way following all the guides. Now assume I want to update some values:

insert into config.test partition(jobid) values ('1', '2', '3');
insert into config.test partition(jobid) values ('2', '3', '4');
insert into config.test partition(jobid) values ('4', '4', '4');
update config.test set value = '99' where jobid = '4' and param = '4';
Inserts work fine; the UPDATE returns the following "java.lang.NegativeArraySizeException" error:

Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 2, vertexId=vertex_1535365185102_0002_140_01, diagnostics=[Task failed, taskId=task_1535365185102_0002_140_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1535365185102_0002_140_01_000000_0:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:218)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:442)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:366)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:556)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:508)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:213)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:188)
... 15 more
Caused by: java.lang.NegativeArraySizeException
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:357)
... 21 more
], TaskAttempt 1 failed, TaskAttempt 2 failed, TaskAttempt 3 failed (each with a stack trace identical to attempt 0)], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1535365185102_0002_140_01 [Reducer 2] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0

What could be the problem? I'm working on a perfectly running cluster, and we do have some ACID tables working, but I can't figure out why this simple example doesn't. What works: the UPDATE succeeds when "jobid = '4'" is NOT in the WHERE clause, but then it updates too many rows. Why is this a problem? I'm updating the "value" field, which is neither a bucketing nor a partition column; "jobid" and "param" appear only in the WHERE clause. If this is really because "jobid" is a partitioning field, I could create dummy/mirror fields that hold the same info without being partition keys, but that does seem redundant. Any ideas?
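For the record, a generic ACID fallback I may try instead of UPDATE, sketched under the assumption that the row can be fully identified: delete the row and re-insert it with the new value (column order here is param, value, with jobid as the dynamic partition, matching the table above).

delete from config.test where jobid = '4' and param = '4';
insert into config.test partition(jobid) values ('4', '99', '4');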
08-27-2018
07:05 PM
Oh, that's good to hear; looking forward to upgrading then! Thank you for the heads-up!
08-27-2018
12:09 PM
When a large Hive query takes up the whole cluster (all LLAP executors busy), simultaneously running even the simplest query takes a long time, probably because the process is waiting for a free executor. Is there a setting that allows the queues to be restricted/managed in a manner similar to YARN's, i.e., so that there would always be at least one free executor for a quick query, or so that tasks from concurrent queries would be pushed to the front of the line, guaranteeing them some minimal exposure to the executors?