Member since
10-04-2016
243
Posts
281
Kudos Received
43
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1184 | 01-16-2018 03:38 PM |
| | 6155 | 11-13-2017 05:45 PM |
| | 3060 | 11-13-2017 12:30 AM |
| | 1524 | 10-27-2017 03:58 AM |
| | 28464 | 10-19-2017 03:17 AM |
10-15-2017
09:08 PM
3 Kudos
Two YARN queues have been set up: default and llap. Whichever queue is marked as the Hive LLAP queue (Interactive Query Queue) does not process any queries/jobs. The following are the steps to reproduce the issue:

There are two YARN queues set up. In Ambari, the default queue is selected for LLAP from the Interactive Query Queue drop-down (Ambari > Hive > Configs > Interactive Query Queue). Any job/query submitted to the default queue does not run; however, all queries submitted to the llap queue run successfully. Using the same process, change the Interactive Query Queue to llap. Now any query/job submitted to the llap queue does not run, while all queries submitted to the default queue run successfully. In short, at any given time, the queue selected for LLAP does not run any job/query.

Root Cause: This issue occurs when there is a problem with queue prioritization: both queues have their priority set to 0, which means they are of equal priority. This can be viewed from the Ambari > YARN Queue Manager view. Hive LLAP (Low-Latency Analytical Processing) enables us to run Hive queries with low latency, in near real time. To ensure low latency, set the priority of the queue used for LLAP higher than that of the other queues, especially if the cluster includes long-running applications.

Solution: To resolve this issue, set the priority of the llap queue to a value higher than that of the default queue. After setting the higher priority, be sure to save and refresh the queues for the change to take effect. For YARN queue priorities to be applied, preemption must be enabled. To enable preemption, refer to this documentation.
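The resulting settings would look roughly like the fragment below (a sketch of capacity-scheduler properties; the queue names default and llap come from the scenario above, and the exact priority values are illustrative):

```properties
# Higher number = higher priority; 0 is the default for every queue.
yarn.scheduler.capacity.root.default.priority=0
yarn.scheduler.capacity.root.llap.priority=1

# Preemption must be enabled for queue priorities to take effect.
yarn.resourcemanager.scheduler.monitor.enable=true
```

In Ambari these values are normally managed through the YARN Queue Manager view rather than edited by hand.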
10-15-2017
08:43 PM
4 Kudos
@Turing nix This appears to be working as designed. In your example, the YARN ACL allows 'user02' to submit jobs to the engineering01 queue. Now you have two scenarios: 1. When doAs=true in Hive: the job submitted to the queue runs as the end user. Since user02 submits the job to this queue and the YARN ACL allows user02 to do this, the job is accepted. 2. When doAs=false in Hive: the job submitted to the queue runs as user 'hive'. Since the job is now submitted to the queue as user 'hive' and the YARN ACL only allows user02, it correctly fails this time. Update: Whether doAs=true or doAs=false, you can still audit using Ranger. You can look at the best practices for implementing this with Ranger: https://hortonworks.com/blog/best-practices-for-hive-authorization-using-apache-ranger-in-hdp-2-2/ The article is for an old version of HDP, but the concept is still valid.
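For reference, the queue ACL in this scenario would look something like the fragment below (a sketch; the property path assumes engineering01 sits directly under root). With doAs=false the submitting identity is 'hive', so it would also have to be listed for the job to be accepted:

```properties
# Current ACL from the example: only user02 may submit to engineering01.
yarn.scheduler.capacity.root.engineering01.acl_submit_applications=user02

# To accept submissions when hive.server2.enable.doAs=false, 'hive' would
# need to be added as well:
# yarn.scheduler.capacity.root.engineering01.acl_submit_applications=user02,hive
```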
10-13-2017
06:47 PM
3 Kudos
@Viswa countByValue() returns its result as a local Map collection, not an RDD. saveAsTextFile() is defined to work on an RDD, not on a map/collection. Even though you have named the variable RDD2, as shown below, the result is not an RDD: RDD2 = RDD1.countByValue() Here are the definitions: def countByValue()(implicit ord: Ordering[T] = null): Map[T, Long]
Return the count of each unique value in this RDD as a local map of (value, count) pairs.
def saveAsTextFile(path: String): Unit
Save this RDD as a text file, using string representations of elements.
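A sketch of how the counts could actually be saved (assuming an existing SparkContext `sc` and an RDD `rdd1`; the variable names and output paths are illustrative, not from the original question):

```scala
// countByValue() collects the counts to the driver as a local Map[T, Long].
val counts = rdd1.countByValue()

// Option 1: parallelize the local map back into an RDD, then save it.
sc.parallelize(counts.toSeq).saveAsTextFile("/tmp/counts")

// Option 2: keep the computation distributed with reduceByKey and save directly.
rdd1.map(v => (v, 1L)).reduceByKey(_ + _).saveAsTextFile("/tmp/counts_distributed")
```

Option 2 avoids collecting all counts to the driver, which matters when the number of distinct values is large.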
10-13-2017
04:28 PM
3 Kudos
Just posted this KB article a few minutes ago. This should help: https://community.hortonworks.com/articles/141573/installing-spark-thrift-server-in-a-kerberos-secur.html
10-13-2017
03:56 PM
3 Kudos
Scenario 1: Only one instance of the Spark Thrift Server is needed. Approach: If you are installing the Spark Thrift Server on a Kerberos-secured cluster, the following instructions apply:

- The Spark Thrift Server must run on the same host as HiveServer2, so that it can access the hiveserver2 keytab.
- Edit permissions on /var/run/spark and /var/log/spark so that the Hive service account can read and write them. Merely being able to see the contents as user hive is not enough; hive must be able to write to those folders. One way is to give 77x permissions on these folders: since the owner:group is spark:hadoop and hive belongs to the group hadoop, hive will have write access with this setup.
- Use the Hive service account to start the thriftserver process. It is recommended that you run the Spark Thrift Server as user hive instead of user spark. This ensures that the Spark Thrift Server can access the Hive keytabs, the Hive metastore, and data in HDFS stored under user hive. When the Spark Thrift Server runs queries as user hive, all data accessible to user hive is accessible to the user submitting the query. For a more secure configuration, use a different service account for the Spark Thrift Server and grant it appropriate access to the Hive keytabs and the Hive metastore.

If you still do not want to install the STS on the same host as HiveServer2, you must follow the approach below.

Scenario 2: Install multiple Spark Thrift Server instances on hosts other than HiveServer2. Approach: Run all commands as the root user.
Back up hive.service.keytab in /etc/security/keytabs on the Hive Server host by making a copy of the file and moving the copy to a directory other than /etc/security/keytabs. If the Spark Thrift Server host also has hive.service.keytab in /etc/security/keytabs, make a copy of that file as well and move the copy to a different directory. On the Ambari Server node, run the following command from the command line to obtain and cache a Kerberos ticket-granting ticket:
kinit [admin principal] Type in the admin principal password when asked. The admin principal name and password are the ones used to enable Kerberos via Ambari. For example, if the admin principal used to enable Kerberos was root/admin and the corresponding password was abc123, run kinit root/admin and type abc123 when prompted for the password. On the Ambari Server node, in a temporary directory, run the following command to open the kadmin shell:
kadmin
Add a new principal as hive/[spark_thrift_server_host]@[Kerberos realm]. Replace [spark_thrift_server_host] with the host name of the Spark Thrift Server on the cluster. Replace [Kerberos realm] with the Kerberos realm used when enabling Kerberos in Ambari. For example, if Kerberos is enabled in Ambari with Kerberos realm MyDomain.COM, use it to replace [Kerberos realm].
addprinc -randkey hive/[spark_thrift_server_host]@[Kerberos realm]
Add all Hive principals to the Hive service keytab file. This should include the existing one for the Hive Server host and the one created in the previous step:
ktadd -k hive.service.keytab hive/[spark_thrift_server_host]@[Kerberos realm]
ktadd -k hive.service.keytab hive/[hive_server_host]@[Kerberos realm]
Replace [spark_thrift_server_host], [hive_server_host], and [Kerberos realm] with information specific to the cluster. The kadmin shell should print messages indicating the principal was added to the file. For example:
kadmin: ktadd -k hive.service.keytab hive/myserver1.mydomain.com@MyDomain.COM
Entry for principal hive/myserver1.mydomain.com@MyDomain.COM with kvno 3, encryption type aes256-cts-hmac-sha1-96 added to keytab WRFILE:hive.service.keytab.
Entry for principal hive/myserver1.mydomain.com@MyDomain.COM with kvno 3, encryption type aes128-cts-hmac-sha1-96 added to keytab WRFILE:hive.service.keytab.
Type exit to exit the kadmin shell. Find the newly generated hive.service.keytab in the current directory.
Add it to /etc/security/keytabs on the Spark Thrift Server host, and use it to replace the existing hive.service.keytab in /etc/security/keytabs on the Hive Server host.
Update the permissions and ownership of the file on both the Spark Thrift Server host and the Hive Server host as shown below:
chmod 400 hive.service.keytab
chown [hive_user]:[hive_user_primary_group] hive.service.keytab
Stop all Spark components via the Ambari web UI and ensure there are no running Spark processes on the Spark component hosts. Restart Hive from the Ambari UI. Start the Spark service from the Ambari UI.
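The keytab steps above can be consolidated into the following sketch (run as root; STS_HOST, HS2_HOST, and REALM are placeholders for the Spark Thrift Server host, the Hive Server host, and the Kerberos realm, and hive:hadoop assumes hive's primary group is hadoop):

```
# Sketch only — substitute real values before running.
kinit root/admin                       # the admin principal used to enable Kerberos

kadmin <<'EOF'
addprinc -randkey hive/STS_HOST@REALM
ktadd -k hive.service.keytab hive/STS_HOST@REALM
ktadd -k hive.service.keytab hive/HS2_HOST@REALM
EOF

# Sanity check: both principals should be listed, one entry per encryption type.
klist -kt hive.service.keytab

# After copying the merged keytab to /etc/security/keytabs on both hosts:
chmod 400 hive.service.keytab
chown hive:hadoop hive.service.keytab
```

The klist -kt check is worth doing before distributing the file: a keytab missing one of the two principals is the most common cause of authentication failures in this setup.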
10-11-2017
12:52 AM
3 Kudos
Environment - HDP-2.6.2 with Hive and Atlas on the cluster. Current Config (based on HDP Doc):
hive.exec.pre.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook
hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook, org.apache.atlas.hive.hook.HiveHook
hive.exec.failure.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook

Scenario - Often, a simple Hive query like 'show databases' fails with the following error:

2017-10-10 14:00:38,985 INFO [HiveServer2-Background-Pool: Thread-273112]: log.PerfLogger (PerfLogger.java:PerfLogBegin(135)) - <PERFLOG method=PostHook.org.apache.atlas.hive.hook.HiveHook from=org.apache.hadoop.hive.ql.Driver>
2017-10-10 14:00:38,986 ERROR [HiveServer2-Background-Pool: Thread-273112]: ql.Driver (SessionState.java:printError(962)) - FAILED: Hive Internal Error: java.util.concurrent.RejectedExecutionException(Task java.util.concurrent.FutureTask@3f389d45 rejected from java.util.concurrent.ThreadPoolExecutor@5b868755[Running, pool size = 1, active threads = 1, queued tasks = 10000, completed tasks = 14807])
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@3f389d45 rejected from java.util.concurrent.ThreadPoolExecutor@5b868755[Running, pool size = 1, active threads = 1, queued tasks = 10000, completed tasks = 14807]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:174)

I found HCC posts that suggest updating the config as shown below:
hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook

This solved the problem in the HCC post; however, I want to understand:
1. What is the significance of the values mentioned in the document?
2. If removing the value solves the problem, is the document incorrect?
3. What impact does updating the value as per the HCC post have on the functioning of Atlas?
Labels:
- Apache Atlas
- Apache Hive
10-11-2017
12:50 AM
@Jay SenSharma, @Shalini Goel - does this change have any impact on the functioning of Atlas in the cluster? As per the HDP doc, we need to have the following: hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.ATSHook, org.apache.atlas.hive.hook.HiveHook
10-10-2017
05:47 PM
3 Kudos
Spark 1.6.3 does not support this. https://spark.apache.org/docs/1.6.3/sql-programming-guide.html#creating-dataframes
10-08-2017
02:11 AM
This saved me a good couple hours! Thanks!
10-08-2017
02:03 AM
1 Kudo
@Mingliang Liu Great article! I faced the following error when I tried to build using Maven (step 7): Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.1.0-SNAPSHOT:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version. I resolved this by creating a symlink: ln -s /usr/local/Cellar/protobuf@2.5/2.5.0/bin/protoc /usr/local/bin