Member since: 09-20-2016
Posts: 38
Kudos Received: 9
Solutions: 3
11-19-2018
01:33 PM
We are struggling to get SparkR to work together with Hive LLAP in HDP 3.0. The documentation at https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html only talks about Python and Java/Scala, nothing about R. Is R no longer supported? If so, what other options do we have, since we have a lot of SparkR code accessing Hive today?

The problem I get is that the SparkR code tries to access the HDFS files for the Hive databases and tables directly, as if the LLAP connector were not there at all. So we basically get the following:

Caused by: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table addresstype. java.security.AccessControlException: Permission denied: user=<username>, access=EXECUTE, inode="/apps/hive":hdfs:hdfs:drwx------

And that is correct: the user I'm running as doesn't have, and shouldn't have, permissions on that folder. Is anybody using SparkR together with Hive LLAP in HDP 3.0?
08-22-2018
06:06 AM
There are no direct errors. I just can't get it to even try to use LLAP. It tries to read the ORC files directly from HDFS, and the error it gives me says that I don't have HDFS permissions on those files. That is correct: I don't, and I shouldn't. Those permissions aren't needed when the LLAP integration is working (as it is with Python and Java/Scala). What worries me is that the documentation says only Python and Java/Scala are supported, and there is not a word about R.
08-21-2018
08:25 AM
I'm upgrading one of our clusters to HDP 3.0 right now, and the upgrade itself worked fine. After some struggles, I managed to get Spark to work with LLAP in both Java/Scala and Python. But I can't find any good information about how to get R to work with Spark and LLAP. Before the upgrade, R worked fine with Spark and LLAP, and we have code relying on that running in production right now, so we really need it to work in HDP 3.0 as well. According to the documentation at
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html
there isn't even support for R. Am I missing something here, or is R not supported anymore? (That would kind of ruin my day.)

Before the upgrade, the following code worked without any problems:
Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf")
Sys.setenv(HIVE_CONF_DIR = "/etc/hive/conf")
Sys.setenv(SPARK_HOME="/usr/hdp/current/spark2-client/")
Sys.setenv("SPARKR_SUBMIT_ARGS"="--master yarn --deploy-mode client --executor-memory 2688M --jars /usr/hdp/2.6.5.0-292/spark_llap/spark-llap-assembly-1.0.0.2.6.5.0-292.jar --driver-class-path /usr/hdp/2.6.5.0-292/spark_llap/spark-llap-assembly-1.0.0.2.6.5.0-292.jar --conf spark.executor.extraClassPath=/usr/hdp/2.6.5.0-292/spark_llap/spark-llap-assembly-1.0.0.2.6.5.0-292.jar --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true sparkr-shell")
library("DBI")
library(SparkR, lib.loc = c(file.path(paste(Sys.getenv("SPARK_HOME"), "R", "lib", sep="/"))))
sparkR.session(appName = "SparkR-Test")
head(sql("select * from testtable"))
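For comparison, a minimal sketch of what the submit arguments might look like against the HDP 3.0 Hive Warehouse Connector assembly (the jar path comes from the spark-shell example in the 08-17-2018 post below; the JDBC URL and the extra confs are assumptions taken from the other posts on this page, and whether SparkR actually picks the connector up this way is exactly the open question):

Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf")
Sys.setenv(HIVE_CONF_DIR = "/etc/hive/conf")
Sys.setenv(SPARK_HOME = "/usr/hdp/current/spark2-client/")
# Untested sketch: point SparkR at the HDP 3.0 HWC assembly instead of the old spark-llap jar.
# The JDBC URL placeholder and the credentials flag are assumptions based on the other posts here.
Sys.setenv("SPARKR_SUBMIT_ARGS" = "--master yarn --deploy-mode client --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.0.0-1634.jar --conf spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://<server>:10500/ --conf spark.security.credentials.hiveserver2.enabled=false sparkr-shell")
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session(appName = "SparkR-HWC-Test")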
08-20-2018
09:03 AM
Setting spark.security.credentials.hiveserver2.enabled to false solved the problem. I can now use Spark with LLAP in both Java and Python. Just R is missing now; I will try to find out how to do it there as well. Thanks for the help!
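For reference, a minimal sketch of the invocation this corresponds to (the jar path is taken from the spark-shell example in the 08-17-2018 post below; everything else is a placeholder):

spark-shell --master yarn --deploy-mode client \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.0.0-1634.jar \
  --conf spark.security.credentials.hiveserver2.enabled=false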
08-17-2018
12:02 PM
Thanks for the answer. But I have verified those settings at least ten times now, and they are correct as far as I can see. This cluster worked with Spark + LLAP (even via Livy) on HDP 2.6.5, and most of these settings are the same.
08-17-2018
08:51 AM
I'm upgrading one of our clusters to HDP 3.0 right now, and the upgrade itself worked fine. But after the upgrade, I just can't get Spark with LLAP to work. This is not a new feature for us; we have been using it for as long as the support has existed. As there are some changes in the configuration, I've followed and changed the config according to both
https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html
and
https://github.com/hortonworks-spark/spark-llap/tree/master

The test code I'm running is the following:

spark-shell --master yarn --deploy-mode client --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.0.0-1634.jar

import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession._
val hive = HiveWarehouseSession.session(spark).build()
hive.showDatabases().show(100)

The error I get is the following:

java.lang.RuntimeException: java.sql.SQLException: Cannot create PoolableConnectionFactory (Could not open client transport with JDBC Uri: jdbc:hive2://<server>:10501/;transportMode=http;httpPath=cliservice;auth=delegationToken: Could not establish connection to jdbc:hive2://<server>:10501/;transportMode=http;httpPath=cliservice;auth=delegationToken: HTTP Response code: 401)

The Hive server shows the following:
2018-08-17T07:28:50,759 INFO [HiveServer2-HttpHandler-Pool: Thread-175]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doPost(146)) - Could not validate cookie sent, will try to generate a new cookie
2018-08-17T07:28:50,759 INFO [HiveServer2-HttpHandler-Pool: Thread-175]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doKerberosAuth(399)) - Failed to authenticate with http/_HOST kerberos principal, trying with hive/_HOST kerberos principal
2018-08-17T07:28:50,760 ERROR [HiveServer2-HttpHandler-Pool: Thread-175]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doKerberosAuth(407)) - Failed to authenticate with hive/_HOST kerberos principal
2018-08-17T07:28:50,760 ERROR [HiveServer2-HttpHandler-Pool: Thread-175]: thrift.ThriftHttpServlet (ThriftHttpServlet.java:doPost(210)) - Error: org.apache.hive.service.auth.HttpAuthenticationException: java.lang.reflect.UndeclaredThrowableException
at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:408) ~[hive-service-3.1.0.3.0.0.0-1634.jar:3.1.0.3.0.0.0-1634]
at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:160) [hive-service-3.1.0.3.0.0.0-1634.jar:3.1.0.3.0.0.0-1634]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) [javax.servlet-api-3.1.0.jar:3.1.0]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0]
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.server.Server.handle(Server.java:534) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) [jetty-io-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) [jetty-io-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) [jetty-io-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) [jetty-runner-9.3.20.v20170531.jar:9.3.20.v20170531]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_112]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_112]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
Caused by: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1706) ~[hadoop-common-3.1.0.3.0.0.0-1634.jar:?]
at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:405) ~[hive-service-3.1.0.3.0.0.0-1634.jar:3.1.0.3.0.0.0-1634]
... 25 more
Caused by: org.apache.hive.service.auth.HttpAuthenticationException: Kerberos authentication failed:
at org.apache.hive.service.cli.thrift.ThriftHttpServlet$HttpKerberosServerAction.run(ThriftHttpServlet.java:464) ~[hive-service-3.1.0.3.0.0.0-1634.jar:3.1.0.3.0.0.0-1634]
at org.apache.hive.service.cli.thrift.ThriftHttpServlet$HttpKerberosServerAction.run(ThriftHttpServlet.java:413) ~[hive-service-3.1.0.3.0.0.0-1634.jar:3.1.0.3.0.0.0-1634]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_112]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) ~[hadoop-common-3.1.0.3.0.0.0-1634.jar:?]
at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:405) ~[hive-service-3.1.0.3.0.0.0-1634.jar:3.1.0.3.0.0.0-1634]
... 25 more
Caused by: org.ietf.jgss.GSSException: Defective token detected (Mechanism level: GSSHeader did not find the right tag)
at sun.security.jgss.GSSHeader.<init>(GSSHeader.java:97) ~[?:1.8.0_112]
at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:306) ~[?:1.8.0_112]
at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) ~[?:1.8.0_112]
at org.apache.hive.service.cli.thrift.ThriftHttpServlet$HttpKerberosServerAction.run(ThriftHttpServlet.java:452) ~[hive-service-3.1.0.3.0.0.0-1634.jar:3.1.0.3.0.0.0-1634]
at org.apache.hive.service.cli.thrift.ThriftHttpServlet$HttpKerberosServerAction.run(ThriftHttpServlet.java:413) ~[hive-service-3.1.0.3.0.0.0-1634.jar:3.1.0.3.0.0.0-1634]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_112]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_112]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688) ~[hadoop-common-3.1.0.3.0.0.0-1634.jar:?]
at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doKerberosAuth(ThriftHttpServlet.java:405) ~[hive-service-3.1.0.3.0.0.0-1634.jar:3.1.0.3.0.0.0-1634]
... 25 more

I can see that it complains about the Kerberos ticket, but I do have a valid ticket in my session. Any other Kerberos-authenticated access, like beeline, works fine from the same session. Does anybody have any clue about this error?
Labels:
- Apache Hive
- Apache Spark
08-08-2018
01:31 PM
Problem solved. If you have created an HBase or Phoenix table through Hive as an "internal" table, it is created as a managed table with the storage handler pointing at HBase/Phoenix. This is what causes the problem: managed tables with the HBase/Phoenix storage handler don't work in the "Move Hive Tables" part of the upgrade (external tables work fine, of course). I had to manually remove those tables from the Hive metastore database, and after that the "Move Hive Tables" part of the upgrade worked fine.
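For anyone hitting the same thing, a rough sketch of how to find those tables directly in a MySQL metastore before removing them (the table names are the standard metastore schema; the 'storage_handler' parameter key is what I believe Hive uses to record STORED BY handlers, so verify it against your own metastore first):

-- List managed tables that use a storage handler (e.g. HBase/Phoenix)
SELECT d.NAME AS db_name, t.TBL_NAME, p.PARAM_VALUE AS storage_handler
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID
JOIN TABLE_PARAMS p ON t.TBL_ID = p.TBL_ID
WHERE p.PARAM_KEY = 'storage_handler'
  AND t.TBL_TYPE = 'MANAGED_TABLE';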
08-08-2018
11:14 AM
Update: it looks like it's related to one or more tables. I have around 50 databases in Hive. I ran the migration for each of them separately using the -d flag (a regexp on the database name), and I only get the 255 exit code on 4 of the databases; all the others work fine. I will now try to pinpoint exactly which tables in those databases are causing the error and see if I can find anything strange about them.
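In case it helps someone, a rough sketch of the per-database loop I mean, built around the migration command from the original post below (database names are placeholders; -d takes a regexp on the database name):

for db in db1 db2 db3; do
  /usr/hdp/3.0.0.0-1634/hive/bin/hive --config /etc/hive/conf --service strictmanagedmigration \
    --hiveconf hive.strict.managed.tables=true -m automatic --modifyManagedTables \
    --oldWarehouseRoot /apps/hive/warehouse -d "$db"
  echo "$db finished with exit code $?"
done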
08-08-2018
10:05 AM
I'm using MySQL for the metastore. No errors in the log.
08-08-2018
08:51 AM
I'm trying to migrate to HDP 3.0, but the upgrade hangs on "Move Hive Tables". When I look at the log, the actual move command just exits without any error messages, with exit code 255. This happens even when I run the command manually, so it's hard to understand what the real problem is, as I get no output at all from the command. The only time I get anything back is when I add -h, which prints the help text.

/usr/hdp/3.0.0.0-1634/hive/bin/hive --config /etc/hive/conf --service strictmanagedmigration --hiveconf hive.strict.managed.tables=true -m automatic --modifyManagedTables --oldWarehouseRoot /apps/hive/warehouse

Can anybody help me understand why it's exiting with code 255 and, if possible, how to solve it?
Tags:
- hdp-upgrade
- upgrade
06-04-2018
07:07 AM
Yes, authorization is with Ranger. Everything else in the cluster is working fine: Ranger with Spark + LLAP works, and Zeppelin with R/Python + Spark + Livy + LLAP + Ranger works. The only thing that stopped working after the upgrade is the sparklyr problem we have, so I don't think our problem is related to authorization.
05-31-2018
11:51 AM
Hi,

For a long time we have been running Hive LLAP together with sparklyr, and it has been working fine. Today we upgraded to HDP 2.6.5, and since then we can't connect to Hive through sparklyr. SparkR works fine with Hive LLAP. The easiest way to see the problem is to just show all databases in Hive: after the upgrade, we only see the "default" database, and if I try to list the tables in it, it returns an empty list. There are no errors, stack traces or anything else that points at the problem.

The code we are running, which worked before the upgrade to HDP 2.6.5, is the following:
Sys.setenv(HADOOP_CONF_DIR = "/etc/hadoop/conf")
Sys.setenv(HIVE_CONF_DIR = "/etc/hive/conf")
Sys.setenv(SPARK_HOME="/usr/hdp/current/spark2-client/")
library(argparse)
library(tidyverse)
library(sparklyr)
library(DBI)
library(plyr)
library(dplyr)
.config <- spark_config()
.config <- c(.config, list("spark.executor.memory"="2688M",
"spark.shuffle.service.enabled"="true",
"spark.dynamicAllocation.enabled"="true",
"spark.executor.extraClassPath"="/usr/hdp/2.6.5.0-292/spark_llap/spark-llap-assembly-1.0.0.2.6.5.0-292.jar"))
sc <- spark_connect(master = "yarn-client",
app_name = "sparklyr-test",
config = .config)
DBI::dbGetQuery(sc, 'show databases')
Anybody got
any information that can help us solve this problem?
Labels:
- Apache Hive
- Apache Spark
02-22-2018
09:14 AM
So, I'm facing a problem that I don't really know how to solve. The core problem is that Atlas doesn't process the messages in the ATLAS_HOOK topic fast enough, so we have a backlog that is growing every day.

Because we want to use tag-based security in combination with Ranger, we moved away from dropping and recreating the Hive tables every night when we do our Sqoop imports. Instead, we Sqoop into a temporary table and then truncate and "insert into" the target table. We are doing this for many thousands of tables every night, and many of them have over 1000 columns. In combination with the column-level lineage in Atlas, this creates a huge workload that Atlas needs to process, and it just doesn't keep up.

What I've been trying to do is increase HBase and Kafka performance to make sure there are no bottlenecks there. The HBase atlas_titan table is currently evenly distributed over 287 regions, and I can read all messages in the topic in roughly 10-15 minutes, so I don't think the problem is in those two systems.

I'd like some pointers on how to increase the rate at which Atlas processes the data in the ATLAS_HOOK topic. For example, it looks like the NotificationHookConsumer runs with only one thread. Is it possible to run this in a multi-threaded setup so the data can be processed in parallel? Anything else you can think of that could help me here?
Labels:
- Apache Atlas
02-08-2018
06:58 AM
Thanks for the suggestions, but that actually made the processing a lot slower. I have performance monitoring enabled, so I can see that each ENTITY_FULL_UPDATE takes between 7 and 8 seconds. With the suggested parameters set, they were running for over 30 seconds each.

After debugging what happens during the ingestion of data into Atlas, it looks like the calls to HBase are what takes the time. After a quick look in HBase, I see that the atlas_titan table is running with just one region. According to the documentation, atlas.graph.storage.hbase.region-count or atlas.graph.storage.hbase.regions-per-server should change the number of regions the table gets when it is created. So I tried setting those, dropped the HBase table and started everything again, but the HBase table is still created with just one region. Does anybody have any info on how to create the initial atlas_titan HBase table with more than one region?
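For reference, this is the kind of change I mean in atlas-application.properties (the property names are the ones from the documentation mentioned above; the values are just examples, not recommendations):

atlas.graph.storage.hbase.region-count=16
atlas.graph.storage.hbase.regions-per-server=2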
02-07-2018
06:17 AM
We have a problem with lagging and falling behind on the messages in the ATLAS_HOOK Kafka topic. I can understand that, as we ingest a large number of tables into the cluster every day. Basically, we create roughly 165,000 entries in the ATLAS_HOOK topic every day, primarily from Sqoop and create/drop tables in Hive. The problem is that Atlas only processes around 35,000-40,000 entries per day, so the backlog keeps building up. Many of the tables we import are quite wide, so it's pretty common that the messages in the Kafka topic are 600-800 KB each.

I have verified that I can consume the messages in the topic from a normal Kafka client, so it's not a problem with Kafka. I have also cleared the two HBase tables and cleared the Kafka topic just to start over from the beginning, but the problem remains.

I would like some help with what kind of performance tuning I can do to make sure Atlas can consume at least 200,000 entries from the ATLAS_HOOK topic per day (we are planning to add a lot more data sources over the next couple of months). What options do I have to make this happen? The HDP version we are running is 2.6.3.

//Berry
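For anyone trying to quantify the same lag, a quick sketch of how to check how far behind the Atlas hook consumer is (the broker host is a placeholder; the consumer group name "atlas" is an assumption based on the default atlas.kafka.hook.group.id, so check your own config):

/usr/hdp/current/kafka-broker/bin/kafka-consumer-groups.sh \
  --bootstrap-server <broker-host>:6667 \
  --describe --group atlas
# The LAG column per ATLAS_HOOK partition shows how many messages are still unprocessed.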
Labels:
- Apache Atlas
01-31-2018
08:12 AM
Did you find any solution to this? I'm having the same problem. It used to work, but I think it broke after upgrading to HDP 2.6.3 (no real proof of that, to be honest).
01-19-2018
01:10 PM
You can always create a new table based on a select that changes and removes the unwanted rows/data, and then rename the tables.
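A minimal sketch of that approach (the table names and the filter are placeholders):

CREATE TABLE mytable_clean AS
SELECT * FROM mytable
WHERE <condition that keeps only the rows you want>;

ALTER TABLE mytable RENAME TO mytable_old;
ALTER TABLE mytable_clean RENAME TO mytable;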
08-25-2017
06:13 AM
4 Kudos
So, the main reason we see this error is that the default behaviour for SSL certificate verification has changed in Python. If you take a look in /etc/python/cert-verification.cfg, you will see that in python-libs-2.7.5-34 the default was "verify=disable". After the upgrade of that package to python-libs-2.7.5-58, the value is now "verify=platform_default", and at least on our systems that means enabled. After changing this back to "verify=disable", the synchronization works again without the workaround I wrote about earlier. I have also verified this on a non-upgraded system by changing the value to enabled, which likewise breaks the user synchronization.

This also affects LLAP if you are running it. After the upgrade, LLAP won't start because of certificate verification; you get a "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:579)" error message. Changing the verify parameter described above fixes that problem as well.
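For clarity, the file ends up looking roughly like this (the [https] section name is what I recall from the package; double-check your own file):

# /etc/python/cert-verification.cfg
[https]
verify=disable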
08-24-2017
10:30 AM
1 Kudo
@Aaron Norton One way you can work around this problem is to change "SERVER_API_HOST = '127.0.0.1'" in /usr/lib/python2.6/site-packages/ambari_server/serverUtils.py so it points to your server with the full hostname. That will work around the problem with SSL that we see.
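In other words, the (admittedly hacky) edit looks something like this; the hostname is just a placeholder for your Ambari server's FQDN:

# /usr/lib/python2.6/site-packages/ambari_server/serverUtils.py
# was: SERVER_API_HOST = '127.0.0.1'
SERVER_API_HOST = 'ambari-server.example.com'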
08-23-2017
12:54 PM
I just got the exact same error, and the problem came after a "yum update" on a Red Hat 7 server. I tested the synchronization just before I upgraded the OS and it worked fine; after the upgrade, I get the same error. I'll post an answer once I find a solution to the problem.
08-23-2017
07:16 AM
1 Kudo
After following this guide, I still got Kerberos errors when trying to communicate with LLAP. It turns out that you also need to set livy.spark.yarn.security.credentials.hiveserver2.enabled=true in the Livy interpreter in Zeppelin to make it work.
08-23-2017
06:32 AM
1 Kudo
I have found the solution to my problem: add the following to the Livy interpreter in Zeppelin. Once that is done, I can use Spark SQL with LLAP and Livy in Zeppelin. The jobs are executed as the logged-in Zeppelin users (Shiro authentication against Microsoft AD), which I have verified in HiveServer2's history log.

livy.spark.yarn.security.credentials.hiveserver2.enabled true
08-23-2017
06:03 AM
Hi all,

I have a problem getting Spark to work with LLAP in Zeppelin via Livy. The version I'm using is HDP 2.6.1, but I had the same problem with 2.6.0. I've followed the two guides at https://community.hortonworks.com/articles/110093/using-rowcolumn-level-security-of-spark-with-zeppe.html and https://community.hortonworks.com/content/kbentry/101181/rowcolumn-level-security-in-sql-for-apache-spark-2.html, but still no luck.

Spark (without SQL/LLAP) with Livy works fine in Zeppelin together with Kerberos. I can submit jobs in Zeppelin and they get executed in YARN correctly, with the right user and everything, so I know the Kerberos configuration is correct and working. But as soon as I try to run Spark SQL code, I get a Kerberos error saying that I don't have a valid Kerberos ticket (Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)). Normal Spark without Livy works fine with LLAP; no problem there at all.

The configuration I'm running with right now:

Spark is configured with the following settings:
spark.hadoop.hive.llap.daemon.service.hosts @llap0
spark.hadoop.hive.zookeeper.quorum <3x ZK servers + ports>
spark.sql.hive.hiveserver2.jdbc.url jdbc:hive2://<server>:10500/
spark.sql.hive.hiveserver2.jdbc.url.principal <Hive principal>
spark.sql.hive.llap true

The Livy server is configured with the following settings:
livy.impersonation.enabled true
livy.repl.enableHiveContext true
livy.server.access_control.enabled true
livy.server.access_control.users livy,zeppelin
livy.server.auth.kerberos.keytab <SPNEGO keytab>
livy.server.auth.kerberos.principal <SPNEGO principal>
livy.server.auth.type Kerberos
livy.server.launch.kerberos.keytab <Livy keytab>
livy.server.launch.kerberos.principal <Livy principal>
livy.superusers livy,zeppelin

The Zeppelin Livy interpreter is configured with the following settings:
livy.spark.hadoop.hive.llap.daemon.service.hosts @llap0
livy.spark.jars /lib/spark-llap_2.11-1.1.2-2.1.jar (HDFS file)
livy.spark.sql.hive.hiveserver2.jdbc.url jdbc:hive2://<server>:10500/
livy.spark.sql.hive.hiveserver2.jdbc.url.principal <Hive principal>
livy.spark.sql.hive.llap true
livy.superusers livy,zeppelin
zeppelin.livy.keytab <Livy keytab>
zeppelin.livy.principal <Livy principal>
zeppelin.livy.url http://<server>:8999

Is there any other configuration, beyond what is already described in the articles above, that I need to make in order to get a valid Kerberos ticket from the Spark session that Livy creates for me? Or do you have any other information that can help me get this functionality to work?

Best Regards
Berry Österlund
Labels:
- Apache Zeppelin
06-12-2017
07:25 AM
1 Kudo
Yes. When I make the REST call with curl, I include the Kerberos credentials and it works fine then. To be able to move forward with the rest of the tests, I made a really, REALLY ugly workaround (not in production, just a test environment): I added /druid to druid.hadoop.security.spnego.excludedPaths so the query works from Hive without hitting the unsupported-Kerberos problem. Not something I can recommend, but it lets us test the rest of the solution until the Kerberos problem is fixed.
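To be explicit about the ugly part, the broker-side property ends up looking roughly like this (the path comes from the workaround above; the JSON-list syntax is my assumption about how Druid expects the value, so check against your own runtime.properties):

druid.hadoop.security.spnego.excludedPaths=["/druid"]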
06-09-2017
01:04 PM
Hello,

I'm trying to get LLAP to work with the Spark2 Thrift server running in HDP 2.6, following the guide at https://community.hortonworks.com/content/kbentry/72454/apache-spark-fine-grain-security-with-llap-test-dr.html, but I have hit a number of problems. According to that guide, I should download a spark-llap-assembly jar file from repo.hortonworks.com. The guide was written for HDP 2.5.3, and the jar is there for that version of HDP, but for some strange reason it's not there for HDP 2.6. Anyway, I downloaded spark-llap_2-11-1.0.2-2.1-assembly.jar instead, and the Thrift server starts up with LLAP support. Using beeline, I can connect to the Thrift server and everything looks fine until I try to run a query. As soon as I do that, I get the java.lang.NoSuchMethodError you see below. Does anybody know a solution for this?

17/06/09 12:34:51 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.catalog.CatalogTable.copy(Lorg/apache/spark/sql/catalyst/TableIdentifier;Lorg/apache/spark/sql/catalyst/catalog/CatalogTableType;Lorg/apache/spark/sql/catalyst/catalog/CatalogStorageFormat;Lorg/apache/spark/sql/types/StructType;Lscala/Option;Lscala/collection/Seq;Lscala/Option;Ljava/lang/String;JJLscala/collection/immutable/Map;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/collection/Seq;Z)Lorg/apache/spark/sql/catalyst/catalog/CatalogTable;
at org.apache.spark.sql.hive.llap.LlapExternalCatalog$anonfun$getTable$1.apply(LlapExternalCatalog.scala:160)
at org.apache.spark.sql.hive.llap.LlapExternalCatalog$anonfun$getTable$1.apply(LlapExternalCatalog.scala:158)
at org.apache.spark.sql.hive.llap.LlapExternalCatalog.withClient(LlapExternalCatalog.scala:78)
at org.apache.spark.sql.hive.llap.LlapExternalCatalog.getTable(LlapExternalCatalog.scala:158)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:290)
at org.apache.spark.sql.hive.llap.LlapSessionCatalog.getTableMetadata(LlapSessionCatalog.scala:90)
at org.apache.spark.sql.execution.command.DescribeTableCommand.run(tables.scala:437)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$execute(SparkExecuteStatementOperation.scala:231)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$anon$1$anon$2.run(SparkExecuteStatementOperation.scala:174)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$anon$1$anon$2.run(SparkExecuteStatementOperation.scala:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$anon$1.run(SparkExecuteStatementOperation.scala:184)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Labels:
- Apache Hive
- Apache Spark
05-24-2017
06:52 AM
Issue reported on https://github.com/druid-io/druid/issues/4322
05-24-2017
06:29 AM
An update. What Hive is doing is calling the /druid/v2/datasources/<DATASOURCE_NAME>/candidates?intervals=1900-01-01T00:00:00.000+01:00/3000-01-01T00:00:00.000+01:00 interface to get a list of candidates. According to the broker documentation (http://druid.io/docs/latest/design/broker.html), the intervals should support ISO 8601 timestamps, and that includes the "+01:00" timezone offset at the end.

When I call the broker directly with a simple curl command, I get the following response back:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 500 </title>
</head>
<body>
<h2>HTTP ERROR: 500</h2>
<p>Problem accessing /druid/v2/datasources/<DATASOURCE_NAME>/candidates. Reason:
<pre>
java.lang.IllegalArgumentException: Invalid format:
"1900-01-01T00:00:00.000 01:00" is malformed at " 01:00"</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
</body>
</html>

If I remove the timezone offset from the request (/druid/v2/datasources/<DATASOURCE_NAME>/candidates?intervals=1900-01-01T00:00:00.000/3000-01-01T00:00:00.000), the candidates are returned correctly.
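The fact that the broker sees "1900-01-01T00:00:00.000 01:00" (a space where the '+' should be) suggests the '+' in the offset is being URL-decoded as a space somewhere along the way, so it would presumably need to be sent percent-encoded as %2B. A rough curl sketch of the three variants (the broker host is a placeholder, 8082 being the default broker port):

# Fails: the '+' in the offset is decoded as a space by the broker
curl 'http://<broker-host>:8082/druid/v2/datasources/<DATASOURCE_NAME>/candidates?intervals=1900-01-01T00:00:00.000+01:00/3000-01-01T00:00:00.000+01:00'

# Works: no timezone offset in the interval
curl 'http://<broker-host>:8082/druid/v2/datasources/<DATASOURCE_NAME>/candidates?intervals=1900-01-01T00:00:00.000/3000-01-01T00:00:00.000'

# Untested assumption: offset percent-encoded as %2B
curl 'http://<broker-host>:8082/druid/v2/datasources/<DATASOURCE_NAME>/candidates?intervals=1900-01-01T00:00:00.000%2B01:00/3000-01-01T00:00:00.000%2B01:00'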
05-23-2017
10:24 AM
I'm struggling to get Hive to work with Druid. So far I have a working connection between the two: I can create datasources in Druid from Hive, and I can also (with some workarounds) query the data from Hive, but not using the "avg" aggregation on one of the measures. If I do an explain on the query in Hive, I get the following JSON (after making it a bit easier to read):

{
"queryType":"select",
"dataSource":"druid_sqoop_import_02",
"descending":false,
"intervals":[
"1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"
],
"filter":{
"type":"selector",
"dimension":"target_database",
"value":"ABC"
},
"dimensions":[
"import_type",
"target_database",
"target_tblname",
"source_database",
"source_tblschema",
"source_tblname"
],
"metrics":[
"sqoop_duration",
"merge_duration",
"concatenate_duration",
"sqoop_mappers",
"sqoop_rows",
"sqoop_size"
],
"granularity":"all",
"pagingSpec":{
"threshold":16384,
"fromNext":true
},
"context":{
"druid.query.fetch":false
}
}

If I run this as a normal JSON REST call against Druid, the query returns the values correctly. But from Hive, I get the following error:

java.io.IOException: org.apache.hive.druid.com.fasterxml.jackson.core.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')

If I look at the actual error message returned from the Druid broker, it says the following:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 500 </title>
</head>
<body>
<h2>HTTP ERROR: 500</h2>
<p>Problem accessing /druid/v2/datasources/druid_sqoop_import_02/candidates. Reason:
<pre>
java.lang.IllegalArgumentException: Invalid format:
"1900-01-01T00:00:00.000 01:00" is malformed at " 01:00"</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
</body>
</html>

So for some reason, what I guess is the timezone (I'm in UTC+1) is being appended to the end of the timestamp. I have also tried a WHERE on __time in Hive, but it still adds the "01:00" at the end. Has anybody else had these problems, or can someone give me some pointers on how to solve this one?
Labels:
- Apache Hive
- Druid
05-23-2017
05:44 AM
An update! I did a network packet capture to see what Druid is returning to Hive, and I actually get an authentication error (see output below). Most likely this is due to Kerberos being enabled in the cluster. It's strange that I can create the datasource from Hive without getting this error. Now I'm hoping it's just a matter of setting the correct Hive configuration to connect to a Kerberos-enabled Druid installation. If anybody has a hint, please let me know.

<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 401 </title>
</head>
<body>
<h2>HTTP ERROR: 401</h2>
<p>Problem accessing /druid/v2/. Reason:
<pre>
Authentication required</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
</body>
</html>
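As a quick sanity check, the broker itself can be queried with SPNEGO from the same session, which at least separates a Druid-side Kerberos problem from a Hive-side one (the host is a placeholder, 8082 being the default broker port):

kinit <user>
curl --negotiate -u : 'http://<broker-host>:8082/druid/v2/datasources'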
05-23-2017
04:30 AM
Hello,

I have Hortonworks HDP 2.6 installed together with the Druid version that ships with it. I have installed Druid on a Kerberos-secured cluster and am having problems accessing Druid from Hive. I can create a Druid datasource from Hive using a normal "create table" statement, but I can't do a select on it. Once the datasource is created in Druid, I can run a normal JSON REST query directly against Druid and I get the expected result back, so it feels like the Druid part is working as it should. When I query the data from Hive, I get two different outputs saying the same thing:

Error: java.io.IOException: org.apache.hive.druid.com.fasterxml.jackson.core.JsonParseException: Invalid type marker byte 0x3c for expected value token at [Source: org.apache.hive.druid.com.metamx.http.client.io.AppendableByteArrayInputStream@245e6e5b; line: -1, column: 1] (state=,code=0)

or

Error: java.io.IOException: java.io.IOException: org.apache.hive.druid.com.fasterxml.jackson.core.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (number, String, array, object, 'true', 'false' or 'null') at [Source: org.apache.hive.druid.com.metamx.http.client.io.AppendableByteArrayInputStream@6dd4a729; line: 1, column: 2]
at org.apache.hive.druid.com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1576)
at org.apache.hive.druid.com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:533)
at org.apache.hive.druid.com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:462)
at org.apache.hive.druid.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2610)
at org.apache.hive.druid.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:841)
at org.apache.hive.druid.com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:737)
at org.apache.hive.druid.com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3090)
at org.apache.hive.druid.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3036)
at org.apache.hive.druid.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2199)
at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.distributeSelectQuery(DruidQueryBasedInputFormat.java:227)
at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.getInputSplits(DruidQueryBasedInputFormat.java:160)
at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.getSplits(DruidQueryBasedInputFormat.java:104)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1932)
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:482)
at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:311)
at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:856)
at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:552)
at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:715)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1717)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1702)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.TServlet.doPost(TServlet.java:83)
at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:206)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:755)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:565)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:479)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1031)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:406)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:965)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:349)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:449)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:925)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:76)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:609)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:45)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745) (state=,code=0)

Does anybody have any idea about what's wrong here?
Labels:
- Apache Hive
- Druid