Member since: 12-30-2017
Posts: 6
Kudos Received: 0
Solutions: 0
01-08-2018
06:43 PM
Hi,
I am following this tutorial here:
https://community.hortonworks.com/articles/79806/how-to-setup-openldap-24-on-centos-7.html
But when I executed the following command:

ldapmodify -Y EXTERNAL -H ldapi:/// -f db.ldif

I got this error:

SASL/EXTERNAL authentication started
SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
SASL SSF: 0
ldapmodify: wrong attributeType at line 5, entry "olcDatabase={2}hdb,cn=config"

Could someone help me troubleshoot? Thanks.
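For reference, the db.ldif from that tutorial is a cn=config modification along these lines (a sketch with placeholder suffix and password, not my exact file). I am double-checking it for a stray leading space, a missing "-" separator between the replace stanzas, and typos in the olc* attribute names near the reported line:

# modify the {2}hdb database entry under cn=config
dn: olcDatabase={2}hdb,cn=config
changetype: modify
replace: olcSuffix
olcSuffix: dc=example,dc=com
-
replace: olcRootDN
olcRootDN: cn=ldapadm,dc=example,dc=com
-
replace: olcRootPW
olcRootPW: {SSHA}placeholder-hash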
01-07-2018
09:36 PM
Hi all, this thread has been open for over a week now. I just want to see if anyone has an answer.
01-03-2018
07:39 AM
Can anyone help me make sense of this? I just tried the Scala code below and it worked fine via spark-shell:

import java.sql.{DriverManager, Connection, Statement, ResultSet}
import java.sql.Timestamp
import scala.collection.mutable.MutableList

case class StatsRec(
  code: String,
  description: String
)
val url = "jdbc:hive2://localhost:10016/default"
val driver = "org.apache.hive.jdbc.HiveDriver"
val user = "maria_dev"
val password = "maria_dev"
val conn: Connection = DriverManager.getConnection(url, user, password)
val res: ResultSet = conn.createStatement.executeQuery("SELECT * FROM sample_07 LIMIT 5")
val fetchedRes = MutableList[StatsRec]()
while (res.next()) {
  val rec = StatsRec(res.getString("code"), res.getString("description"))
  fetchedRes += rec
}
conn.close()
val rddStatsDelta = sc.parallelize(fetchedRes)
rddStatsDelta.cache()
println(rddStatsDelta.count)
rddStatsDelta.collect.take(10).foreach(println)

Output:

StatsRec(00-0000,All Occupations)
StatsRec(11-0000,Management occupations)
StatsRec(11-1011,Chief executives)
StatsRec(11-1021,General and operations managers)
StatsRec(11-1031,Legislators)

According to this post, it looks like it's possible to get pyspark to work locally (not over remote JDBC), but it doesn't say how to "fix" this error: http://belablotski.blogspot.in/2016/01/access-hive-tables-from-spark-using.html Help!!
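For comparison, the local (non-JDBC) route that the blog post takes boils down to something like this in pyspark (a minimal sketch; it assumes pyspark was started as a user that can reach the Hive metastore, and in the pyspark shell sc already exists):

from pyspark.sql import HiveContext

# Reads through the Hive metastore directly, not through the Thrift Server
sqlContext = HiveContext(sc)
df = sqlContext.sql("SELECT code, description FROM sample_07 LIMIT 5")
df.show()

That bypasses JDBC entirely, though, which is exactly what I am trying to avoid.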
12-31-2017
07:38 PM
The strange thing is that if I use Postgres, it works fine:

df = sqlContext.read.format("jdbc").options(
    driver="org.postgresql.Driver",
    url="jdbc:postgresql://localhost:15436/mydb",
    dbtable="test1",
    user="bob",
    password="password").load()

(Judging from the stack trace below, Spark's JDBC source calls ResultSetMetaData.isSigned when resolving the table, and the Postgres driver implements it while the Hive JDBC driver throws "Method not supported".)
12-31-2017
02:11 PM
I am using Hortonworks Sandbox VMware 2.6 and SSH into the terminal to start pyspark:

su - hive -c pyspark

Then I typed the code below:

df = sqlContext.read.format("jdbc").options(
    driver="org.apache.hive.jdbc.HiveDriver",
    url="jdbc:hive2://localhost:10016/default",
    dbtable="sample_07",
    user="maria_dev",
    password="maria_dev").load()

I am just following the exact example here: https://spark.apache.org/docs/1.6.3/sql-programming-guide.html#jdbc-to-other-databases
However, it gave me the error below:

17/12/30 19:55:14 INFO HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10016/default
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/hdp/current/spark-client/python/pyspark/sql/readwriter.py", line 139, in load
return self._df(self._jreader.load())
File "/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/usr/hdp/current/spark-client/python/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o119.load.
: java.sql.SQLException: Method not supported
at org.apache.hive.jdbc.HiveResultSetMetaData.isSigned(HiveResultSetMetaData.java:143)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:136)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
at org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:57)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:748)
I believe this is the corresponding error in the Hive log:

2017-12-30 19:57:06,030 ERROR [Thread-13]: util.PolicyRefresher (PolicyRefresher.java:loadPolicyfromPolicyAdmin(288)) - PolicyRefresher(serviceName=Sandbox_hive): failed to refresh policies. Will continue to use last known version of policies (24)
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused (Connection refused)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
at com.sun.jersey.api.client.Client.handle(Client.java:648)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.get(WebResource.java:503)
at org.apache.ranger.admin.client.RangerAdminRESTClient.getServicePoliciesIfUpdated(RangerAdminRESTClient.java:127)
at org.apache.ranger.plugin.util.PolicyRefresher.loadPolicyfromPolicyAdmin(PolicyRefresher.java:264)
at org.apache.ranger.plugin.util.PolicyRefresher.loadPolicy(PolicyRefresher.java:202)
at org.apache.ranger.plugin.util.PolicyRefresher.run(PolicyRefresher.java:171)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1546)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:240)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:147)
... 8 more

However, when I verified with beeline, the STS port 10016 works fine:

beeline> !connect jdbc:hive2://localhost:10016/default maria_dev maria_dev
Connecting to jdbc:hive2://localhost:10016/default
Connected to: Spark SQL (version 2.1.1.2.6.1.0-129)
Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10016/default> select * from sample_07 limit 2;
+----------+-------------------------+------------+---------+--+
| code | description | total_emp | salary |
+----------+-------------------------+------------+---------+--+
| 00-0000 | All Occupations | 134354250 | 40690 |
| 11-0000 | Management occupations | 6003930 | 96150 |
+----------+-------------------------+------------+---------+--+

So what did I do wrong in the pyspark code here? My goal is to have authentication over JDBC. I guess I could go directly to the files or to the metastore to get the data, but I'd like to use the Spark Thrift Server for fine-grained authorization later. Please help.
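In the meantime, the closest Python equivalent of the Scala workaround above that I can think of is to fetch the rows through the Thrift Server with a plain HiveServer2 client and hand them to Spark (a sketch assuming the pyhive package, which is not part of the Sandbox; it sidesteps the Spark JDBC source rather than fixing it):

from pyhive import hive

# The Spark Thrift Server on 10016 speaks the HiveServer2 protocol
conn = hive.connect(host='localhost', port=10016, username='maria_dev', database='default')
cursor = conn.cursor()
cursor.execute('SELECT code, description FROM sample_07 LIMIT 5')
rows = cursor.fetchall()
conn.close()

# Same idea as sc.parallelize(fetchedRes) in the Scala version above
rdd = sc.parallelize(rows)
print(rdd.count())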
Labels:
- Apache Hive
- Apache Spark
12-30-2017
09:27 PM
Which files did you end up using? I looked here http://repo.hortonworks.com/content/repositories/releases/org/apache/hive/hive-jdbc/ and there are tons of them.