08-06-2018 04:47 AM
@hubbarja Hello, I decided not to open a new topic, but I am currently facing issues when trying to connect PySpark to HBase with Kerberos. The following code works if I disable Kerberos in HBase:

%pyspark
host = 'hostname'
tablename = 'Test:Test2'

conf = {"hbase.zookeeper.quorum": host, "hbase.mapreduce.inputtable": tablename}
keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"

hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=keyConv,
    valueConverter=valueConv,
    conf=conf)
hbase_rdd.collect()

With Kerberos enabled, the following error is thrown:

An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=32, exceptions:
Mon Aug 06 11:36:55 UTC 2018, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68623: row 'Test:Test2,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hostname,60020,1533550276857, seqNum=0

Best regards,
Gil Pinheiro
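For readers hitting the same timeout: with Kerberos enabled, the HBase client generally needs the security properties passed explicitly, and the driver needs a valid Kerberos ticket (obtained with kinit, or via a keytab/principal configured for the Zeppelin Spark interpreter). The sketch below is one common approach, not a confirmed fix for this exact error; the principal names and realm are placeholders, and whether these settings alone resolve the failure depends on the cluster setup.

%pyspark
# Sketch only: the property names are standard HBase/Hadoop settings, but the
# principals and realm below are placeholders and must match the values in the
# cluster's hbase-site.xml. A valid Kerberos ticket (kinit or keytab) is assumed.
host = 'hostname'
tablename = 'Test:Test2'

conf = {
    "hbase.zookeeper.quorum": host,
    "hbase.mapreduce.inputtable": tablename,
    # Switch the HBase client to Kerberos (SASL) authentication.
    "hbase.security.authentication": "kerberos",
    "hadoop.security.authentication": "kerberos",
    # Placeholder principals; replace EXAMPLE.COM with the real realm.
    "hbase.master.kerberos.principal": "hbase/_HOST@EXAMPLE.COM",
    "hbase.regionserver.kerberos.principal": "hbase/_HOST@EXAMPLE.COM",
}

keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"

hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=keyConv,
    valueConverter=valueConv,
    conf=conf)
hbase_rdd.collect()

One more caveat: when running on YARN, executors cannot use the driver's local ticket cache, so Spark typically has to obtain an HBase delegation token at submit time (for example via the --principal and --keytab options of spark-submit). This is an assumption about the likely failure mode here, not a confirmed diagnosis.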