08-06-2018 04:47 AM
@hubbarja Hello, I decided not to open a new topic, but I am currently facing issues when trying to connect PySpark to HBase with Kerberos. The following code works if I disable Kerberos in HBase:

%pyspark
host = 'hostname'
tablename = 'Test:Test2'

conf = {"hbase.zookeeper.quorum": host, "hbase.mapreduce.inputtable": tablename}
keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"

hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=keyConv,
    valueConverter=valueConv,
    conf=conf)
hbase_rdd.collect()

With Kerberos enabled, the following error is thrown:

An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=32, exceptions:
Mon Aug 06 11:36:55 UTC 2018, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68623: row 'Test:Test2,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hostname,60020,1533550276857, seqNum=0

Best regards,
Gil Pinheiro
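For readers hitting the same timeout: with Kerberos enabled, the HBase client generally needs the security properties passed explicitly, and the driver needs a valid Kerberos ticket (obtained with kinit, or via a keytab/principal configured for the Zeppelin Spark interpreter). The sketch below is one common approach, not a confirmed fix for this exact error; the principal names and realm are placeholders, and whether these settings alone resolve the failure depends on the cluster setup.

%pyspark
# Sketch only: the property names are standard HBase/Hadoop settings, but the
# principals and realm below are placeholders and must match the values in the
# cluster's hbase-site.xml. A valid Kerberos ticket (kinit or keytab) is assumed.
host = 'hostname'
tablename = 'Test:Test2'

conf = {
    "hbase.zookeeper.quorum": host,
    "hbase.mapreduce.inputtable": tablename,
    # Switch the HBase client to Kerberos (SASL) authentication.
    "hbase.security.authentication": "kerberos",
    "hadoop.security.authentication": "kerberos",
    # Placeholder principals; replace EXAMPLE.COM with the real realm.
    "hbase.master.kerberos.principal": "hbase/_HOST@EXAMPLE.COM",
    "hbase.regionserver.kerberos.principal": "hbase/_HOST@EXAMPLE.COM",
}

keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"

hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter=keyConv,
    valueConverter=valueConv,
    conf=conf)
hbase_rdd.collect()

One more caveat: when running on YARN, executors cannot use the driver's local ticket cache, so Spark typically has to obtain an HBase delegation token at submit time (for example via the --principal and --keytab options of spark-submit). This is an assumption about the likely failure mode here, not a confirmed diagnosis.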