HDP 3.1 & Spark 2.3.2 - hive.table("default.table1").show is failing on java.io.IOException: java.lang.NullPointerException


New Contributor

On a fresh new cluster based on HDP 3.1.0 (Kerberized) I'm still facing a problem with Spark reading from Hive. The connection via HWC is not working. When I try to run hive.table("default.table1").show I get this error message:

java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.createBatchDataReaderFactories(HiveWarehouseDataSourceReader.java:166)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD$lzycompute(DataSourceV2ScanExec.scala:64)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD(DataSourceV2ScanExec.scala:60)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDDs(DataSourceV2ScanExec.scala:79)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:337)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3278)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2703)
  at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:723)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:682)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:691)
  ... 47 elided
Caused by: java.lang.RuntimeException: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.getSplitsFactories(HiveWarehouseDataSourceReader.java:182)
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.createBatchDataReaderFactories(HiveWarehouseDataSourceReader.java:162)
  ... 72 more
Caused by: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:298)
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.getSplitsFactories(HiveWarehouseDataSourceReader.java:176)
  ... 73 more
Caused by: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at shadehive.org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:300)
  at shadehive.org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:286)
  at shadehive.org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:379)
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:280)
  ... 74 more
Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:478)
  at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:328)
  at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:952)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
  at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
  at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
  at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
  at com.sun.proxy.$Proxy72.fetchResults(Unknown Source)
  at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:564)
  at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:792)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1837)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1822)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
  ... 3 more
Caused by: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:162)
  at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2738)
  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
  ... 25 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:225)
  at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)
  at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:519)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:511)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
  ... 28 more
Caused by: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.getSplits(GenericUDTFGetSplits.java:498)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:210)
  ... 39 more
Caused by: java.lang.NullPointerException: null
  at org.apache.hadoop.hive.llap.LlapUtil.generateClusterName(LlapUtil.java:117)
  at org.apache.hadoop.hive.llap.coordinator.LlapCoordinator.getLlapSigner(LlapCoordinator.java:103)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.getSplits(GenericUDTFGetSplits.java:441)
  ... 40 more

I've checked the official documentation and the GitHub page. All properties are OK, but I still cannot read any data from Hive. I'm using the standard Hive JDBC connection, not the interactive one, since I'm not planning to use the LLAP engine.

Any idea what to set or check to avoid this error?

PS: I'm able to read metadata via the Hive connector (databases, tables, etc.), and even writing to Hive is working, but reading tables into a DataFrame is not.
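For reference, the pattern I'm running looks roughly like this (a minimal sketch with placeholder data and table names; the builder call is the one from the HWC API, and the write-path option names are as I understand them from the docs):

import spark.implicits._

// build the HiveWarehouseSession
val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()

// metadata calls work
hive.showDatabases().show()

// writing a DataFrame through the HWC data source also works
val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
df.write
  .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .option("table", "default.table1")
  .save()

// but reading the table back into a DataFrame fails with the NPE above
hive.table("default.table1").show()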

1 ACCEPTED SOLUTION

Re: HDP 3.1 & Spark 2.3.2 - hive.table("default.table1").show is failing on java.io.IOException: java.lang.NullPointerException

Contributor

@Pavel Stejskal

Using the HiveWarehouseConnector together with HiveServer2 Interactive (LLAP) is mandatory for managed tables, and the reasons are explained in the HDP 3 documentation. If you're not using it, then the properties are definitely not OK: if the namespace part of the JDBC URL is not configured to point to the hiveserver2Interactive znode (I think that's what you meant), then that is not correct.

To read a table into a DataFrame, you have to use the HiveWarehouseSession API, e.g.:

val df = hive.executeQuery("select * from web_sales")
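
For completeness, here is a rough sketch of the submit-time configuration and the read path on a Kerberized HDP 3 cluster (the property names are the ones I recall from the HDP 3 documentation; every value below is a placeholder you need to adapt to your environment):

// spark-shell / spark-submit needs the HWC jar plus configuration along these lines
// (all values are placeholders):
//   spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive
//   spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@EXAMPLE.COM
//   spark.datasource.hive.warehouse.metastoreUri=thrift://metastore-host:9083
//   spark.datasource.hive.warehouse.load.staging.dir=/tmp/staging
//   spark.hadoop.hive.llap.daemon.service.hosts=@llap0
//   spark.hadoop.hive.zookeeper.quorum=zk1:2181,zk2:2181,zk3:2181

val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()

// the read goes through HiveServer2 Interactive / LLAP, which is why the
// zooKeeperNamespace must point at the interactive znode
val df = hive.executeQuery("select * from web_sales")
df.show(10)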

I'd like to suggest reading through this entire article.

BR.

5 REPLIES


Re: HDP 3.1 & Spark 2.3.2 - hive.table("default.table1").show is failing on java.io.IOException: java.lang.NullPointerException

Expert Contributor

Hi @Pavel Stejskal @dbompart,

I am facing the same problem, and I am not querying a Hive managed table; it is just an external table in Hive. I am able to read the metadata but not the data. Can you please tell me how you fixed it?


Re: HDP 3.1 & Spark 2.3.2 - hive.table("default.table1").show is failing on java.io.IOException: java.lang.NullPointerException

New Contributor

Hi Tarek,

When I set up Hive Interactive correctly (tuning the resources is the most critical part; otherwise reading kept failing), everything ran fine and smoothly.

In the end I built the whole pipeline on Spark only, as Hive Interactive was not needed anymore and was unstable for large streaming or heavy batches - too many connections, some already closed, etc. I'm talking about volumes like 1.5 billion records with a foreachBatch sink. At the moment I can do a streaming read and compaction at the same time. A rough sketch of that foreachBatch pipeline is below.
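
Roughly, the shape of that pipeline is the following (a simplified sketch; the source, paths, and sink format are placeholders, not the real job):

import org.apache.spark.sql.DataFrame

val stream = spark.readStream
  .format("kafka")                                      // placeholder source
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "events")
  .load()

// each micro-batch is written with plain Spark APIs, no Hive Interactive involved
val writeBatch = (batchDF: DataFrame, batchId: Long) => {
  batchDF.write
    .mode("append")
    .format("orc")
    .save("/warehouse/events")                          // placeholder path
}

val query = stream.writeStream
  .option("checkpointLocation", "/checkpoints/events")  // placeholder path
  .foreachBatch(writeBatch)
  .start()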


Re: HDP 3.1 & Spark 2.3.2 - hive.table("default.table1").show is failing on java.io.IOException: java.lang.NullPointerException

New Contributor

I am also facing the same issue. I have followed the documentation properly but I am still hitting it.

A count on the DataFrame works fine, but head does not.

All I am doing is:

val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()
hive.executeQuery("select * from test1").head

Here is my stack trace:
19/07/29 14:10:20 ERROR HiveWarehouseDataSourceReader: Unable to submit query to HS2
java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.createBatchDataReaderFactories(HiveWarehouseDataSourceReader.java:166)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD$lzycompute(DataSourceV2ScanExec.scala:64)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD(DataSourceV2ScanExec.scala:60)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDDs(DataSourceV2ScanExec.scala:79)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:337)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3278)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2496)
  ... 49 elided
Caused by: java.lang.RuntimeException: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.getSplitsFactories(HiveWarehouseDataSourceReader.java:182)
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.createBatchDataReaderFactories(HiveWarehouseDataSourceReader.java:162)
  ... 70 more
Caused by: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:298)
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.getSplitsFactories(HiveWarehouseDataSourceReader.java:176)
  ... 71 more
Caused by: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at shadehive.org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:300)
  at shadehive.org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:286)
  at shadehive.org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:379)
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:280)
  ... 72 more
Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:478)
  at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:328)
  at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:952)
  at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:564)
  at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:792)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1837)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1822)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:162)
  at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2738)
  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
  ... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:225)
  at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)
  at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:519)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:511)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
  ... 16 more
Caused by: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.getSplits(GenericUDTFGetSplits.java:498)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:210)
  ... 27 more
Caused by: java.lang.NullPointerException: null
  at org.apache.hadoop.hive.llap.LlapUtil.generateClusterName(LlapUtil.java:117)
  at org.apache.hadoop.hive.llap.coordinator.LlapCoordinator.getLlapSigner(LlapCoordinator.java:103)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.getSplits(GenericUDTFGetSplits.java:441)
  ... 28 more

Re: HDP 3.1 & Spark 2.3.2 - hive.table("default.table1").show is failing on java.io.IOException: java.lang.NullPointerException

New Contributor

Could you please share how you fixed this issue?
