Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

HDP 3.1 & Spark 2.3.2 - hive.table("default.table1").show is failing on java.io.IOException: java.lang.NullPointerException

avatar
New Contributor

On the fresh new cluster based on HDP 3.1.0 (Kerberized) I'm still facing to a problem with Spark and Hive reading. Connection via HWC is not working. When a try run hive.table("default.table1").show I'll get an error message:

java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.createBatchDataReaderFactories(HiveWarehouseDataSourceReader.java:166)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD$lzycompute(DataSourceV2ScanExec.scala:64)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD(DataSourceV2ScanExec.scala:60)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDDs(DataSourceV2ScanExec.scala:79)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
  at org.apache.spark.sql.execution.SparkPlan$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:337)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$collectFromPlan(Dataset.scala:3278)
  at org.apache.spark.sql.Dataset$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$anonfun$52.apply(Dataset.scala:3259)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2703)
  at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:723)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:682)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:691)
  ... 47 elided
Caused by: java.lang.RuntimeException: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.getSplitsFactories(HiveWarehouseDataSourceReader.java:182)
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.createBatchDataReaderFactories(HiveWarehouseDataSourceReader.java:162)
  ... 72 more
Caused by: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:298)
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.getSplitsFactories(HiveWarehouseDataSourceReader.java:176)
  ... 73 more
Caused by: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at shadehive.org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:300)
  at shadehive.org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:286)
  at shadehive.org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:379)
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:280)
  ... 74 more
Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:478)
  at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:328)
  at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:952)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
  at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
  at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
  at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
  at com.sun.proxy.$Proxy72.fetchResults(Unknown Source)
  at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:564)
  at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:792)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1837)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1822)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
  ... 3 more
Caused by: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:162)
  at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2738)
  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
  ... 25 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:225)
  at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)
  at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:519)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:511)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
  ... 28 more
Caused by: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.getSplits(GenericUDTFGetSplits.java:498)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:210)
  ... 39 more
Caused by: java.lang.NullPointerException: null
  at org.apache.hadoop.hive.llap.LlapUtil.generateClusterName(LlapUtil.java:117)
  at org.apache.hadoop.hive.llap.coordinator.LlapCoordinator.getLlapSigner(LlapCoordinator.java:103)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.getSplits(GenericUDTFGetSplits.java:441)
  ... 40 more

I've checked official documentation and github page. All properties are OK, but still cannot read any data from hive. I'm using a standard hive jdbc connection, not interactive one since I'm not planning to use a LLAP engine.

Any idea what to set or check to avoid this error?

PS: I'm able to read metadata via hive connector such as database ... even writing to Hive is workrking, but not reading tables to DF.

1 ACCEPTED SOLUTION

avatar
Rising Star

@Pavel Stejskal

Using the HiveWarehouseConnector + Hiveserver2Interactive(LLAP for managed tables) is mandatory and the reasons are explained in the HDP3 documentation, if you're not using it then for sure the properties are not OK, if the namespace part of it is not configured to point to the hiveserver2Interactive znode ( I think that's what you meant), then that is not correct.

To read a table into a DF, you have to use HiveWarehouseSession's API, i.e:

val df = hive.executeQuery("select * from web_sales")

I'd like to suggest reading throught this entire article.

BR.

View solution in original post

8 REPLIES 8

avatar
Rising Star

@Pavel Stejskal

Using the HiveWarehouseConnector + Hiveserver2Interactive(LLAP for managed tables) is mandatory and the reasons are explained in the HDP3 documentation, if you're not using it then for sure the properties are not OK, if the namespace part of it is not configured to point to the hiveserver2Interactive znode ( I think that's what you meant), then that is not correct.

To read a table into a DF, you have to use HiveWarehouseSession's API, i.e:

val df = hive.executeQuery("select * from web_sales")

I'd like to suggest reading throught this entire article.

BR.

avatar
New Contributor

can you help me to get resolve this issue 

avatar
Expert Contributor

hi @Pavel Stejskal @dbompart

i am facing the same problem, and i am not querying a hive managed table, its just an external table in hive, i am able to read the metadata but not the data , can you please tell me how you fixed it ?


avatar
New Contributor

Hi Tarek,

when I set Hive interactive correctly (tuning of resources is the most critical part otherwise reading was failing) all was running fine and smoothly.

In the end I built whole pipeline completely on Spark only as Hive Interactive was not needed anymore and for a large streaming or heavy batches was unstable - to many connections, some were already closed etc. I'm talking about volumes like 1,5 billions with foreachBatch sink. At this moment I can do a stream reading and compacting at the same time.

avatar
Explorer

I am also facing the same issue, I have followed documentation properly but still hitting the issue


count on dataframe is working fine but head is not


All I am doing it

val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()
hive.executeQuery("select * from test1").head



Here is my stack trace




19/07/29 14:10:20 ERROR HiveWarehouseDataSourceReader: Unable to submit query to HS2
java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.createBatchDataReaderFactories(HiveWarehouseDataSourceReader.java:166)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD$lzycompute(DataSourceV2ScanExec.scala:64)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD(DataSourceV2ScanExec.scala:60)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDDs(DataSourceV2ScanExec.scala:79)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:337)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3278)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2496)
  ... 49 elided
Caused by: java.lang.RuntimeException: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.getSplitsFactories(HiveWarehouseDataSourceReader.java:182)
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.createBatchDataReaderFactories(HiveWarehouseDataSourceReader.java:162)
  ... 70 more
Caused by: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:298)
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.getSplitsFactories(HiveWarehouseDataSourceReader.java:176)
  ... 71 more
Caused by: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at shadehive.org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:300)
  at shadehive.org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:286)
  at shadehive.org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:379)
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:280)
  ... 72 more
Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:478)
  at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:328)
  at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:952)
  at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:564)
  at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:792)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1837)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1822)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)

�
  at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

�
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:162)
  at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2738)

�
  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
  ... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:225)
  at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)
  at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:519)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:511)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
  ... 16 more
Caused by: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.getSplits(GenericUDTFGetSplits.java:498)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:210)
  ... 27 more
Caused by: java.lang.NullPointerException: null
  at org.apache.hadoop.hive.llap.LlapUtil.generateClusterName(LlapUtil.java:117)
  at org.apache.hadoop.hive.llap.coordinator.LlapCoordinator.getLlapSigner(LlapCoordinator.java:103)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.getSplits(GenericUDTFGetSplits.java:441)
  ... 28 more

avatar
Explorer

Could you please share how you fixed this issue  ? 

avatar
New Contributor

the issue got fixed ,i am also acing the issue .


please help here to get it resolve 

avatar
Community Manager

@Abhishek_721  As this is an older post you would have a better chance of receiving a resolution by starting a new thread. This will also provide the opportunity to provide details specific to your environment that could aid others in providing a more accurate answer to your question. 


Cy Jervis, Manager, Community Program
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.