
HDP 3.1 & Spark 2.3.2 - hive.table("default.table1").show is failing on java.io.IOException: java.lang.NullPointerException

New Contributor

On a fresh cluster based on HDP 3.1.0 (Kerberized) I'm still facing a problem with reading from Hive in Spark. The connection via HWC (Hive Warehouse Connector) is not working. When I try to run hive.table("default.table1").show I get this error message:

java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.createBatchDataReaderFactories(HiveWarehouseDataSourceReader.java:166)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD$lzycompute(DataSourceV2ScanExec.scala:64)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD(DataSourceV2ScanExec.scala:60)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDDs(DataSourceV2ScanExec.scala:79)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:337)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3278)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2703)
  at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:723)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:682)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:691)
  ... 47 elided
Caused by: java.lang.RuntimeException: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.getSplitsFactories(HiveWarehouseDataSourceReader.java:182)
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.createBatchDataReaderFactories(HiveWarehouseDataSourceReader.java:162)
  ... 72 more
Caused by: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:298)
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.getSplitsFactories(HiveWarehouseDataSourceReader.java:176)
  ... 73 more
Caused by: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at shadehive.org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:300)
  at shadehive.org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:286)
  at shadehive.org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:379)
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:280)
  ... 74 more
Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:478)
  at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:328)
  at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:952)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
  at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
  at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
  at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
  at com.sun.proxy.$Proxy72.fetchResults(Unknown Source)
  at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:564)
  at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:792)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1837)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1822)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
  ... 3 more
Caused by: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:162)
  at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2738)
  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
  ... 25 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:225)
  at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)
  at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:519)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:511)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
  ... 28 more
Caused by: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.getSplits(GenericUDTFGetSplits.java:498)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:210)
  ... 39 more
Caused by: java.lang.NullPointerException: null
  at org.apache.hadoop.hive.llap.LlapUtil.generateClusterName(LlapUtil.java:117)
  at org.apache.hadoop.hive.llap.coordinator.LlapCoordinator.getLlapSigner(LlapCoordinator.java:103)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.getSplits(GenericUDTFGetSplits.java:441)
  ... 40 more

I've checked the official documentation and the GitHub page. All properties are set correctly, but I still cannot read any data from Hive. I'm using a standard Hive JDBC connection, not the interactive one, since I'm not planning to use the LLAP engine.
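
For reference, my spark-shell invocation follows the documented HWC properties and looks roughly like this (a sketch: hostnames, ports, the realm, and the assembly JAR version are placeholders for our real values):

spark-shell --master yarn \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hs2-host:10000/" \
  --conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@EXAMPLE.COM \
  --conf spark.datasource.hive.warehouse.metastoreUri="thrift://metastore-host:9083" \
  --conf spark.datasource.hive.warehouse.load.staging.dir=/tmp \
  --conf spark.hadoop.hive.zookeeper.quorum="zk1:2181,zk2:2181,zk3:2181"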

Any idea what to set or check to avoid this error?

PS: I'm able to read metadata via the Hive connector, such as databases; even writing to Hive is working, but not reading tables into a DataFrame.
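
To make the symptom concrete, this is roughly what does and does not work in the same session (a sketch; df stands for any DataFrame and the table names are examples):

// Metadata calls through the HWC session succeed:
hive.showDatabases().show()
hive.describeTable("table1").show()

// Writing a DataFrame through HWC succeeds as well:
df.write.format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .option("table", "table1")
  .save()

// Only reading a table into a DataFrame fails with the NPE above:
hive.table("default.table1").show()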

9 REPLIES


New Contributor

Can you help me resolve this issue?

Expert Contributor

Hi @Pavel Stejskal @dbompart,

I am facing the same problem, and I am not even querying a Hive managed table; it's just an external table in Hive. I am able to read the metadata but not the data. Can you please tell me how you fixed it?


New Contributor

Hi Tarek,

when I set up Hive Interactive correctly (tuning the resources is the most critical part; otherwise reading kept failing), everything ran fine and smoothly.

In the end I built the whole pipeline on Spark only, as Hive Interactive was no longer needed and was unstable for large streaming jobs or heavy batches: too many connections, some of them already closed, etc. I'm talking about volumes like 1.5 billion rows with a foreachBatch sink. At this moment I can do stream reading and compacting at the same time.
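
Schematically, the foreachBatch sink looks like this (a sketch with placeholder broker, topic, and paths; it assumes Spark 2.4+, where foreachBatch is available):

import org.apache.spark.sql.DataFrame

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // placeholder
  .option("subscribe", "events")                    // placeholder topic
  .load()

val query = stream.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // each micro-batch is appended as ORC; compaction runs as a separate job
    batch.write.mode("append").orc("/data/events")  // placeholder path
  }
  .option("checkpointLocation", "/checkpoints/events") // placeholder
  .start()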

Explorer

I am also facing the same issue. I have followed the documentation properly but am still hitting it.


count on the DataFrame is working fine, but head is not.


All I am doing is:

val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()
hive.executeQuery("select * from test1").head
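
If I understand the HWC API correctly, executeQuery fetches rows through LLAP input splits (which is where getSplits throws below), while execute uses the plain JDBC channel and is meant for small result sets, so the JDBC-only variant would be a useful cross-check:

// execute goes over plain JDBC instead of LLAP splits (small result sets only):
hive.execute("select * from test1").head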



Here is my stack trace




19/07/29 14:10:20 ERROR HiveWarehouseDataSourceReader: Unable to submit query to HS2
java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.createBatchDataReaderFactories(HiveWarehouseDataSourceReader.java:166)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD$lzycompute(DataSourceV2ScanExec.scala:64)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD(DataSourceV2ScanExec.scala:60)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDDs(DataSourceV2ScanExec.scala:79)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:337)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3278)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2496)
  ... 49 elided
Caused by: java.lang.RuntimeException: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.getSplitsFactories(HiveWarehouseDataSourceReader.java:182)
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.createBatchDataReaderFactories(HiveWarehouseDataSourceReader.java:162)
  ... 70 more
Caused by: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:298)
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.getSplitsFactories(HiveWarehouseDataSourceReader.java:176)
  ... 71 more
Caused by: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at shadehive.org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:300)
  at shadehive.org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:286)
  at shadehive.org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:379)
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:280)
  ... 72 more
Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:478)
  at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:328)
  at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:952)
  at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:564)
  at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:792)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1837)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1822)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:162)
  at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2738)
  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
  ... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:225)
  at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)
  at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:519)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:511)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
  ... 16 more
Caused by: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.getSplits(GenericUDTFGetSplits.java:498)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:210)
  ... 27 more
Caused by: java.lang.NullPointerException: null
  at org.apache.hadoop.hive.llap.LlapUtil.generateClusterName(LlapUtil.java:117)
  at org.apache.hadoop.hive.llap.coordinator.LlapCoordinator.getLlapSigner(LlapCoordinator.java:103)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.getSplits(GenericUDTFGetSplits.java:441)
  ... 28 more

Explorer

Could you please share how you fixed this issue?

New Contributor

Has the issue been fixed? I am also facing it.

Please help here to get it resolved.

Community Manager

@Abhishek_721 As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also give you the opportunity to provide details specific to your environment, which could help others give a more accurate answer to your question.


Cy Jervis, Manager, Community Program

Expert Contributor

Because I ran into this thread while looking for a way to solve this error, and because we found a solution, I thought it might still serve some people if I share what we found.

We needed HWC to profile Hive managed + transactional tables from Ataccama (a data quality solution). We found someone who had successfully gotten spark-submit working, checked their settings, and changed our spark-submit as follows:

COMMAND="$SPARK_HOME/bin/$SPARK_SUBMIT \
    --files $MYDIR/$LOG4J_FILE_NAME $SPARK_DRIVER_JAVA_OPTS $SPARK_DRIVER_OPTS \
    --jars {{ hwc_jar_path }} \
    --conf spark.security.credentials.hiveserver2.enabled=false \
    --conf "spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@{{ ad_realm }}" \
    --conf spark.dynamicAllocation.enabled=false \
    --conf spark.hadoop.metastore.catalog.default=hive \
    --conf spark.yarn.maxAppAttempts=1 \
    --conf spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED \
    --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED \
    --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
    --conf spark.sql.legacy.timeParserPolicy=LEGACY \
    --conf spark.sql.legacy.typeCoercion.datetimeToString.enabled=true \
    --conf spark.sql.parquet.int96TimestampConversion=true \
    --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions \
    --conf spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension \
    --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \
    --conf spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol \
    --conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2 \
    --class $CLASS $JARS $MYLIB $PROPF $LAUNCH $*";

exec $COMMAND

Probably the difference was in the spark.hadoop.metastore.catalog.default=hive setting.
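
If you want to verify what a running session actually picked up: the spark.hadoop.* prefix is copied into the SparkContext's Hadoop configuration, so it can be inspected like this (a small sketch):

// spark.hadoop.* settings land in the Hadoop configuration of the session
println(spark.sparkContext.hadoopConfiguration.get("metastore.catalog.default"))
// should print: hive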

In the above example there are some Ansible variables:

hwc_jar_path: "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/jars/hive-warehouse-connector-assembly-1.0.0.7.1.7.1000-141.jar"

ad_realm is our LDAP realm.

Hope it helps someone.