Created 01-08-2019 11:01 PM
On a fresh new cluster based on HDP 3.1.0 (Kerberized), I'm still facing a problem with Spark reading from Hive. The connection via HWC is not working. When I try to run hive.table("default.table1").show I get this error message:
java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.createBatchDataReaderFactories(HiveWarehouseDataSourceReader.java:166)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD$lzycompute(DataSourceV2ScanExec.scala:64)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD(DataSourceV2ScanExec.scala:60)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDDs(DataSourceV2ScanExec.scala:79)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:337)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3278)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2703)
  at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:723)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:682)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:691)
  ... 47 elided
Caused by: java.lang.RuntimeException: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.getSplitsFactories(HiveWarehouseDataSourceReader.java:182)
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.createBatchDataReaderFactories(HiveWarehouseDataSourceReader.java:162)
  ... 72 more
Caused by: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:298)
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.getSplitsFactories(HiveWarehouseDataSourceReader.java:176)
  ... 73 more
Caused by: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at shadehive.org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:300)
  at shadehive.org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:286)
  at shadehive.org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:379)
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:280)
  ... 74 more
Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:478)
  at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:328)
  at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:952)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
  at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
  at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
  at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
  at com.sun.proxy.$Proxy72.fetchResults(Unknown Source)
  at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:564)
  at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:792)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1837)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1822)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
  ... 3 more
Caused by: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:162)
  at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2738)
  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
  ... 25 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:225)
  at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)
  at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:519)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:511)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
  ... 28 more
Caused by: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.getSplits(GenericUDTFGetSplits.java:498)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:210)
  ... 39 more
Caused by: java.lang.NullPointerException: null
  at org.apache.hadoop.hive.llap.LlapUtil.generateClusterName(LlapUtil.java:117)
  at org.apache.hadoop.hive.llap.coordinator.LlapCoordinator.getLlapSigner(LlapCoordinator.java:103)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.getSplits(GenericUDTFGetSplits.java:441)
  ... 40 more
I've checked the official documentation and the GitHub page. All properties look OK, but I still cannot read any data from Hive. I'm using a standard Hive JDBC connection, not the interactive one, since I'm not planning to use the LLAP engine.
Any idea what to set or check to avoid this error?
PS: I'm able to read metadata via the Hive connector, such as databases ... even writing to Hive is working, but not reading tables into a DF.
Created 01-09-2019 12:17 AM
Using the HiveWarehouseConnector + HiveServer2 Interactive (LLAP, for managed tables) is mandatory, and the reasons are explained in the HDP 3 documentation. If you're not using it, then the properties are definitely not OK; if the namespace part of the JDBC URL is not configured to point to the HiveServer2 Interactive znode (I think that's what you meant), then that is not correct. The sketch below shows the kind of properties involved.
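For reference, a minimal sketch of the relevant launch properties; the ZooKeeper hosts, metastore host, and jar path are placeholders, and the znode name varies per cluster (check hive.server2.zookeeper.namespace in your Hive Interactive config for the actual value):

# Hypothetical spark-shell launch against HiveServer2 Interactive (all values are placeholders)
spark-shell \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive" \
  --conf spark.datasource.hive.warehouse.metastoreUri="thrift://metastore-host:9083" \
  --conf spark.datasource.hive.warehouse.load.staging.dir="/tmp" \
  --conf spark.hadoop.hive.llap.daemon.service.hosts="@llap0" \
  --conf spark.hadoop.hive.zookeeper.quorum="zk1:2181,zk2:2181,zk3:2181"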
To read a table into a DF, you have to use the HiveWarehouseSession API, e.g.:
val df = hive.executeQuery("select * from web_sales")
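For completeness, a minimal end-to-end sketch (the builder class is the same one used later in this thread; web_sales is just an example table):

// Assumes spark-shell was launched with the HWC jar and the JDBC URL properties above
import com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder

// Build a HiveWarehouseSession on top of the existing SparkSession
val hive = HiveWarehouseBuilder.session(spark).build()

// executeQuery runs through HiveServer2 Interactive (LLAP) and returns a Spark DataFrame
val df = hive.executeQuery("select * from web_sales")
df.show()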
I'd like to suggest reading through this entire article.
BR.
Created 04-10-2020 04:09 AM
Can you help me resolve this issue?
Created 05-11-2019 07:56 AM
I am facing the same problem, and I am not querying a Hive managed table; it's just an external table in Hive. I am able to read the metadata but not the data. Can you please tell me how you fixed it?
Created 05-13-2019 07:24 AM
Hi Tarek,
When I set up Hive Interactive correctly (tuning the resources is the most critical part; otherwise reading kept failing), everything ran fine and smoothly.
In the end I built the whole pipeline on Spark only, as Hive Interactive was no longer needed and was unstable for large streaming or heavy batches: too many connections, some already closed, etc. I'm talking about volumes like 1.5 billion records with a foreachBatch sink (see the sketch below). At this point I can do stream reading and compaction at the same time.
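For context, the Spark-only pattern I mean looks roughly like this; a sketch with hypothetical source, topic, and table names, not our production code:

import org.apache.spark.sql.DataFrame

val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092") // placeholder brokers
  .option("subscribe", "events")                     // placeholder topic
  .load()

stream.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // each micro-batch is a plain DataFrame, so a normal batch write applies;
    // repartitioning is one way to keep output file counts under control (compaction)
    batch.repartition(8).write.mode("append").format("orc").saveAsTable("default.events") // placeholder table
  }
  .option("checkpointLocation", "/tmp/checkpoints/events") // placeholder path
  .start()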
Created 07-29-2019 03:07 PM
I am also facing the same issue. I have followed the documentation properly but am still hitting it.
count on the DataFrame works fine, but head does not.
All I am doing is:
val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()
hive.executeQuery("select * from test1").head
Here is my stack trace:
19/07/29 14:10:20 ERROR HiveWarehouseDataSourceReader: Unable to submit query to HS2
java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.createBatchDataReaderFactories(HiveWarehouseDataSourceReader.java:166)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD$lzycompute(DataSourceV2ScanExec.scala:64)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDD(DataSourceV2ScanExec.scala:60)
  at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExec.inputRDDs(DataSourceV2ScanExec.scala:79)
  at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:337)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3278)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2496)
  ... 49 elided
Caused by: java.lang.RuntimeException: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.getSplitsFactories(HiveWarehouseDataSourceReader.java:182)
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.createBatchDataReaderFactories(HiveWarehouseDataSourceReader.java:162)
  ... 70 more
Caused by: java.io.IOException: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:298)
  at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataSourceReader.getSplitsFactories(HiveWarehouseDataSourceReader.java:176)
  ... 71 more
Caused by: shadehive.org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at shadehive.org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:300)
  at shadehive.org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:286)
  at shadehive.org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:379)
  at org.apache.hadoop.hive.llap.LlapBaseInputFormat.getSplits(LlapBaseInputFormat.java:280)
  ... 72 more
Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:478)
  at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:328)
  at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:952)
  at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:564)
  at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:792)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1837)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1822)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:162)
  at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2738)
  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
  at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:473)
  ... 13 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:225)
  at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:116)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927)
  at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
  at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994)
  at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940)
  at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:519)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:511)
  at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
  ... 16 more
Caused by: java.io.IOException: java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.getSplits(GenericUDTFGetSplits.java:498)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.process(GenericUDTFGetSplits.java:210)
  ... 27 more
Caused by: java.lang.NullPointerException: null
  at org.apache.hadoop.hive.llap.LlapUtil.generateClusterName(LlapUtil.java:117)
  at org.apache.hadoop.hive.llap.coordinator.LlapCoordinator.getLlapSigner(LlapCoordinator.java:103)
  at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFGetSplits.getSplits(GenericUDTFGetSplits.java:441)
  ... 28 more
Created 09-09-2019 07:32 AM
Could you please share how you fixed this issue?
Created 04-10-2020 04:08 AM
Has the issue been fixed? I am also facing it.
Please help me get it resolved.
Created 04-10-2020 06:42 AM
@Abhishek_721 As this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also give you the opportunity to include details specific to your environment, which could help others provide a more accurate answer to your question.
Created 05-16-2024 05:48 AM
Because I ran into this thread while looking for a way to solve this error, and because we found a solution, I thought it might still help some people if I share it.
We needed HWC to profile Hive managed + transactional tables from Ataccama (a data quality solution), and we found someone who had successfully got spark-submit working. We checked their settings and changed our spark-submit as follows:
COMMAND="$SPARK_HOME/bin/$SPARK_SUBMIT \
--files $MYDIR/$LOG4J_FILE_NAME $SPARK_DRIVER_JAVA_OPTS $SPARK_DRIVER_OPTS \
--jars {{ hwc_jar_path }} \
--conf spark.security.credentials.hiveserver2.enabled=false \
--conf "spark.sql.hive.hiveserver2.jdbc.url.principal=hive/_HOST@{{ ad_realm }}" \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.hadoop.metastore.catalog.default=hive \
--conf spark.yarn.maxAppAttempts=1 \
--conf spark.sql.legacy.parquet.int96RebaseModeInRead=CORRECTED \
--conf spark.sql.legacy.parquet.int96RebaseModeInWrite=CORRECTED \
--conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
--conf spark.sql.legacy.timeParserPolicy=LEGACY \
--conf spark.sql.legacy.typeCoercion.datetimeToString.enabled=true \
--conf spark.sql.parquet.int96TimestampConversion=true \
--conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions \
--conf spark.sql.extensions=com.qubole.spark.hiveacid.HiveAcidAutoConvertExtension \
--conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \
--conf spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol \
--conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2 \
--class $CLASS $JARS $MYLIB $PROPF $LAUNCH $*";
exec $COMMAND
Probably the difference was in the spark.hadoop.metastore.catalog.default=hive setting.
The above example uses some Ansible variables:
hwc_jar_path: "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/jars/hive-warehouse-connector-assembly-1.0.0.7.1.7.1000-141.jar"
ad_realm is our LDAP realm.
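With DIRECT_READER_V2 and the HiveAcid extensions configured as above, reads of managed/transactional tables go through plain Spark SQL rather than the JDBC/LLAP split path. A minimal hypothetical check from spark-shell (the table name is a placeholder):

// Assumes the spark-submit settings above; the extensions auto-convert reads of
// ACID tables, so a normal Spark SQL query returns a DataFrame (placeholder table)
val df = spark.sql("select * from default.some_managed_table")
df.show()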
Hope it helps someone.