Member since: 03-01-2022
Posts: 15
Kudos Received: 0
Solutions: 0
08-02-2022
07:24 AM
Hi @jagadeesan, I am trying to connect to Hive from Spark 3 via the JDBC Hive driver (HiveJDBC42), and I am getting the error below:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Spark - Hive")
  .config("spark.sql.warehouse.dir", "/warehouse/tablespace/managed/hive")
  .enableHiveSupport()
  .getOrCreate()

val table_users = spark.read.format("jdbc")
  .option("url", "hive")
  .option("url", "jdbc:hive2://127.0.0.1:2181:2181;password=****;principal=hive/_HOST@Example.com;serviceDiscoveryMode=zooKeeper;ssl=1;user=user1;zooKeeperNamespace=hiveserver2")
  .option("driver", "com.cloudera.hive.jdbc.HS2Driver")
  .option("query", "select * from test_db.users LIMIT 1")
  .option("fetchsize", "20")
  .load()

java.sql.SQLException: [Cloudera][JDBC](11380) Null pointer exception.
at com.cloudera.hiveserver2.hive.core.HiveJDBCConnection.setZookeeperServiceDiscovery(Unknown Source)
at com.cloudera.hiveserver2.hive.core.HiveJDBCConnection.readServiceDiscoverySettings(Unknown Source)
at com.cloudera.hiveserver2.hivecommon.core.HiveJDBCCommonConnection.readServiceDiscoverySettings(Unknown Source)
at com.cloudera.hiveserver2.hivecommon.core.HiveJDBCCommonConnection.establishConnection(Unknown Source)
at com.cloudera.hiveserver2.jdbc.core.LoginTimeoutConnection.connect(Unknown Source)
at com.cloudera.hiveserver2.jdbc.common.BaseConnectionFactory.doConnect(Unknown Source)
at com.cloudera.hiveserver2.jdbc.common.AbstractDriver.connect(Unknown Source)
at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:49)
at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider$.create(ConnectionProvider.scala:77)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$createConnectionFactory$1(JdbcUtils.scala:64)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.getQueryOutputSchema(JDBCRDD.scala:62)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:57)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:239)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:245)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
... 54 elided
Caused by: java.lang.NullPointerException
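The NPE is thrown from setZookeeperServiceDiscovery while the driver is parsing the connection settings, so the URL itself is the first thing to check. Two details in the options above look suspect: "url" is set twice (the first value, "hive", is not a valid JDBC URL), and the host part "127.0.0.1:2181:2181" repeats the port, whereas in serviceDiscoveryMode=zooKeeper the driver expects a comma-separated host:port quorum. An untested sketch with hypothetical ZooKeeper hostnames (zk1/zk2/zk3 are placeholders, not from the original post):

```scala
// Untested sketch: a single "url" option whose host section is a
// comma-separated ZooKeeper quorum of host:port pairs, as the Cloudera
// driver expects when serviceDiscoveryMode=zooKeeper.
val url = "jdbc:hive2://zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181;" +
  "serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;" +
  "ssl=1;principal=hive/_HOST@Example.com"

val table_users = spark.read.format("jdbc")
  .option("url", url)                       // set exactly once
  .option("driver", "com.cloudera.hive.jdbc.HS2Driver")
  .option("user", "user1")
  .option("password", "****")
  .option("query", "select * from test_db.users LIMIT 1")
  .option("fetchsize", "20")
  .load()
```

This is a connection-configuration sketch only; it still requires a reachable, Kerberized HiveServer2 behind the quorum to actually run.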
07-30-2022
10:36 AM
Thank you @jagadeesan. But is it possible to connect to Hive via JDBC from Spark 3.x?
07-30-2022
03:09 AM
Thank you @jagadeesan for your reply. As far as I know, HWC does not support INSERT/UPDATE on Hive ACID tables; correct me if I'm wrong. Also, is there currently any way to connect to Hive ACID tables from Spark 3 other than HWC? Thank you!
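For reference, the Hive Warehouse Connector is the supported bridge between Spark and Hive managed (ACID) tables, and its session API can also push DML statements down to Hive. A minimal, untested sketch, assuming the HWC jar is on the classpath and the HWC configs (e.g. spark.sql.hive.hiveserver2.jdbc.url) point at your HiveServer2; test_db.users and test_db.users_staging are hypothetical tables:

```scala
// Untested sketch of the HWC session API against a configured cluster.
import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()

// Reads go through HWC, which can see ACID (managed) tables:
val users = hive.executeQuery("SELECT * FROM test_db.users LIMIT 1")

// DML can be pushed down to Hive as a statement:
hive.executeUpdate("INSERT INTO test_db.users SELECT * FROM test_db.users_staging")
```

Whether this covers a given INSERT/UPDATE workload depends on the CDP/HWC version in use, which the thread does not specify.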
07-26-2022
09:32 AM
Hi guys, I have a data lake (Hive managed tables) and I would like to load it incrementally into the warehouse (also Hive managed tables) using Spark 3.2, but I ran into an issue connecting to Hive managed tables from Spark 3. How can I connect to Hive ACID tables? Via JDBC, and if so, how? Or are there other ways? Thank you!
Labels:
- Apache Hadoop
- Apache Hive
- Apache Spark
03-07-2022
03:52 AM
{
"_id": "620e6275034f4fe64f1ce2ef",
"patientorderitems": [
{
"_id": "620e6275034f4fe64f1ce2f0",
"patientorderlogs": [
{
"_id": "620e6275034f4fe64f1ce2f1",
"useruid": "6031edd256afd66888232d6e",
"departmentuid": "602f6a3494ce862c04aa49d2"
},
{
"_id": "621efc35da15edd34560da80",
"useruid": "6032021359f2cf686ae807ba",
"departmentuid": "602f6a3494ce862c04aa49d5"
},
{
"_id": "6220a702061f33f4abe8a2a6",
"useruid": "604f3cb743027274a8c565de",
"departmentuid": "604c5393864aa9012d79e986"
},
{
"_id": "6220a70f65ca50f522598a85",
"useruid": "604f3cb743027274a8c565de",
"departmentuid": "604c5393864aa9012d79e986"
},
{
"_id": "6220a717145139f53cfb6143",
"useruid": "604f3cb743027274a8c565de",
"departmentuid": "604c5393864aa9012d79e986"
}
]
}
]
}

Dear @araujo, can I have a Jolt specification for this JSON in the same format as the last JSON?
03-02-2022
12:03 AM
I want to transform this JSON:

{
  "_id": "6218e53465793fa20ea11524",
  "patientorderitems": [
    {
      "poi_id": "6218e53465793fa20ea1152a",
      "patientorderlogs": [
        {
          "pol_id": "6218e53465793fa20ea1152e",
          "useruid": "61ee4995f16eebb6b7e1c644",
          "modifiedat": "2022-02-25T17:18:28Z"
        }
      ]
    },
    {
      "poi_id": "6218e53465793fa20ea11525",
      "patientorderlogs": [
        {
          "pol_id": "6218e53465793fa20ea11529",
          "useruid": "61ee4995f16eebb6b7e1c644",
          "modifiedat": "2022-02-25T17:18:28Z"
        }
      ]
    }
  ]
}

into this JSON:

[
  {
    "_id": "6218e53465793fa20ea11524",
    "poi_id": "6218e53465793fa20ea1152a",
    "pol_id": "6218e53465793fa20ea1152e",
    "useruid": "61ee4995f16eebb6b7e1c644",
    "modifiedat": "2022-02-25T17:18:28Z"
  },
  {
    "_id": "6218e53465793fa20ea11524",
    "poi_id": "6218e53465793fa20ea11525",
    "pol_id": "6218e53465793fa20ea11529",
    "useruid": "61ee4995f16eebb6b7e1c644",
    "modifiedat": "2022-02-25T17:18:28Z"
  }
]

Is there any Jolt spec or script that can do this?
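Independent of NiFi/Jolt, the intended transform is a flatten: one output record per patientorderlogs entry, with the root _id and the enclosing item's poi_id copied down. A small plain-Scala sketch of that logic (the Maps stand in for parsed JSON; a Jolt shift spec would have to express the same carry-down):

```scala
// Plain-Scala sketch of the flattening: nested Maps/Lists stand in for the
// parsed JSON document from the question.
val doc: Map[String, Any] = Map(
  "_id" -> "6218e53465793fa20ea11524",
  "patientorderitems" -> List(
    Map(
      "poi_id" -> "6218e53465793fa20ea1152a",
      "patientorderlogs" -> List(
        Map("pol_id" -> "6218e53465793fa20ea1152e",
            "useruid" -> "61ee4995f16eebb6b7e1c644",
            "modifiedat" -> "2022-02-25T17:18:28Z"))),
    Map(
      "poi_id" -> "6218e53465793fa20ea11525",
      "patientorderlogs" -> List(
        Map("pol_id" -> "6218e53465793fa20ea11529",
            "useruid" -> "61ee4995f16eebb6b7e1c644",
            "modifiedat" -> "2022-02-25T17:18:28Z")))))

// One flat record per log entry, carrying down _id and poi_id.
val flat: List[Map[String, Any]] = for {
  item <- doc("patientorderitems").asInstanceOf[List[Map[String, Any]]]
  log  <- item("patientorderlogs").asInstanceOf[List[Map[String, Any]]]
} yield log + ("_id" -> doc("_id")) + ("poi_id" -> item("poi_id"))
```

Here `flat` holds two records, each with _id, poi_id, pol_id, useruid, and modifiedat, matching the desired output shape above.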
Labels:
- Apache NiFi