Community Articles

Find and share helpful community-sourced technical articles.
Announcements
Celebrating as our community reaches 100,000 members! Thank you!
avatar
Master Collaborator

Until now, you could query JDBC data sources from Hive using JDBC Storage Handler. However, with the release of CDP 7.2.18 (Public Cloud runtime) and 7.1.9 SP 1 (Private Cloud Base), you can also query Hive tables hosted in a different environment using a similar method.

Here's a brief demo:

 

0: jdbc:hive2://ccycloud-1.sbiswal.root.como> CREATE EXTERNAL TABLE loc_hive_table(c1 int, c2 string) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES(
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.database.type"="HIVE",
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.jdbc.driver"="org.apache.hive.jdbc.HiveDriver",
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.jdbc.url"="jdbc:hive2://sbiswal-jdbctest-master0.sup-defa.xxxxxxx.a4.cloudera.site/db1;ssl=true;transportMode=http;httpPath=sbiswal-jdbctest/cdp-proxy-api/hive",
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.dbcp.username”= ”user1”,
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.dbcp.password"="********",
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.table"="src_tbl",
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.dbcp.maxActive"="1");


0: jdbc:hive2://ccycloud-1.sbiswal.root.como> select * from loc_hive_table;
INFO  : Compiling command(queryId=hive_20240818172147_9c14d21e-1802-43a6-b5a4-0846ff5bad7f): select * from loc_hive_table
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:loc_hive_table.c1, type:int, comment:null), FieldSchema(name:loc_hive_table.c2, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20240818172147_9c14d21e-1802-43a6-b5a4-0846ff5bad7f); Time taken: 10.033 seconds
INFO  : Executing command(queryId=hive_20240818172147_9c14d21e-1802-43a6-b5a4-0846ff5bad7f): select * from loc_hive_table
INFO  : Completed executing command(queryId=hive_20240818172147_9c14d21e-1802-43a6-b5a4-0846ff5bad7f); Time taken: 0.02 seconds
INFO  : OK
+---------------------+---------------------+
| loc_hive_table.c1  | loc_hive_table.c2  |
+---------------------+---------------------+
| 1                   | abc                 |
| 2                   | def                 |
+---------------------+---------------------+
2 rows selected (12.581 seconds)
0: jdbc:hive2://ccycloud-1.sbiswal.root.como> 

 

The main point to highlight here is hive.sql.database.type"="HIVE.

In this example, a Hive table db1.src_tbl in a remote cluster. To read the table from a different Hive cluster, an external table loc_hive_table had to be created leveraging JDBCStorageHandler. As you can see, the data was read successfully. 

If the table to be queried is under a non-default database, it can be defined as follows:

  1. Mention the DB name in the JDBC connection string, as used above: "hive.sql.schema"="db1.test1":
  2. You could query the table from a secured cluster using LDAP authentication. Please be informed that you can only read and not write to the remote Hive table using this approach.

If you want to use any aggregate function in the query and encounter any of the following errors in the YARN application, we may have to add some jars at the session level.

  1. java.lang.ClassNotFoundException: org.apache.hive.jdbc.HiveDriver
  2. java.lang.NoClassDefFoundError: org/apache/hive/service/cli/HiveSQLException

Add equivalent versions of these jars from your local parcel directory,

 

0: jdbc:hive2://ccycloud-1.sbiswal.root.como> ADD JAR /opt/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc-3.1.3000.7.1.9.1000-103.jar;
0: jdbc:hive2://ccycloud-1.sbiswal.root.como> ADD JAR /opt/cloudera/parcels/CDH/lib/hive/lib/hive-service-3.1.3000.7.1.9.1000-103.jar;

 

Now you should be able to run your query successfully. 

274 Views
0 Kudos