Member since
10-28-2020
623
Posts
47
Kudos Received
40
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3193 | 02-17-2025 06:54 AM |
| | 8085 | 07-23-2024 11:49 PM |
| | 1767 | 05-28-2024 11:06 AM |
| | 2403 | 05-05-2024 01:27 PM |
| | 1548 | 05-05-2024 01:09 PM |
08-18-2024
12:52 PM
You have long been able to query JDBC data sources from Hive using the JDBC Storage Handler. With the release of CDP 7.2.18 (Public Cloud runtime) and 7.1.9 SP1 (Private Cloud Base), you can now also query Hive tables hosted in a different environment using the same mechanism. Here's a brief demo:
0: jdbc:hive2://ccycloud-1.sbiswal.root.como> CREATE EXTERNAL TABLE loc_hive_table(c1 int, c2 string) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES(
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.database.type"="HIVE",
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.jdbc.driver"="org.apache.hive.jdbc.HiveDriver",
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.jdbc.url"="jdbc:hive2://sbiswal-jdbctest-master0.sup-defa.xxxxxxx.a4.cloudera.site/db1;ssl=true;transportMode=http;httpPath=sbiswal-jdbctest/cdp-proxy-api/hive",
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.dbcp.username"="user1",
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.dbcp.password"="********",
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.table"="src_tbl",
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.dbcp.maxActive"="1");
0: jdbc:hive2://ccycloud-1.sbiswal.root.como> select * from loc_hive_table;
INFO : Compiling command(queryId=hive_20240818172147_9c14d21e-1802-43a6-b5a4-0846ff5bad7f): select * from loc_hive_table
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:loc_hive_table.c1, type:int, comment:null), FieldSchema(name:loc_hive_table.c2, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20240818172147_9c14d21e-1802-43a6-b5a4-0846ff5bad7f); Time taken: 10.033 seconds
INFO : Executing command(queryId=hive_20240818172147_9c14d21e-1802-43a6-b5a4-0846ff5bad7f): select * from loc_hive_table
INFO : Completed executing command(queryId=hive_20240818172147_9c14d21e-1802-43a6-b5a4-0846ff5bad7f); Time taken: 0.02 seconds
INFO : OK
+---------------------+---------------------+
| loc_hive_table.c1 | loc_hive_table.c2 |
+---------------------+---------------------+
| 1 | abc |
| 2 | def |
+---------------------+---------------------+
2 rows selected (12.581 seconds)
0: jdbc:hive2://ccycloud-1.sbiswal.root.como>
The main point to highlight here is "hive.sql.database.type"="HIVE".
In this example, a Hive table db1.src_tbl resides in a remote cluster. To read it from a different Hive cluster, an external table loc_hive_table was created leveraging the JdbcStorageHandler. As you can see, the data was read successfully.
If the table to be queried is under a non-default database, it can be specified in either of two ways: mention the DB name in the JDBC connection string, as done above (".../db1;ssl=true;..."), or set the table property "hive.sql.schema", e.g. "hive.sql.schema"="db1.test1".
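For illustration, here is a minimal sketch of the hive.sql.schema variant; the host, httpPath, and table names are hypothetical stand-ins, not taken from the demo above:

CREATE EXTERNAL TABLE loc_hive_table2(c1 int, c2 string)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES(
  "hive.sql.database.type"="HIVE",
  "hive.sql.jdbc.driver"="org.apache.hive.jdbc.HiveDriver",
  -- no database segment in the URL; the table is qualified via hive.sql.schema below
  "hive.sql.jdbc.url"="jdbc:hive2://remote-host.example.com/;ssl=true;transportMode=http;httpPath=cliservice",
  "hive.sql.dbcp.username"="user1",
  "hive.sql.dbcp.password"="********",
  "hive.sql.schema"="db1.test1",
  "hive.sql.dbcp.maxActive"="1");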
You can also query the table from a secured cluster using LDAP authentication. Please be aware that this approach is read-only: you can read from, but not write to, the remote Hive table.
If you use an aggregate function in the query and encounter any of the following errors in the YARN application, you may have to add some JARs at the session level:
java.lang.ClassNotFoundException: org.apache.hive.jdbc.HiveDriver
java.lang.NoClassDefFoundError: org/apache/hive/service/cli/HiveSQLException
Add the equivalent versions of these JARs from your local parcel directory:
0: jdbc:hive2://ccycloud-1.sbiswal.root.como> ADD JAR /opt/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc-3.1.3000.7.1.9.1000-103.jar;
0: jdbc:hive2://ccycloud-1.sbiswal.root.como> ADD JAR /opt/cloudera/parcels/CDH/lib/hive/lib/hive-service-3.1.3000.7.1.9.1000-103.jar;
Now you should be able to run your query successfully.
08-12-2024
04:46 AM
1 Kudo
Just got access to the article. By the way, this issue also occurs in MySQL, even when it is not a Galera cluster. Even when I set up the same MariaDB as master/slave, the same issue occurs, so this is definitely not a problem specific to Galera clusters.
08-09-2024
11:03 AM
@smruti I've validated it with the user, and here's what happens: the date field comes in string format containing only the date portion ("yyyy-MM-dd"), and it was being converted to a timestamp, which in this case aggregates the time fields; that's where the divergence was. Is there anything in the tool that can standardize this, or do I need to change the data engineering?
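Not the user's actual pipeline, but a minimal HiveQL sketch of one way to standardize such a field, assuming a hypothetical column dt_str holding "yyyy-MM-dd" strings:

-- cast the date-only string to DATE so no time-of-day component is introduced
SELECT CAST(dt_str AS DATE) AS dt FROM src_table;

-- if a timestamp is required downstream, pin the time part to midnight explicitly
SELECT CAST(CONCAT(dt_str, ' 00:00:00') AS TIMESTAMP) AS ts FROM src_table;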
07-25-2024
08:34 PM
1 Kudo
@AhXian I hope the bucket files were copied correctly and MSCK REPAIR was run. Could you also run ANALYZE ... COMPUTE STATISTICS on the external table once? Make sure the DESCRIBE command on the external table reflects the number of data files correctly. Please share the DESCRIBE output and the output of "hdfs dfs -ls -R <external table location>".
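For reference, a sketch of the statements mentioned above, using a hypothetical unpartitioned external table named ext_tbl:

MSCK REPAIR TABLE ext_tbl;                -- sync files/partitions on storage into the metastore
ANALYZE TABLE ext_tbl COMPUTE STATISTICS; -- refresh basic table statistics
DESCRIBE FORMATTED ext_tbl;               -- verify numFiles/totalSize look right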
07-09-2024
08:18 AM
@ldylag You can check which database the service uses by checking its configuration in Cloudera Manager. For instance, go to Hive service » Configuration » and search for "database". The configuration named "hive_metastore_database_name" will tell you the actual database the Hive service is using. I hope this helps!
07-01-2024
10:28 AM
@kaif Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
06-27-2024
11:02 PM
@kaif Hive does have an information_schema database that you can access from Beeline or Hue. You also have the SYS database, which surfaces information from the RDBMS used to store Hive metadata. Here are the DDLs of those tables; they will also give you some idea of what data they store.
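As a quick illustration of both (these are the standard schema names, but verify against your version):

-- column metadata for one table, via information_schema
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'db1' AND table_name = 'src_tbl';

-- table list as stored in the metastore RDBMS, surfaced through the SYS db
SELECT tbl_name, tbl_type, owner FROM sys.tbls LIMIT 10;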
06-01-2024
12:49 AM
1 Kudo
There has been an update to the external DB Hive is using, and the new-version instance is running on a new port; this config change was made while the whole cluster was stopped. @smruti Can you please help with how the data is loaded from the Hive Metastore to PostgreSQL? Is it through psycopg, and where is it stored on the database server? We need to get the missing data from the older version of the DB but cannot locate the source of this corrupted data on the DB server.
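Not an official recovery procedure, but as a starting point for locating that metadata: the Hive Metastore keeps its tables (DBS, TBLS, and others) directly in the backing PostgreSQL database, and the standard schema uses quoted uppercase names. A sketch, assuming you are connected to the metastore database itself:

-- databases known to the metastore
SELECT "DB_ID", "NAME", "DB_LOCATION_URI" FROM "DBS";

-- tables and the database they belong to
SELECT t."TBL_NAME", t."TBL_TYPE", d."NAME" AS db_name
FROM "TBLS" t JOIN "DBS" d ON t."DB_ID" = d."DB_ID";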
05-28-2024
12:55 PM
Thanks @smruti. I changed the data type to String, and the data is stored without encoding, as I want. Is there any documentation where I can find why the encoding is performed by the HBase SerDe for binary data? Regards, Marks
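Not from this thread, but for context on the encoding question: the HBase SerDe chooses string vs. binary encoding per column based on a #s/#b suffix in hbase.columns.mapping (falling back to hbase.table.default.storage.type). A hypothetical sketch:

-- c1 stored binary-encoded (#b), c2 stored as plain string bytes (#s)
CREATE EXTERNAL TABLE hbase_tbl(rowkey string, c1 int, c2 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,cf:c1#b,cf:c2#s"
)
TBLPROPERTIES ("hbase.table.name" = "hbase_tbl");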
05-24-2024
04:04 AM
1 Kudo
It was a Kerberos issue; this is resolved.