Member since: 10-28-2020
Posts: 578
Kudos Received: 46
Solutions: 40
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 764 | 02-17-2025 06:54 AM |
| | 4996 | 07-23-2024 11:49 PM |
| | 868 | 05-28-2024 11:06 AM |
| | 1423 | 05-05-2024 01:27 PM |
| | 903 | 05-05-2024 01:09 PM |
09-09-2024
12:36 AM
1 Kudo
@Viki_Nodejs It seems like the hostname resolution fails. The URL you have shared is in JDBC format rather than what the Node.js driver expects. Could you give the following a try?

```javascript
// Client setup lines added for completeness, assuming the hive-driver npm package.
const hive = require('hive-driver');
const { TCLIService, TCLIService_types } = hive.thrift;
const client = new hive.HiveClient(TCLIService, TCLIService_types);

client.connect(
  {
    host: 'abc.com',   // ZooKeeper quorum host
    port: 2181
  },
  new hive.connections.HttpConnection({
    transportMode: 'http',
    httpPath: 'cliservice',
    serviceDiscoveryMode: 'zooKeeper',
    zooKeeperNamespace: 'hiveserver2'
  }),
  new hive.auth.PlainHttpAuthentication({
    username: 'username',
    password: 'password'
  }),
  // SSL options as in your setup; where these belong can vary by driver version.
  {
    ssl: true,
    sslTrustStore: '/etc/hive-jks/hivetrust.jks',
    trustStorePassword: 'pro@!23'
  }
).then(async client => {
  // ...
});
```
08-18-2024
12:52 PM
Until now, you could query JDBC data sources from Hive using the JDBC Storage Handler. With the release of CDP 7.2.18 (Public Cloud runtime) and 7.1.9 SP1 (Private Cloud Base), you can also query Hive tables hosted in a different environment using the same mechanism. Here's a brief demo:
```
0: jdbc:hive2://ccycloud-1.sbiswal.root.como> CREATE EXTERNAL TABLE loc_hive_table(c1 int, c2 string) STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES(
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.database.type"="HIVE",
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.jdbc.driver"="org.apache.hive.jdbc.HiveDriver",
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.jdbc.url"="jdbc:hive2://sbiswal-jdbctest-master0.sup-defa.xxxxxxx.a4.cloudera.site/db1;ssl=true;transportMode=http;httpPath=sbiswal-jdbctest/cdp-proxy-api/hive",
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.dbcp.username"="user1",
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.dbcp.password"="********",
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.table"="src_tbl",
. . . . . . . . . . . . . . . . . . . . . . .> "hive.sql.dbcp.maxActive"="1");
0: jdbc:hive2://ccycloud-1.sbiswal.root.como> select * from loc_hive_table;
INFO : Compiling command(queryId=hive_20240818172147_9c14d21e-1802-43a6-b5a4-0846ff5bad7f): select * from loc_hive_table
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:loc_hive_table.c1, type:int, comment:null), FieldSchema(name:loc_hive_table.c2, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20240818172147_9c14d21e-1802-43a6-b5a4-0846ff5bad7f); Time taken: 10.033 seconds
INFO : Executing command(queryId=hive_20240818172147_9c14d21e-1802-43a6-b5a4-0846ff5bad7f): select * from loc_hive_table
INFO : Completed executing command(queryId=hive_20240818172147_9c14d21e-1802-43a6-b5a4-0846ff5bad7f); Time taken: 0.02 seconds
INFO : OK
+---------------------+---------------------+
| loc_hive_table.c1   | loc_hive_table.c2   |
+---------------------+---------------------+
| 1                   | abc                 |
| 2                   | def                 |
+---------------------+---------------------+
2 rows selected (12.581 seconds)
0: jdbc:hive2://ccycloud-1.sbiswal.root.como>
```
The main point to highlight here is "hive.sql.database.type"="HIVE".
In this example, the Hive table db1.src_tbl lives in a remote cluster. To read it from a different Hive cluster, an external table loc_hive_table had to be created leveraging the JdbcStorageHandler. As you can see, the data was read successfully.
If the table to be queried is under a non-default database, it can be referenced in either of two ways: mention the DB name in the JDBC connection string, as used above, or set the table property "hive.sql.schema"="db1.test1" (see the sketch below).
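A minimal sketch of the second option; the host, column list, and credentials below are placeholders, and only db1, test1, and the property names come from this post:

```sql
-- Hedged sketch: remote table db1.test1 referenced via "hive.sql.schema"
-- instead of putting the database in the JDBC URL. The column list is
-- assumed to match the remote table.
CREATE EXTERNAL TABLE loc_hive_table2(c1 int, c2 string)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler' TBLPROPERTIES(
  "hive.sql.database.type"="HIVE",
  "hive.sql.jdbc.driver"="org.apache.hive.jdbc.HiveDriver",
  "hive.sql.jdbc.url"="jdbc:hive2://<remote-host>/;ssl=true;transportMode=http;httpPath=<http-path>",
  "hive.sql.schema"="db1.test1",
  "hive.sql.dbcp.username"="user1",
  "hive.sql.dbcp.password"="********",
  "hive.sql.dbcp.maxActive"="1");
```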
You can query the table from a secured cluster using LDAP authentication. Please note that this approach is read-only: you can read from, but not write to, the remote Hive table.
If you want to use an aggregate function in the query and encounter either of the following errors in the YARN application, you may have to add some jars at the session level:

```
java.lang.ClassNotFoundException: org.apache.hive.jdbc.HiveDriver
java.lang.NoClassDefFoundError: org/apache/hive/service/cli/HiveSQLException
```

Add the equivalent versions of these jars from your local parcel directory:

```
0: jdbc:hive2://ccycloud-1.sbiswal.root.como> ADD JAR /opt/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc-3.1.3000.7.1.9.1000-103.jar;
0: jdbc:hive2://ccycloud-1.sbiswal.root.como> ADD JAR /opt/cloudera/parcels/CDH/lib/hive/lib/hive-service-3.1.3000.7.1.9.1000-103.jar;
```
Now you should be able to run your query successfully.
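For instance, an aggregate of this shape over the demo table (a hypothetical query, not from the original session) compiles into a distributed job, which is where those classes must be resolvable:

```sql
-- Hypothetical aggregate over the demo table; this runs as a YARN job,
-- so org.apache.hive.jdbc.HiveDriver must be on the task classpath.
SELECT c2, COUNT(*) AS cnt
FROM loc_hive_table
GROUP BY c2;
```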
08-05-2024
03:31 AM
1 Kudo
@evanle96 Hive treats the timestamp data as being in UTC, and when we read it, it converts it from UTC to local time. But I understand that the actual data is already in the America/Sao_Paulo tz. It'd be interesting to know how the epoch data in string format was transformed to timestamp in ORC. Can you try reading with the transformation below and see if the hour field shows up as expected?

from_utc_timestamp(to_utc_timestamp(dt_afast_trabalho, 'America/Sao_Paulo'), 'America/Sao_Paulo') AS sao_paulo_time_corrected
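Wrapped into a query, a minimal sketch (the table name is a placeholder; only the column and the expression come from this thread):

```sql
-- to_utc_timestamp() reinterprets the stored value as America/Sao_Paulo
-- local time and shifts it to UTC; from_utc_timestamp() then renders that
-- UTC value back in America/Sao_Paulo, cancelling Hive's implicit UTC read.
SELECT
  dt_afast_trabalho,
  from_utc_timestamp(
    to_utc_timestamp(dt_afast_trabalho, 'America/Sao_Paulo'),
    'America/Sao_Paulo') AS sao_paulo_time_corrected
FROM orc_tbl;   -- placeholder table name
```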
07-25-2024
08:34 PM
1 Kudo
@AhXian I hope the bucket files have been copied correctly and MSCK REPAIR has been run. Could you also run ANALYZE ... COMPUTE STATISTICS on the external table once? Make sure the DESCRIBE command on the external table reflects the number of data files correctly. Please share the describe output and the output of "hdfs dfs -ls -R <external table location>".
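A minimal sketch of those checks, with a placeholder table name:

```sql
-- Run these in order and compare the reported file counts against HDFS.
MSCK REPAIR TABLE ext_tbl;
ANALYZE TABLE ext_tbl COMPUTE STATISTICS;
DESCRIBE FORMATTED ext_tbl;  -- check numFiles/totalSize in the output
-- Then, from a shell: hdfs dfs -ls -R <external table location>
```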
07-25-2024
06:12 AM
@AhXian
1. Are you able to read data from the external table in Prod (step 4)?
2. In step 5, I meant for you to insert data into a new ACID table, which can be created using similar DDL as the original table.
3. After the INSERT (step 5), see if you can read the data from the new table. Ideally it should work.
07-24-2024
05:43 AM
Yes, the 5 steps mentioned above.
07-24-2024
12:15 AM
In Hive ACID tables, the base/delta files are associated with a write ID. My assumption is that the write ID has changed or got removed in the production HMS, so Hive does not recognize the data files. So, we will have to follow the workaround that I mentioned in my last update.
07-23-2024
11:49 PM
@AhXian Copying files into and out of an ACID table location is not supported. In this case we can use a workaround (a rough end-to-end sketch follows below):
1. Create an external table similar to (same columns as) the original ACID table.
2. Upload the data file into the external table location.
3. MSCK REPAIR TABLE <external table name>;
4. See if you can read the data from the external table now.
5. If [4] works, insert the data into the managed table by using: insert into table managed_table_name partition(xx,yy) select * from external_table_name;
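Putting those steps together, a hedged sketch; all table, column, and partition names here are placeholders:

```sql
-- 1. External table with the same columns/partitioning as the ACID table.
CREATE EXTERNAL TABLE ext_stage(c1 int, c2 string)
PARTITIONED BY (xx string, yy string)
STORED AS ORC
LOCATION '/tmp/ext_stage';

-- 2. Copy the recovered data files under /tmp/ext_stage outside of Hive, then:
-- 3. Register the partitions/files with the metastore.
MSCK REPAIR TABLE ext_stage;

-- 4. Verify the data is readable.
SELECT * FROM ext_stage LIMIT 10;

-- 5. Load into the managed ACID table so Hive assigns fresh write IDs
--    (fully dynamic partitions may need hive.exec.dynamic.partition.mode=nonstrict).
INSERT INTO TABLE managed_table_name PARTITION (xx, yy)
SELECT * FROM ext_stage;
```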
07-23-2024
05:25 AM
@AhXian The metadata is still not updated, as I see the following in the describe output: numFiles=0, numRows=0, rawDataSize=0, totalSize=0. Let's try the following commands one after the other:

```sql
MSCK REPAIR TABLE raw_zone_sit.jxtemp SYNC PARTITIONS;
ANALYZE TABLE raw_zone_sit.jxtemp PARTITION (dl_created_yr='2023', dl_created_mth='12') COMPUTE STATISTICS;
ANALYZE TABLE raw_zone_sit.jxtemp PARTITION (dl_created_yr='2023', dl_created_mth='12') COMPUTE STATISTICS FOR COLUMNS;
```

See if the describe command still reflects 0 files. Also, could you collect the output of the following query from the backend metastore database?

```sql
select * from NEXT_WRITE_ID where NWI_TABLE='jxtemp';
```
07-22-2024
08:45 AM
@AhXian These ORC tables we are talking about, are they ACID? If we copy data files into an ACID table location and run ANALYZE ... COMPUTE STATISTICS, that alone might not resolve the issue. We would like you to share an example. Please share the output of the following command:

```sql
describe formatted table_name partition(dl_created_yr='2023', dl_created_mth='12');
```

and the HDFS command output:

```
hdfs dfs -ls -R <partition location from the above command>
```