Member since: 11-04-2015
Posts: 260
Kudos Received: 44
Solutions: 33
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 3139 | 05-16-2024 03:10 AM |
 | 1689 | 01-17-2024 01:07 AM |
 | 1706 | 12-11-2023 02:10 AM |
 | 2425 | 10-11-2023 08:42 AM |
 | 1701 | 09-07-2023 01:08 AM |
06-09-2022
03:35 AM
1 Kudo
Hi @DataMan-HJ,

Case-insensitive joins do not appear to be available in Hive, and they will likely not be implemented: Hive relies on Java's UTF-8 strings and the comparison behavior that comes with them, with no option to change the collation. There is a good discussion in HIVE-4070, where a similar request was raised for the behavior of the LIKE operator; you can review the pros and cons there.

So you will likely need to change the individual joins to use the lower/upper functions.

Best regards
Miklos
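The lower/upper workaround can be sketched as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for Hive (SQLite's default `=` comparison is also case-sensitive), and the `customers` and `orders` tables and their columns are hypothetical. Only the shape of the `ON` clause is meant to carry over to a Hive query.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (code TEXT, name TEXT);
CREATE TABLE orders (customer_code TEXT, amount INTEGER);
INSERT INTO customers VALUES ('ABC', 'Acme'), ('xyz', 'Xyzzy');
INSERT INTO orders VALUES ('abc', 10), ('XYZ', 20), ('ABC', 5);
""")

# Plain equality join: only keys with identical case match.
CASE_SENSITIVE_JOIN = """
SELECT c.name, o.amount
FROM customers c
JOIN orders o ON c.code = o.customer_code
ORDER BY o.amount
"""

# Both sides normalized with lower(), so the case of the key no longer matters.
CASE_INSENSITIVE_JOIN = """
SELECT c.name, o.amount
FROM customers c
JOIN orders o ON lower(c.code) = lower(o.customer_code)
ORDER BY o.amount
"""

strict_rows = conn.execute(CASE_SENSITIVE_JOIN).fetchall()
relaxed_rows = conn.execute(CASE_INSENSITIVE_JOIN).fetchall()

print(strict_rows)
print(relaxed_rows)
```

Wrapping the join keys in a function has to be repeated in every join that needs it, which is the per-query change the reply describes.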
06-09-2022
03:25 AM
Hi @luckes,

Thanks for reporting this. Based on your description, yes, it seems that the driver replaces "upsert" with "insert" everywhere. Please open a support case through the MyCloudera support portal so that it gets routed to the proper team as an enhancement request.

Other ideas:
- Have you checked whether this behavior also occurs with the latest JDBC driver version?
- Please check whether "UseNativeQuery=1" in the JDBC connection string helps.
- Does it work if you avoid "insert" in the column names ("insert_time"), for example by using a "modification_time" column name instead?

Thank you
Miklos Szurap, Customer Operations Engineer, Cloudera
06-08-2022
04:08 AM
1 Kudo
Hi Andrea, Great to see that it has been found now and thanks for marking the post as answered. All the best, Miklos
06-08-2022
01:10 AM
Hi @tallamohan,

The direct usage of the Hive classes (CliSessionState, SessionState, Driver) in the provided code falls under "Hive CLI" or "HCat CLI" access, which is not supported in CDP:
https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/upgrade/topics/hive-unsupported.html

Please open a case on the MyCloudera Support Portal to get that clarified. The recommended approach is to use beeline and access the Hive service through HiveServer2.

Best regards
Miklos
06-08-2022
01:04 AM
1 Kudo
Please remember that one block is not necessarily 256 MB; it can be smaller. Also, not all files have a replication factor of 3. Some may have only a single replica, so it can be totally fine if all of those were single-replica files. 600,000 * 256 MB = 153.6 TB is the maximum, but since blocks can be smaller than 256 MB, the 60 TB freed up is reasonable.
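The arithmetic above can be checked with a few lines of Python. This sketch follows the post's own convention of decimal units (1 TB = 1,000,000 MB) and counts each of the 600,000 blocks once; the variable names are illustrative.

```python
# Figures taken from the post; decimal units (1 TB = 1,000,000 MB) as used there.
BLOCKS_REMOVED = 600_000
MAX_BLOCK_MB = 256
FREED_TB = 60
MB_PER_TB = 1_000_000

# Upper bound: every removed block is a full 256 MB block.
max_freed_tb = BLOCKS_REMOVED * MAX_BLOCK_MB / MB_PER_TB

# Average block size implied by the space that was actually freed.
avg_block_mb = FREED_TB * MB_PER_TB / BLOCKS_REMOVED

print(f"maximum freed: {max_freed_tb} TB")
print(f"implied average block size: {avg_block_mb} MB")
```

An implied average of about 100 MB per block is well below the 256 MB maximum, which is consistent with the conclusion that the 60 TB figure is plausible.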
06-08-2022
12:44 AM
Please check which CDH version the cluster is running. Cloudera ODBC Driver version 2.5.x is in general compatible with CDH 5.x; for CDH 6.x please use the latest 2.6.x version, available in the Downloads section of our website.

To triage this further:
- If a load balancer is used, try to connect directly to a specific Impala coordinator host instead of the load balancer.
- Enable driver-side logging (the driver's "Installation guide" describes how to enable it), which can give further clues.
- Cross-check that the Impala service is indeed SSL enabled. Use the various "openssl" commands to verify the certificate presented by the service, and also check the truststore used on the client side.

Hope this helps,
Miklos Szurap, Customer Operations Engineer, Cloudera
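As a scripted alternative to the "openssl" commands mentioned above, the certificate check can be sketched with Python's standard ssl module. This is a swapped-in technique, not something the ODBC driver does itself: the helper names, the host, the port, and the CA bundle path in the usage comment are all hypothetical and must be replaced with the values of the actual cluster.

```python
import socket
import ssl


def build_context(cafile=None):
    """Create a TLS context that verifies the server certificate and hostname.

    cafile is the path to a PEM bundle with the CA certificates the client
    trusts; None falls back to the system default trust store.
    """
    return ssl.create_default_context(cafile=cafile)


def fetch_peer_certificate(host, port, cafile=None, timeout=10.0):
    """Connect over TLS and return the validated server certificate as a dict.

    Raises ssl.SSLCertVerificationError if the certificate is not trusted or
    does not match the host name, and ssl.SSLError if TLS cannot be negotiated
    (for example when the service is not SSL enabled).
    """
    context = build_context(cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


# Example call (hypothetical host, port and CA bundle, adjust to the cluster):
# cert = fetch_peer_certificate("impala-coordinator.example.com", 21050,
#                               cafile="/path/to/ca-bundle.pem")
# print(cert["subject"], cert["notAfter"])
```

Running the check once against a coordinator host and once against the load balancer can show whether the two present different certificates.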
06-03-2022
02:39 AM
2 Kudos
Hi @Amn_468,

The lock contention happens when too many "invalidate metadata" (IM) and "refresh" commands are running. The catalog daemon is responsible for loading the Hive Metastore metadata (Hive table and partition information, including stats) and the HDFS metadata (the list of files and their block locations). When a table is refreshed (or loaded for the first time after an IM), catalogd has to load this metadata, and it has built-in limits and a maximum throughput for how many tables, partitions and files it can load. While doing so it needs to hold a lock on the "catalog update", so that simultaneous requests do not overwrite the previously collected information. So if there are concurrent, long-running "refresh" statements [1], they can block each other and delay the publishing of the catalog information.

What can be done:
- Reduce the number of IM calls.
- Reduce the number of refresh calls.
- Wherever possible, use refresh at the partition level only.
- There were some improvements in IMPALA-6671, which is available in CDP 7.1.7 SP1, so an upgrade could also help (although it still cannot completely help with high-frequency, heavy refreshes).

I hope this helps the discussions with the users/teams about how frequently and when they submit the refresh queries.

Miklos
Customer Operations Engineer, Cloudera

[1] https://impala.apache.org/docs/build3x/html/topics/impala_refresh.html
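To illustrate the partition-level refresh suggestion, here is a small sketch that generates one `REFRESH ... PARTITION (...)` statement per changed partition instead of refreshing a whole table. The helper names, the table name and the partition columns are hypothetical; how the statements are submitted (impala-shell, JDBC/ODBC, a scheduler) is left out, and string values are quoted without any escaping.

```python
def format_value(value):
    """Render a partition value: numbers as-is, everything else single-quoted."""
    if isinstance(value, (int, float)):
        return str(value)
    return f"'{value}'"  # no escaping, sketch only


def refresh_statements(table, changed_partitions):
    """Build one partition-level REFRESH per distinct changed partition.

    changed_partitions is a list of dicts mapping partition column to value.
    Duplicate partitions are skipped so that each one is refreshed only once.
    """
    statements = []
    seen = set()
    for partition in changed_partitions:
        key = tuple(partition.items())
        if key in seen:
            continue
        seen.add(key)
        spec = ", ".join(f"{col}={format_value(val)}" for col, val in partition.items())
        statements.append(f"REFRESH {table} PARTITION ({spec})")
    return statements


# Hypothetical table and partitions, including one duplicate entry.
example = refresh_statements(
    "sales.events",
    [
        {"year": 2022, "month": 6},
        {"year": 2022, "month": 6},
        {"region": "emea"},
    ],
)
for statement in example:
    print(statement)
```

Collapsing duplicate requests before submitting them is one simple way to reduce the number of refresh calls that compete for the catalog lock.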
06-03-2022
12:48 AM
That is great, thank you for sharing the solution! Best regards Miklos
06-01-2022
03:21 AM
A DataNode (DN) should only keep files that are still known to and managed by the NameNode (NN). After a huge deletion event these "pending deletes" may of course take some time to be sent to the DNs (and for the DNs to delete them), but usually that does not take long. Maybe check the "select pending_deletion_blocks" chart to see whether this applies.

If none of the above applies, check it in more depth:
- Collect a full hdfs fsck -files -blocks -locations output.
- Pick a DN which you think has more blocks than it should.
- Verify how many blocks the hdfs fsck report lists for that DN.
- Verify on the DN side how many files it is storing.
- Do those numbers match?
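The fsck side of the comparison above can be sketched as follows. The sketch assumes that block locations appear in the fsck output as `DatanodeInfoWithStorage[host:port,storage-id,type]` entries, which is how recent Hadoop versions print them; older releases use a different format and would need a different pattern. The sample text is made up for illustration and is not real cluster output.

```python
import re
from collections import Counter

# Matches the "host:port" part of each block location entry (format is an assumption).
LOCATION_PATTERN = re.compile(r"DatanodeInfoWithStorage\[([^,\]]+),")


def blocks_per_datanode(fsck_output):
    """Count how many block replicas the fsck report places on each DataNode."""
    return Counter(LOCATION_PATTERN.findall(fsck_output))


# Made-up sample in the assumed format: two blocks, the first with two replicas.
SAMPLE_FSCK_OUTPUT = """\
/data/a.parquet 1048576 bytes, replicated: replication=2, 1 block(s):  OK
0. BP-1:blk_1001_1 len=1048576 Live_repl=2  [DatanodeInfoWithStorage[10.0.0.1:9866,DS-a,DISK], DatanodeInfoWithStorage[10.0.0.2:9866,DS-b,DISK]]
/data/b.parquet 2048 bytes, replicated: replication=1, 1 block(s):  OK
0. BP-1:blk_1002_1 len=2048 Live_repl=1  [DatanodeInfoWithStorage[10.0.0.1:9866,DS-a,DISK]]
"""

counts = blocks_per_datanode(SAMPLE_FSCK_OUTPUT)
for datanode, replicas in sorted(counts.items()):
    print(datanode, replicas)
```

The per-DataNode count from the report can then be compared with the number of block files found in that DataNode's data directories.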
05-31-2022
07:33 AM
Hi Andrea,

Oh, I see, I did not consider that you are looking at this from the DataNodes' perspective. Was this cluster recently upgraded? Is the "Finalize upgrade" step for HDFS still pending?
https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/upgrade-cdp/topics/ug_cdh_upgrade_hdfs_finalize.html

While the HDFS upgrade is not finalized, the DataNodes keep track of all the previous blocks (including blocks deleted after the upgrade) in case a "rollback" is needed.