Member since: 11-04-2015
Posts: 260
Kudos Received: 44
Solutions: 33
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2659 | 05-16-2024 03:10 AM
 | 1565 | 01-17-2024 01:07 AM
 | 1573 | 12-11-2023 02:10 AM
 | 2308 | 10-11-2023 08:42 AM
 | 1609 | 09-07-2023 01:08 AM
06-09-2022
03:35 AM
1 Kudo
Hi @DataMan-HJ , the case-insensitive join behavior you are looking for is not present in Hive and likely will not be implemented: Hive relies on Java's UTF-8 strings and the behavior that implicitly comes with them, without the possibility to change the collation. There is a good discussion on HIVE-4070, where a similar ask was raised for the LIKE operator behavior; you can review the pros and cons there. So you will likely need to change the individual joins to use the lower()/upper() functions, as in the sketch below. Best regards, Miklos
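A minimal sketch of that workaround, with hypothetical table and column names (orders/customers joined on a string key):

```bash
# Normalize both sides of the join key with lower() so the comparison is
# effectively case-insensitive (host, table, and column names are illustrative):
beeline -u "jdbc:hive2://hs2-host:10000/default" -e "
SELECT o.order_id, c.customer_name
FROM   orders o
JOIN   customers c
  ON   lower(o.customer_code) = lower(c.customer_code);"
```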
06-09-2022
03:25 AM
Hi @luckes , thanks for reporting this. Based on your description, yes, it seems the driver replaces "upsert" with "insert" everywhere. Please open a support case through the MyCloudera support portal to have this routed to the proper team for an enhancement. Other ideas:
- Have you checked whether this behavior can be observed with the latest JDBC driver version too?
- Please check whether "UseNativeQuery=1" in the JDBC connection string helps (see the example below).
- Does it work if you keep "insert" out of the column names ("insert_time"), for example with a "modification_time" column name instead?
Thank you, Miklos Szurap, Customer Operations Engineer, Cloudera
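For reference, a hypothetical connection string with that option set, assuming the Cloudera Impala JDBC driver (host and port are illustrative); with UseNativeQuery=1 the driver passes the SQL text through without transforming it:

```
jdbc:impala://impala-host:21050;UseNativeQuery=1
```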
06-08-2022
04:08 AM
1 Kudo
Hi Andrea, great to see that it has been found now, and thanks for marking the post as answered. All the best, Miklos
06-08-2022
01:10 AM
Hi @tallamohan , the direct usage of the Hive classes (CliSessionState, SessionState, Driver) in the provided code falls under "Hive CLI" or "HCat CLI" access, which is not supported in CDP: https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/upgrade/topics/hive-unsupported.html Please open a case on the MyCloudera Support Portal to get that clarified. The recommended approach is to use Beeline and access the Hive service through HiveServer2, for example as sketched below. Best regards, Miklos
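A minimal sketch of the Beeline route, with a hypothetical host, port, and database:

```bash
# Connect through HiveServer2 with Beeline instead of embedding the Hive
# CLI classes (add SSL or Kerberos options as your cluster's security
# setup requires):
beeline -u "jdbc:hive2://hs2-host:10000/default" -e "SHOW TABLES;"
```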
06-08-2022
01:04 AM
1 Kudo
Please remember that one block is not necessarily 256 MB; it can be less. Also, not all files have a replication factor of 3 - some might have only one replica - so it can be entirely plausible that all of those were single-replica files. 600,000 * 256 MB = 153.6 TB at a maximum, but since blocks can be smaller than 256 MB, the 60 TB freed up is reasonable.
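For reference, a back-of-the-envelope check of that upper bound (assuming every one of the 600,000 blocks is a full 256 MB with a single replica):

```bash
# 600,000 blocks * 256 MB each = 153,600,000 MB, i.e. 153.6 TB in
# decimal units (1 TB = 1,000,000 MB):
echo "$((600000 * 256)) MB"   # prints: 153600000 MB
```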
06-08-2022
12:44 AM
Please check which CDH version the cluster has. The Cloudera ODBC Driver version 2.5.x is in general compatible with CDH 5.x; for CDH 6.x, please use the latest 2.6.x version (see the Downloads section of our website). To further triage this:
- Try to connect directly to a specific Impala coordinator host instead of the load balancer, if a load balancer is used.
- Enable driver-side logging (the driver's "Installation Guide" describes how to enable it), which can give further clues.
- Cross-check that the Impala service is indeed SSL-enabled; use "openssl" commands to verify the certificate presented by the service, as well as the truststore used on the client side (see the sketch below).
Hope this helps, Miklos Szurap, Customer Operations Engineer, Cloudera
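A sketch of those verification commands, with a hypothetical coordinator host and trust-file path:

```bash
# Inspect the certificate the Impala service actually presents
# (21050 is the default port used by ODBC/JDBC clients):
openssl s_client -connect impala-host:21050 -showcerts </dev/null

# Inspect the trusted-certificates (PEM) file configured on the client
# side (shows the first certificate in the bundle):
openssl x509 -in /path/to/cacerts.pem -noout -subject -issuer -dates
```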
06-03-2022
02:39 AM
2 Kudos
Hi @Amn_468 , the lock contention happens when there are too many "invalidate metadata" (IM) and "refresh" commands running. The catalog daemon's responsibility is to load the Hive Metastore metadata (Hive table and partition information, including stats) and the HDFS metadata (the list of files and their block locations). If a table is refreshed (or a table is loaded for the first time after an IM), then catalogd has to load this metadata, and it has built-in limits on the throughput - how many tables and/or partitions/files it can load at a time. While doing so it needs to hold a lock on the "catalog update" to avoid simultaneous requests overwriting the previously collected information. So if there are concurrent and long-running "refresh" statements [1], those can block each other and delay the publishing of the catalog information. What can be done:
- reduce the number of IM calls
- reduce the number of refresh calls
- wherever possible, refresh on the partition level only (see the sketch below)
- there were some improvements in IMPALA-6671, available in CDP 7.1.7 SP1, so an upgrade could also help (though it still cannot fully compensate for high-frequency, heavy refreshes)
I hope this helps the discussions with the users/teams about how frequently and when they submit the refresh queries. Miklos, Customer Operations Engineer, Cloudera
[1] https://impala.apache.org/docs/build3x/html/topics/impala_refresh.html
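A minimal sketch of a partition-level refresh, with hypothetical database, table, and partition key:

```bash
# Refresh only the partition that changed, instead of the whole table:
impala-shell -q "REFRESH mydb.sales PARTITION (day='2022-06-01');"
```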
06-03-2022
12:48 AM
That is great, thank you for sharing the solution! Best regards, Miklos
06-01-2022
03:21 AM
A DN should only keep files for blocks which are still managed and known by the NN. After a huge deletion event, these "pending deletes" may of course take some time to be sent to the DNs (and for the DNs to delete them), but usually that does not take long. Check the "select pending_deletion_blocks" chart, if applicable. If the above does not apply, then check it more deeply (see the sketch below):
- collect a full "hdfs fsck -files -blocks -locations" output
- pick a DN which you think has more blocks than it should
- verify how many blocks the hdfs fsck report shows for that DN
- verify on the DN side how many block files it is storing
- do those numbers match?
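A rough way to do that cross-check, assuming a hypothetical DataNode IP of 10.0.0.12 and a hypothetical data directory of /dfs/dn:

```bash
# 1. Count the blocks that fsck places on that DataNode (each block line
#    lists its replica locations, so this is an approximate per-DN count):
hdfs fsck / -files -blocks -locations > /tmp/fsck.out
grep -c "10.0.0.12" /tmp/fsck.out

# 2. On the DataNode itself, count the stored block files (excluding the
#    .meta checksum files) under the configured dfs.datanode.data.dir:
find /dfs/dn -name "blk_*" ! -name "*.meta" | wc -l
```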
05-31-2022
07:33 AM
Hi Andrea, oh, I see - I did not consider that you are seeing this from the DataNodes' perspective. Was this cluster recently upgraded? Is the "Finalize Upgrade" step for HDFS still pending? https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/upgrade-cdp/topics/ug_cdh_upgrade_hdfs_finalize.html While an HDFS upgrade is not finalized, DataNodes keep all the previous blocks (including blocks deleted after the upgrade) in case a "rollback" is needed.
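For reference, the underlying HDFS commands (in CDP this step is normally driven from Cloudera Manager, so treat these as a sketch; finalize only after verifying the cluster is healthy, since rollback becomes impossible afterwards):

```bash
# Finalize a (non-rolling) HDFS upgrade, releasing the retained block copies:
hdfs dfsadmin -finalizeUpgrade

# For rolling upgrades, query the current upgrade status:
hdfs dfsadmin -rollingUpgrade query
```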