About mszurap

mszurap · ‎05-21-2024

Thanks @ldylag for sharing the details, that makes sense. CM does not necessarily need the PostgreSQL driver to be installed; it is needed only when that database type is configured for CM. The one under "/opt/cloudera/cm/lib" might be shipped to support the "embedded" database, which might have caused confusion here (if one HMS is on the CM host), but in general the DB drivers are expected to be under "/usr/share/java". Cheers Miklos

mszurap · ‎05-16-2024

Hello @ldylag, The stack trace shows that the "org.postgresql.core.v3.ConnectionFactoryImpl" class depends on "com/ongres/scram/common/stringprep/StringPreparation.class", and that class is not available on the classpath. By default HMS searches for the JDBC driver under /usr/share/java. Please check whether the same JDBC driver is available on both HMS hosts. The best option is probably to use the latest driver from: https://jdbc.postgresql.org/download/ Best regards Miklos
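
One way to compare the two HMS hosts is to list which jars in the driver directory actually contain the class named in the stack trace. The following is a minimal sketch, not a Cloudera tool: the helper name `jars_with_class` is hypothetical, and it assumes the jars are readable by the user running it. Run it on each host and compare the output.

```python
import zipfile
from pathlib import Path

# Class named in the stack trace as missing from the classpath.
SCRAM_CLASS = "com/ongres/scram/common/stringprep/StringPreparation.class"


def jars_with_class(directory, class_path=SCRAM_CLASS):
    """Return sorted names of jar files in `directory` that contain `class_path`."""
    hits = []
    for jar in sorted(Path(directory).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_path in zf.namelist():
                    hits.append(jar.name)
        except (zipfile.BadZipFile, OSError):
            # Skip dangling symlinks and files that are not valid archives.
            continue
    return hits


if __name__ == "__main__":
    # /usr/share/java is the default driver location mentioned in the post.
    print(jars_with_class("/usr/share/java"))
```

An empty list on one host and a non-empty list on the other would point to a driver mismatch between the hosts.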

mszurap · ‎02-29-2024

Hi @dqsdqs , Please also see the following article: https://community.cloudera.com/t5/Customer/Troubleshooting-Kerberos-Related-Issues-Common-Errors-and/ta-p/76192 Most of the time the "Server xxx not found in Kerberos database" message indicates that you need to include the server hostname in the "[domain_realm]" (host to realm mapping) section, so that the Kerberos client can go to the proper KDC. Cheers Miklos

mszurap · ‎01-18-2024

As far as I know, this is more of an empirical best practice. As mentioned, it cannot be calculated exactly, since some factors (filename / path lengths, ACL counts, etc.) vary from environment to environment.

mszurap · ‎01-17-2024

Hi @Meepoljd , The file and block metadata consume the NameNode heap. Can you share how you did your calculation? Per our docs: https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/hdfs-overview/topics/hdfs-sizing-namenode-heap-memory.html the file count should be kept below 300 million files. The same page also suggests that approximately 150 bytes are needed for each namespace object; I assume you did your calculation based on that. The real NN heap consumption varies with the path lengths, ACL counts, replication factors, snapshots, operational load, etc. For that reason, on our other page https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/hdfs-overview/topics/hdfs-examples-namenode-heap-memory.html we suggest allocating a bigger heap, 1 GB of heap per 1 million blocks, which would be ~320 GB in your case. Hope this helps, Best regards, Miklos
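
To show how far apart the two figures are, here is a small sketch of the arithmetic. The block count of 320 million is an assumption inferred from the ~320 GB figure in the post, not a number taken from the original thread, and the function names are hypothetical. The 150-byte estimate here counts blocks only and ignores files and directories, so it is a lower bound.

```python
BYTES_PER_OBJECT = 150        # approximate per-object figure from the sizing page
GB_PER_MILLION_BLOCKS = 1.0   # rule of thumb from the examples page


def raw_metadata_gb(namespace_objects):
    """Lower-bound estimate at ~150 bytes per namespace object, in GB (10**9 bytes)."""
    return namespace_objects * BYTES_PER_OBJECT / 1e9


def recommended_heap_gb(blocks):
    """Rule-of-thumb heap size: 1 GB per million blocks."""
    return blocks / 1e6 * GB_PER_MILLION_BLOCKS


# Assumed block count, inferred from the ~320 GB recommendation.
blocks = 320_000_000
print(raw_metadata_gb(blocks))      # 48.0
print(recommended_heap_gb(blocks))  # 320.0
```

The gap between the two results is the headroom left for the variable factors listed in the post (path lengths, ACLs, snapshots, operational load).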

mszurap · ‎01-13-2024

Hi @mpla217 , can you share exactly what query you are executing? Per the error message, the query is syntactically incorrect around the "TIMESTAMPADD" part. If the TIMESTAMPADD function is not used in your query, then maybe the Hive JDBC driver is optimizing the query (converting it) to use that function instead of plus signs (just guessing). As per the output, you are using the Cloudera Hive JDBC driver. Make sure you are using the latest version: https://www.cloudera.com/downloads/connectors/hive/jdbc You may also try the "UseNativeQuery=1" connection URL property to disable the optimizations done by the driver (which are sometimes incorrect). Thanks Miklos

mszurap · ‎12-11-2023

Hi @Sokka Thank you for raising this. I see the same behavior in the "Sqoop 1" editor, though only with some older CDH versions; I haven't tested with the most recent ones. In any case, I would not advise using the "Sqoop 1" editor; it will quickly become insufficient, as it does not provide any advanced configurations. Instead, please create an Oozie workflow (Scheduler / Workflow) with a Sqoop action, which gives you better control. As far as I can see, this problem should have been fixed in later versions, see https://issues.cloudera.org/browse/HUE-6717 Best regards Miklos

mszurap · ‎11-08-2023

Hi @HadoopHero , For Hive, if there is a single reduce task writing the output data, it will not break up the output file into smaller files. That is expected and cannot be configured to behave differently. With DISTRIBUTE BY you should be able to get multiple reducers (if you have a column by which you can reasonably "split" your data into smaller subsets), see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy Best regards Miklos

mszurap · ‎11-08-2023

To add to the point of @ggangadharan, there are lots of good articles/posts on why the float and even the double datatypes have these problems. Note that this is not specific to Hive / Hadoop or Java.
https://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency
https://dzone.com/articles/never-use-float-and-double-for-monetary-calculatio
https://www.red-gate.com/hub/product-learning/sql-prompt/the-dangers-of-using-float-or-real-datatypes
Miklos
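
Since the behavior is not specific to Hive or Java, it can be reproduced in any language that uses IEEE 754 binary floating point. A minimal Python sketch follows; the helper `as_float32` is a hypothetical name that round-trips a value through a 4-byte float, roughly what a cast to a single-precision float does.

```python
import struct
from decimal import Decimal


def as_float32(x):
    """Round-trip x through a 4-byte IEEE 754 float and return it as a Python float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Binary doubles cannot represent 0.1, 0.2 or 0.3 exactly.
print(0.1 + 0.2 == 0.3)        # False
print(0.1 + 0.2)               # 0.30000000000000004

# A single-precision float loses even more of the decimal part.
print(as_float32(0.1) == 0.1)  # False

# A decimal type keeps the digits exactly as written.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```

This is why a decimal type is usually the safer choice when the exact decimal digits matter.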

mszurap · ‎10-30-2023

Hi @cl99 , yes, it seems there is a 50 MB limit on the max RPC message size in CDH 6.3.2. https://github.com/apache/impala/blob/branch-3.2.0/be/src/kudu/rpc/transfer.cc#L39 This error is likely the result of the unsafe flag you have turned on. Best regards Miklos

Last Visited	‎01-22-2026 07:16 AM

Member Since	‎11-04-2015 11:53 PM
Posts	261
Kudos received	44
