Member since 05-09-2022 · 6 Posts · 0 Kudos Received · 0 Solutions
10-27-2023
12:07 AM
The error message "Unknown HS2 problem when communicating with Thrift server" typically indicates an issue when communicating with Hive Server 2 (HS2) through its Thrift interface. This error can occur for various reasons, and troubleshooting it may require some investigation. Here are some common steps to help resolve it:

1. Check Hive Server status: Ensure that Hive Server 2 is up and running and reachable from your client. Check its status and logs for any reported errors or issues.
2. Network connectivity: Verify that no network-related issues are preventing your client from connecting to the Hive Server. Check firewalls, network configurations, and any potential network interruptions.
3. Hive configuration: Review the Hive Server's configuration to ensure it is set up correctly. Pay attention to security configurations such as authentication and authorization settings.
4. Thrift protocol version: Ensure that the Thrift protocol version used by your client matches the version supported by the Hive Server. Mismatched protocol versions can lead to communication problems.
5. Client-side issues: Check the client application or code you are using to interact with Hive. Ensure it is properly configured and making the correct requests to the Hive Server (see the connection sketch below).
6. Logs and error messages: Examine the logs and error messages in more detail to get specific information about what might be causing the problem. This can help pinpoint the issue.
7. Server version compatibility: Ensure that the client and server components (Hive client and Hive Server) are compatible in terms of versions. Incompatible versions can lead to communication issues.
8. Authentication and authorization: If your Hive Server is configured with authentication and authorization, ensure that you have the necessary permissions and credentials to access it.
9. Load and resource constraints: Check whether the Hive Server is under heavy load or whether resource constraints might be affecting its ability to respond to client requests.
10. Driver and libraries: Ensure that you are using the correct driver or libraries for your client application. If you are using JDBC or ODBC, make sure the corresponding driver is installed and configured correctly.

If you continue to face issues after performing these checks, it may be necessary to provide more specific error messages or details about your environment to diagnose the problem further.
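As a quick client-side sanity check, a minimal JDBC connection test can show whether HS2 is reachable over Thrift at all. This is only a sketch: the host, port, database, and credentials are placeholders you would replace with your own, and the Hive JDBC driver (hive-jdbc and its dependencies) must be on the classpath.

import java.sql.DriverManager

object Hs2ConnectionCheck {
  def main(args: Array[String]): Unit = {
    // Placeholder URL: replace host, port, and database with your HS2 endpoint.
    val url = "jdbc:hive2://hs2-host.example.com:10000/default"

    // Register the Hive JDBC driver.
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // If this fails, the stack trace usually tells you whether the problem is
    // network-level (connection refused / timeout) or protocol/auth-level.
    val conn = DriverManager.getConnection(url, "hive_user", "hive_password")
    try {
      val stmt = conn.createStatement()
      val rs = stmt.executeQuery("SELECT 1")
      while (rs.next()) println(s"HS2 responded: ${rs.getInt(1)}")
    } finally {
      conn.close()
    }
  }
}

If this simple query succeeds, the Thrift channel itself is healthy and the problem is more likely in the client application, driver version, or query being issued.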
08-31-2022
10:48 PM
Hi @mmk I think you have shared the following information: 7 nodes, each with 250 GB memory and 32 vCPUs, and in spark-defaults.conf:

spark.executor.memory = 100g
spark.executor.memoryOverhead = 49g
spark.driver.memoryOverhead = 200g
spark.driver.memory = 500g

You have a maximum of 250 GB per node, yet you have specified 500 GB driver memory plus 200 GB overhead. How can the driver get 700 GB? In general, driver/executor memory should not exceed the physical memory available to YARN.

Coming to the actual problem, please avoid calling show() to print 8,000,000 records. If you need to print all of the values, implement logic that fetches 1000 records at a time and then the next 1000 records in the following iteration, as in the sketch after the link below. https://stackoverflow.com/questions/29227949/how-to-implement-spark-sql-pagination-query
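A rough sketch of that pagination idea, assuming a DataFrame named df with a sortable key column named id (both names are placeholders for illustration):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Print a large DataFrame in fixed-size pages instead of one huge show().
// NOTE: a global Window.orderBy shuffles all rows into a single partition,
// so treat this as a sketch, not a pattern for very large tables.
def printInPages(df: DataFrame, pageSize: Int = 1000): Unit = {
  val windowSpec = Window.orderBy("id")
  val numbered = df.withColumn("rn", row_number().over(windowSpec)).cache()

  val total = numbered.count()
  var start = 1L
  while (start <= total) {
    val end = start + pageSize - 1
    numbered
      .where(s"rn BETWEEN $start AND $end")
      .drop("rn")
      .show(pageSize, truncate = false)
    start = end + 1
  }
  numbered.unpersist()
}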
08-31-2022
09:45 PM
Hi @mmk By default, Hive loads all SerDes under the hive/lib location, so you are able to perform create/insert/select operations. In order to read a Hive table created with a custom or external SerDe, we need to provide that SerDe to Spark, so that Spark can internally use those libraries and load the Hive table data. If the SerDe is not provided, you will see the following exception: org.apache.hadoop.hive.serde2.SerDeException. Please add the following library to the spark-submit command: json-serde-<version>.jar
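For example (the jar path, class name, and table name below are placeholders), submit the job with the SerDe jar on the classpath and read the table with Hive support enabled:

import org.apache.spark.sql.SparkSession

// Submit with the SerDe jar supplied via --jars, e.g.:
//   spark-submit --jars /path/to/json-serde-<version>.jar --class ReadSerdeTable app.jar
object ReadSerdeTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReadSerdeTable")
      .enableHiveSupport()   // required so Spark can talk to the Hive metastore
      .getOrCreate()

    // Spark uses the custom SerDe from the --jars classpath when
    // deserializing the table's underlying files.
    spark.sql("SELECT * FROM default.json_serde_table LIMIT 10").show(truncate = false)

    spark.stop()
  }
}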
05-11-2022
12:01 AM
Hi @aakulov, we are using an on-prem (bare-metal) cluster with Cloudera Manager version 7.6.1 and Cloudera Runtime 7.1.7 (Parcels). We configured the AWS credentials the same way as in the links you shared, but we still get "unable to load AWS credentials" when a directory is included in the s3a URL (e.g. "s3a://test/directory").
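For reference, this is roughly how we are setting the credentials, a minimal sketch using the standard fs.s3a.* Hadoop properties; the key values are redacted placeholders, and the same settings could equally be placed in core-site.xml or passed via --conf spark.hadoop.fs.s3a.* on spark-submit:

import org.apache.spark.sql.SparkSession

object S3aReadCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("S3aReadCheck").getOrCreate()

    // Standard s3a credential properties (placeholder values).
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3a.access.key", "<ACCESS_KEY>")
    hadoopConf.set("fs.s3a.secret.key", "<SECRET_KEY>")
    hadoopConf.set(
      "fs.s3a.aws.credentials.provider",
      "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")

    // Reading a directory prefix rather than a single object, as in the failing case.
    spark.read.text("s3a://test/directory").show(10, truncate = false)

    spark.stop()
  }
}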