Member since: 03-06-2020
Posts: 398
Kudos Received: 54
Solutions: 35
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 87 | 11-21-2024 10:12 PM |
|  | 822 | 07-23-2024 10:52 PM |
|  | 1071 | 05-16-2024 12:27 AM |
|  | 3012 | 05-01-2024 04:50 AM |
|  | 1335 | 03-19-2024 09:23 AM |
03-19-2024
09:16 AM
Hi @MrBeasr,

Review the Oozie logs for this workflow for anything suspicious, and paste them here if you can:

`oozie job -oozie http://<oozie-server-host>:11000 -log <workflow-id>`

Regards,
Chethan YM
03-18-2024
05:39 AM
2 Kudos
@Ynr Refer to the "Configuring Logging Options on Windows" section in the document below for the steps to enable trace/debug-level logging:

https://docs.cloudera.com/documentation/other/connectors/impala-odbc/2-6-11/Cloudera-ODBC-Driver-for-Impala-Install-Guide.pdf

Regards,
Chethan YM
03-18-2024
03:15 AM
1 Kudo
Hi @Ynr,

Can you try the latest ODBC driver version? Error messages 1 and 2 are different and occur in different scenarios.

Error 1 indicates a connection issue from the driver to Impala. Enable trace-level logging for the driver to get more detail, and also review the Power BI logs, since the issue occurs while refreshing in Power BI. A similar article for reference: https://community.cloudera.com/t5/Support-Questions/SAS-Impala-connection-error-Impala-Thrift-API/m-p/334615

Error 2 indicates a memory issue after the query is submitted. You may need to tune the Impala admission control pool so it has enough memory for your requirements: https://docs.cloudera.com/runtime/7.2.0/impala-manage/topics/impala-admission.html

Regards,
Chethan YM
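As a related illustration only (the admission control pool itself is configured on the cluster, not in SQL), a per-query memory cap can also be set from an impala-shell session; the 2g value and table name below are placeholders:

```sql
-- Placeholder values: adjust the limit to your workload.
-- MEM_LIMIT caps the memory a single query may use on each node;
-- pool-level memory is tuned separately in admission control.
SET MEM_LIMIT=2g;

-- Re-run the query that hit the memory error to see if it now fits.
SELECT COUNT(*) FROM your_db.your_table;
```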
03-05-2024
05:14 AM
@Kolli Based on the logs and the spark-submit command provided, there appear to be discrepancies between the authentication mechanisms used in the driver and executor environments, leading to the authentication errors. Some potential issues and solutions:

- Mismatch in authentication mechanisms: the driver appears to authenticate using Kerberos (kerberos), while the executor uses simple authentication (SIMPLE). Ensure the authentication mechanism is consistent across the driver and executor environments.
- Kerberos configuration: verify that the krb5.conf provided in spark.driver.extraJavaOptions and spark.executor.extraJavaOptions is correct and accessible by both the driver and the executor, and that the Kerberos principal and keytab specified in the spark-submit command are accurate and valid.
- SPNEGO configuration: ensure SPNEGO authentication is properly configured for the Spark Elasticsearch connector, and that the SPNEGO principal (elasticsearch/hadoop.hadoop.com@HADOOP.COM) specified in the spark-submit command matches the one configured in the environment.
- Permission issues: check the permissions of the keytab file (user.keytab) specified in the spark-submit command to ensure it is accessible by both the driver and the executor.
- Token renewal: review the token renewal mechanism to ensure tokens are properly renewed and propagated to the executor.

To address the issue, consider the following steps:

1. Configure both the driver and executor environments consistently for Kerberos authentication.
2. Double-check all Kerberos-related settings, including the Kerberos principal, keytab, and krb5.conf file paths.
3. Verify that the SPNEGO authentication settings are correct for the Spark Elasticsearch connector.
4. Check for permission issues with the keytab file or other Kerberos-related files.
5. Review the token renewal mechanism to ensure proper token propagation.

Regards,
Chethan YM
03-04-2024
05:33 AM
@Muskan Can you set this parameter before running the query and see if this helps?

`set PARQUET_FALLBACK_SCHEMA_RESOLUTION=1;`

Regards,
Chethan YM
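A minimal sketch of using the option in an impala-shell session, assuming a placeholder Parquet table name:

```sql
-- 1 = resolve Parquet columns by name; 0 (the default) = by ordinal position.
SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=1;

-- Placeholder: re-run the query that previously failed or returned wrong results.
SELECT * FROM your_db.your_parquet_table LIMIT 10;
```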
03-04-2024
04:29 AM
1 Kudo
@liorh The error you're encountering, "[HY000] [Cloudera][ThriftExtension] (11) Error occurred while contacting server: EAGAIN (timed out)", typically indicates a timeout while attempting to establish a connection to the Impala server over the Thrift protocol. The points below are generic troubleshooting tips; you will still need to analyse and troubleshoot your own environment.

Causes of the error:

- Network latency: network latency or connectivity issues between your application and the Impala server can lead to timeouts during the connection attempt.
- Server load: if the Impala server is under heavy load or experiencing resource constraints, it may not handle incoming connection requests promptly, resulting in timeouts.
- Thrift protocol issues: the error message mentions using a binary mechanism for authentication. Inconsistencies or misconfigurations in the Thrift protocol settings between your application and the Impala server can cause connection failures.

Dealing with the error:

- Retry mechanism: as you mentioned, implementing a retry mechanism in your software so the action is attempted multiple times is a good approach. It helps mitigate transient network issues or server load spikes that cause the initial connection attempt to fail.

Preventing the problem:

- Optimize network configuration: review and optimize the network path between your application and the Impala server to minimize latency and improve reliability. This may include adjusting network settings, optimizing routing, or using dedicated network connections.
- Server performance tuning: monitor the Impala server and address any resource bottlenecks or performance issues that could lead to connection timeouts, for example by optimizing server configuration, adding resources, or tuning Impala parameters.
- Thrift protocol configuration: ensure the Thrift protocol settings, including the authentication mechanism, are correctly configured and consistent between your application and the Impala server.

Review the trace-level driver logs and the impalad, statestore, and catalog logs from the time of the issue to get more detail.

Regards,
Chethan YM
03-04-2024
04:20 AM
1 Kudo
@krishna2023 It seems you're encountering issues with loading data into partitions in Impala after executing the steps you provided.

- Create table as select * from db2.t1 where 1=2: this creates an empty table db1.t1 based on the schema of db2.t1, without any data. Ensure the table schema matches between db1.t1 and db2.t1.
- Alter table set location: after creating the empty table, you alter its location to a new path. Make sure the specified path exists and has the necessary permissions for Impala to read and write data.
- Add partition for every day: adding partitions should specify the loading date for each partition and its corresponding HDFS directory path. Double-check that the HDFS directory paths specified in each partition definition are correct and accessible by Impala.
- Refresh table: the REFRESH command updates the table metadata to reflect changes in the underlying data directory. After adding partitions, running REFRESH is necessary to inform Impala about the new partitions.
- Compute stats: the COMPUTE STATS command gathers statistics about the table, which helps Impala optimize query execution. It is not directly related to loading data into partitions, but it is good practice to run it after significant changes to the table.

To troubleshoot further, consider these additional steps:

- Check the Impala logs for any error messages or warnings about loading data or adding partitions.
- Verify that the data files corresponding to the partitions are present in the specified HDFS directory paths.
- Ensure that the partitioning column (loading_date) values in the data files match the partition definitions specified in the ALTER TABLE statements.

Regards,
Chethan YM
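A hedged sketch of the sequence described above, with placeholder database, table, path, and date values (assuming loading_date is the partition column):

```sql
-- Placeholder names, paths, and dates; adapt to your environment.

-- 1. Empty table with the same schema (WHERE 1=2 matches no rows).
--    For a partitioned CTAS, the partition column must come last in the SELECT list.
CREATE TABLE db1.t1 PARTITIONED BY (loading_date)
AS SELECT * FROM db2.t1 WHERE 1 = 2;

-- 2. Point the table at the new location; the path must exist and be
--    readable/writable by Impala.
ALTER TABLE db1.t1 SET LOCATION 'hdfs:///data/new_location/t1';

-- 3. One partition per day, each mapped to its HDFS directory.
ALTER TABLE db1.t1 ADD PARTITION (loading_date='2024-03-01')
  LOCATION 'hdfs:///data/new_location/t1/loading_date=2024-03-01';

-- 4. Make Impala aware of the new partitions/files, then refresh statistics.
REFRESH db1.t1;
COMPUTE STATS db1.t1;
```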
03-04-2024
03:36 AM
1 Kudo
@RobertusAgung The error message "Unexpected response received from server" suggests a problem with the communication between the Windows Server and the Impala server. The additional error "GetUsernameEx(NameUserPrincipal) failed: 1332" indicates a problem with retrieving the username. Here are some steps you can take to troubleshoot and resolve the issue:

- Check firewall settings: ensure the firewall on the Windows Server machine allows outgoing connections to the Impala server's IP address and port. You've mentioned that you can telnet to the Impala IP and port, but double-check that there are no additional restrictions.
- Verify Impala server configuration: make sure the Impala server is configured to accept connections from the Windows Server machine, and check the Impala server logs for any errors or warnings about incoming connections.
- ODBC configuration: double-check the ODBC configuration on the Windows Server machine to ensure the connection settings (server host, port, authentication mechanism, etc.) are correct and match those on your local laptop, where the connection works.
- Certificate installation: if your Impala server is configured to use SSL/TLS encryption, ensure the necessary SSL/TLS certificates are installed on the Windows Server machine. You may need to import the certificates into the Windows certificate store.
- User permissions: ensure the user account running the ODBC connection on the Windows Server machine has the necessary permissions to access the Impala server, including both network permissions and database permissions.

Regards,
Chethan YM
03-04-2024
03:30 AM
1 Kudo
@vhp1360 Given the behavior you've observed with different batch sizes and column counts, it's possible that a memory or resource constraint is causing the error when dealing with a large number of columns and rows. Here are some potential causes and troubleshooting steps to consider:

- Memory constraints: loading a dataset with 200 columns and 20 million rows can require a significant amount of memory, especially if each column contains large amounts of data. Ensure the system running IBM DataStage has sufficient memory allocated to handle the processing requirements.
- Configuration limits: check whether any configuration limits or restrictions in the IBM DataStage or Hive connector settings might be causing the issue, for example a maximum allowed stack size or buffer size that is being exceeded when processing large datasets.
- Resource utilization: monitor the resource utilization (CPU, memory, disk I/O) on the system running IBM DataStage during the data load. High resource utilization or contention could indicate a bottleneck that is causing the error.
- Optimization techniques: consider optimizing the load by adjusting parameters such as batch size, record count, or buffer size. Experiment with different configurations to find the optimal settings that can handle the larger dataset without errors.
- Data format issues: verify that the data format and schema of the dataset are consistent and compatible with the Hive table schema. Data inconsistencies or mismatches could cause errors during the loading process.

Regards,
Chethan YM
03-04-2024
03:01 AM
1 Kudo
@muneeralnajdi The issue you're encountering with the Hive external table, where queries fail when using COUNT(*) or WHERE clauses, appears to be related to the custom input format not being applied during query execution. This can lead to errors when Hive falls back to reading the files with the default input format.

- Ensure the custom input format is used: verify that the custom input format (CustomAvroContainerInputFormat) is correctly configured and loaded in the Hive environment. Confirm that the JAR containing the custom input format class is added to the Hive session or cluster, and that there are no errors or warnings while the JAR is loaded.
- Check table properties: ensure the custom input format class is correctly specified in the table definition (INPUTFORMAT), with no typos or syntax errors.
- Test with basic queries: start with basic queries (SELECT *) to confirm that the custom input format is used and data can be read from the Avro files (which appears to be working in your case). If basic queries succeed but more complex queries fail, it may indicate compatibility issues between the input format and certain Hive operations.
- Consider alternative approaches: if troubleshooting the custom input format does not resolve the issue, consider other ways to filter files by format. For example, pre-process the data to separate Avro and JSON files into different directories or partitions, or use external scripts or a custom SerDe to handle different file formats within the same directory.

Regards,
Chethan YM
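A hedged sketch of the kind of session setup and table definition being discussed; the JAR path, package name, columns, and location below are placeholders, and only the CustomAvroContainerInputFormat class name comes from the thread:

```sql
-- Placeholder JAR path; the custom input format class must be on the Hive classpath.
ADD JAR hdfs:///path/to/custom-avro-inputformat.jar;

-- Illustrative table definition: column names, package, and location are invented.
CREATE EXTERNAL TABLE my_db.events_avro (
  id STRING,
  payload STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT 'com.example.CustomAvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:///data/mixed_avro_json/';

-- Confirm a plain read works first, then try the previously failing queries.
SELECT * FROM my_db.events_avro LIMIT 5;
SELECT COUNT(*) FROM my_db.events_avro WHERE id IS NOT NULL;
```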