06-25-2024
12:58 PM
Despite extensive efforts, I was unable to resolve the issue directly, but I devised a workaround. Rather than polling the Hadoop Job object for status updates, I extracted the job ID after submitting the job. From this job ID I constructed an ApplicationId, which I then used to instantiate a YarnClient. This approach let me effectively monitor the status and completion rate of the running job. Interestingly, both my distcp job and the YarnClient use the same HadoopConf object (the YarnClient is instantiated right after the DistCp Job is executed), and both run within the same UserGroupInformation scope. Why the YarnClient can access the necessary information while the Job object cannot remains unclear; nevertheless, this workaround has successfully unblocked me. Additional context: I am using Java 8 and running on an Ubuntu Xenial image.
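For anyone hitting the same problem, here is a minimal sketch of the workaround described above. It assumes Hadoop 2.8+ (for ApplicationId.fromString) and relies on the convention that a MapReduce job ID ("job_&lt;cluster&gt;_&lt;seq&gt;") maps to its YARN application ID by swapping the prefix; the class and method names are my own illustration, not code from the original poster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class DistCpMonitor {

    // A MapReduce job ID ("job_<cluster>_<seq>") corresponds to the YARN
    // application ID with the prefix swapped.
    static String toApplicationId(String jobId) {
        return jobId.replaceFirst("^job_", "application_");
    }

    // Poll the YARN application report until the application reaches a
    // terminal state. Pass the same Configuration used to submit the job.
    static void monitor(Configuration conf, String jobId) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();
        try {
            ApplicationId appId = ApplicationId.fromString(toApplicationId(jobId));
            while (true) {
                ApplicationReport report = yarnClient.getApplicationReport(appId);
                System.out.printf("state=%s progress=%.0f%%%n",
                        report.getYarnApplicationState(),
                        report.getProgress() * 100);
                YarnApplicationState state = report.getYarnApplicationState();
                if (state == YarnApplicationState.FINISHED
                        || state == YarnApplicationState.FAILED
                        || state == YarnApplicationState.KILLED) {
                    break;
                }
                Thread.sleep(5000); // poll interval; tune as needed
            }
        } finally {
            yarnClient.stop();
        }
    }
}
```

A caller would invoke monitor(conf, job.getJobID().toString()) right after Job.submit(), reusing the same Configuration object.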
03-05-2024
05:14 AM
@Kolli Based on the logs and the spark-submit command provided, it seems there are discrepancies between the authentication mechanisms used in the driver and executor environments, leading to authentication errors. Here are some potential issues and solutions:

1. Mismatch in authentication mechanisms: The driver seems to authenticate using Kerberos (kerberos), while the executor uses simple authentication (SIMPLE). Ensure consistency in the authentication mechanisms across the driver and executor environments.
2. Kerberos configuration: Verify that the Kerberos configuration (krb5.conf) provided in spark.driver.extraJavaOptions and spark.executor.extraJavaOptions is correct and accessible by both the driver and executor. Check that the Kerberos principal and keytab specified in the spark-submit command are accurate and valid.
3. SPNEGO configuration: Ensure that SPNEGO authentication is properly configured for the Spark Elasticsearch connector, and that the SPNEGO principal (elasticsearch/hadoop.hadoop.com@HADOOP.COM) specified in the spark-submit command matches the one configured in the environment.
4. Permission issues: Check the permissions of the keytab file (user.keytab) specified in the spark-submit command to ensure it is readable by both the driver and executor.
5. Token renewal: Review the token renewal mechanism to ensure that tokens are properly renewed and propagated to the executor.

To address the issue, consider the following steps:

1. Configure both the driver and executor environments consistently for Kerberos authentication.
2. Double-check all Kerberos-related settings, including the principal, keytab, and krb5.conf file paths.
3. Verify that the SPNEGO authentication settings are correct for the Spark Elasticsearch connector.
4. Check for any permission issues with the keytab file or other Kerberos-related files.
5. Review the token renewal mechanism to ensure proper token propagation.

Regards,
Chethan YM
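A quick way to test items 1, 2, and 4 in isolation is a small standalone program that forces Kerberos authentication and attempts a keytab login with Hadoop's UserGroupInformation; run it on both a driver node and an executor node and compare results. This is only a sketch under assumptions: the principal and keytab path below are placeholders, not values from the original post.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosSanityCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Force Kerberos. A node that silently falls back to SIMPLE is
        // usually missing this property in its core-site.xml.
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Hypothetical principal and keytab path; substitute your own.
        UserGroupInformation ugi = UserGroupInformation
                .loginUserFromKeytabAndReturnUGI("user@HADOOP.COM",
                        "/path/to/user.keytab");

        // If login succeeds, this should report KERBEROS, not SIMPLE.
        System.out.println("Authenticated as: " + ugi.getUserName()
                + " (method: " + ugi.getAuthenticationMethod() + ")");
    }
}
```

If this fails on the executor nodes but succeeds on the driver, the keytab permissions, krb5.conf distribution, or the executor's core-site.xml are the likely culprits.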