Member since: 07-29-2024
Posts: 7
Kudos Received: 0
Solutions: 0
10-01-2024
01:37 PM
Hi @zegab, some directories inside the user folder in HDFS were deleted, which temporarily affected some services. Afterward, I reinstalled the clients and libraries for those services, and everything seems to be working fine now.

Validation of basic read/write after manual failover: How exactly can I validate read/write operations on HDFS from the CLI after a manual failover?

Ensuring that Oozie and Sqoop access the active and standby NameNodes correctly: I would like to validate how Oozie and Sqoop are accessing the active and standby NameNodes, to make sure they are properly configured to recognize the active NameNode and handle failover correctly.

Log details and Sqoop configuration warning: Here is part of the log I am seeing:

>>> Invoking Sqoop command line now >>>
17:05:55.428 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
17:05:55.479 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.7.7.1.7.2000-305
17:05:55.537 [main] WARN org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error

It seems that Sqoop is trying to connect to the standby NameNode instead of the active one, resulting in the StandbyException. Additionally, there is a warning about $SQOOP_CONF_DIR not being set. Could this indicate that Sqoop is missing some configuration and is not recognizing the correct NameNode?

Any further advice on validating the failover process, ensuring Oozie and Sqoop use the active NameNode, and resolving the $SQOOP_CONF_DIR issue would be greatly appreciated.

Best regards, Leonardo
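For reference, this is the kind of CLI smoke test I have in mind (the paths below are placeholders, not values from this cluster), so please correct me if there is a better way:

# Check which NameNode is currently active and which is standby
hdfs getconf -confKey dfs.nameservices
hdfs haadmin -getAllServiceState

# Basic write/read/delete round trip; with a working HA client configuration this
# should be routed to the active NameNode automatically
echo "failover smoke test" > /tmp/ha_test.txt
hdfs dfs -mkdir -p /tmp/ha_validation
hdfs dfs -put -f /tmp/ha_test.txt /tmp/ha_validation/
hdfs dfs -cat /tmp/ha_validation/ha_test.txt
hdfs dfs -rm -r -skipTrash /tmp/ha_validation

If the same round trip succeeds both before and after the manual failover, basic read/write through the nameservice should be fine.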
10-01-2024
12:57 PM
Hello all,

I am encountering an issue with an Oozie workflow that moves data from an external table to HDFS. The Oozie job successfully brings the data from the external table into HDFS; after that, Sqoop should transfer this data into an ORC table.

The failure occurs at the point in the Oozie workflow where it tries to read two files from an HDFS directory: one containing the YARN application_id and the other with the instructions for the database insertion. The job fails, and the directory remains empty. In the logs I get the following error:

org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby.

Context: This error began after an incident in which directories in HDFS were deleted, which may have affected the failover process and the high availability (HA) of HDFS. The error points at the standby NameNode, suggesting that the client (Sqoop/Oozie) is trying to read/write against the standby NameNode, where those operations are not allowed.

My hypothesis: The problem may be related to HDFS failover. The incident with the deleted directories may have affected HA, preventing Sqoop or Oozie from correctly communicating with the active NameNode, resulting in the failure to register the files and the subsequent error. If the standby NameNode is not properly synchronized, or if failover is not functioning correctly, that could explain why the job is unable to write the files to the HDFS directory, which directly blocks the progress of the Oozie/Sqoop job.

Questions:
1. Could this issue be related to HDFS failover and the HA configuration being affected by the deleted directories?
2. How can I validate whether the problem is related to HDFS HA and failover?
3. Is there a way to force Sqoop/Oozie to properly use the active NameNode instead of the standby?

I have checked the HA configuration, and failover seems to be working: the standby takes over when the active NameNode is restarted. However, the error persists when trying to read or write to HDFS.

I appreciate any help and suggestions.

Best regards
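For context, these are the client-side checks I am planning to run to confirm that the clients resolve the HA nameservice rather than a fixed NameNode host (the <nameservice> below is a placeholder for the value returned by dfs.nameservices):

# fs.defaultFS should point at the HA nameservice (hdfs://<nameservice>), not a single host:port
hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey dfs.nameservices
hdfs getconf -confKey dfs.ha.namenodes.<nameservice>
hdfs getconf -confKey dfs.client.failover.proxy.provider.<nameservice>

# Current state of each NameNode
hdfs haadmin -getAllServiceState

My understanding is that the Oozie workflow's nameNode property, and any hdfs:// URIs passed to Sqoop, should also use hdfs://<nameservice> rather than the address of a specific NameNode, so that client-side failover can pick the active one. Please correct me if that is wrong.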
08-19-2024
07:03 AM
Hi @ckumar, I couldn't find the command. Would it be in the parcels path, as shown in the attached screenshot?
08-12-2024
06:14 AM
In the development cluster, the NiFi nodes do not show data or a connection in Cloudera Manager, but if I access them via the Web UI, the nodes appear properly connected and functional. NiFi logs and screenshots are attached as evidence.

I was previously instructed to review the Ranger policies as described in the Cloudera documentation, but the problem persists; the current Ranger settings are attached. I also found the following error curious:

Cookie rejected [hadoop.auth=""u=nifi&p=nifi/v321d186.prevnet@DEVDATALAKE.CDPDTP&t=kerberos&e=1721689709429&s=qJPbU6hDNqLtkL1BokMp...", version:0, domain:v321d175.prevnet, path:/solr/ranger_audits_shard1_replica_n1, expiry:null] Domain attribute "v321d175.prevnet" violates the Netscape cookie specification

However, the same error appears in the logs of the prod cluster, where I do not have this problem.
Labels: Apache NiFi
08-09-2024
11:03 AM
@smruti I've validated it with the user, and here is what happens: the date field arrives as a string containing only the date portion ("yyyy-MM-dd", with no time of day), and it was being converted to timestamp, which adds the time fields; that is where the divergence comes from. Is there anything in the tool that can be done to standardize this, or do I need to change it on the data engineering side?
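For example, would keeping the field as DATE instead of TIMESTAMP be a reasonable way to standardize it? A rough sketch of the comparison I have in mind, using the table and column from this thread (the JDBC URL is a placeholder and I have not tested this):

beeline -u "jdbc:hive2://<hiveserver2-host>:10000/default" \
  -e "SELECT dt_afast_trabalho,
             CAST(dt_afast_trabalho AS DATE)      AS as_date,
             CAST(dt_afast_trabalho AS TIMESTAMP) AS as_timestamp
      FROM ex_sa_beneficios_mantidos LIMIT 5"

Since a DATE carries no time of day, there would be nothing for a timezone conversion to shift.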
08-08-2024
08:38 AM
Hello @smruti, here is a screenshot of a query against the source table in string format; I will also provide a query against the ORC table.
08-02-2024
07:49 AM
We are experiencing an issue when converting data from string to timestamp in Hive: the timestamps undergo unexpected hour changes during conversion and processing. We would like to verify whether this could be a bug in the tool or whether additional configuration is required.

Process and Steps to Reproduce the Issue:
1. Loading data into HDFS: we use the hdfs put command to load a positional file into HDFS and store it in a directory accessible via an external table, ex_sa_beneficios_mantidos, where the data is in string format.
2. We create an external table in Hive to access the data as strings.
3. Converting string to timestamp and writing to the ORC table: when converting the string to timestamp, we notice an alteration in the hour of the dt_afast_trabalho field.
4. Verifying the conversion and data reading: when reading the data from the ORC table, the hours in the dt_afast_trabalho field differ from what we expected.
5. Counting timestamps by hour: we performed a count of the timestamps per hour to verify the changes.

Spark Configuration in Zeppelin: when initializing Hive in Zeppelin, we use the following parameters:

%livy.spark
from pyspark_llap import HiveWarehouseSession
hive = HiveWarehouseSession.session(spark).build()
spark.conf.set("spark.datasource.hive.warehouse.read.mode", value="SECURE_ACCESS")
spark.conf.set("spark.datasource.hive.warehouse.load.staging.dir", value="hdfs://pcdpclusterdatalakennha:8020/dados/datalake/zona_temporaria/dadosprovisorios/hwc")
spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation", value="true")
spark.conf.set("spark.sql.session.timeZone", "America/Sao_Paulo")

Expected Behavior: we expect the hour values of the dt_afast_trabalho field to remain consistent throughout the conversion and processing stages, respecting the America/Sao_Paulo timezone.

Current Behavior: we observe that the hour values of the dt_afast_trabalho field are altered during the conversion from string to timestamp and the subsequent processing in Hive and Spark.
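To isolate where the hour shift happens, this is the kind of check we can run directly in beeline, i.e. outside Spark/HWC (the JDBC URL and the ORC table name are placeholders):

# 1) What Hive itself produces when casting the source string
beeline -u "jdbc:hive2://<hiveserver2-host>:10000/default" \
  -e "SELECT dt_afast_trabalho AS raw_string,
             CAST(dt_afast_trabalho AS TIMESTAMP) AS hive_cast
      FROM ex_sa_beneficios_mantidos LIMIT 5"

# 2) What was actually persisted in the ORC table
beeline -u "jdbc:hive2://<hiveserver2-host>:10000/default" \
  -e "SELECT dt_afast_trabalho AS orc_value FROM <orc_table> LIMIT 5"

If the hours already differ in (1) or (2), the change happens on the Hive side at conversion/write time; if they only differ when the ORC table is read through HWC in Zeppelin, then the Spark session timezone (spark.sql.session.timeZone) and the reader configuration would be the place to look.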
Labels: Apache Hive, Apache Spark, Apache Zeppelin